Page range: PDF page 234-246 (printed pages 214-226)
---
214 Predictive Methods Using Protein Sequences
[Figure 7.9 heatmap: rows labeled A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V; columns follow the sequence segment KDINFKIERGQLLAVAGSTGAGKTSLLMVIMGELEPSEGKIKHS.]
Figure 7.9 From predicting single amino acid sequence variant (SAV) effects to landscapes of susceptibility to change. Shown are the SNAP2 (Hecht et al. 2015) predictions from a complete in silico mutagenesis study of the cystic fibrosis transmembrane conductance regulator, as displayed by PredictProtein (Yachdav et al. 2014). Prediction results are represented as a heatmap in which every column corresponds to one residue in the sequence and every row to one of the 20 amino acids, so each column shows the predicted effect of replacing that residue with every non-native amino acid. Note that not all of these SAVs are reachable by a single nucleotide variant (SNV). SAVs predicted to be neutral are shown in green, while those predicted to affect molecular function are in red. Synonymous (native) substitutions are black. Whereas variant effect prediction has traditionally focused on a few select variants of particular interest, some modern tools are computationally efficient and accurate enough to predict the effect of every possible variant in a protein. Although predictors are not accurate enough for every high-scoring variant to be considered relevant, this approach can identify sites that may be functionally important, because the effect of every possible substitution at each residue is predicted. Here, the cluster of high-effect scores coincides with the known nucleotide binding region at residues 458–465. This is one very effective way in which residue-based prediction tools can yield knowledge at the level of the whole protein.
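The caption's remark that not every SAV is reachable by a single SNV can be made concrete with the standard genetic code. The Python sketch below is an illustration, not part of SNAP2 or PredictProtein: it enumerates all 19 substitutions per residue (the non-native rows of a heatmap like the one in Figure 7.9) and flags which of them can arise from one nucleotide change in the codon actually used. The function names and the need to supply one codon per residue are assumptions of this sketch; it uses the standard code (NCBI translation table 1) and treats changes to a stop codon as not producing an SAV.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG
# order with the third position varying fastest. "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STANDARD_AAS = sorted(set(AMINO_ACIDS) - {"*"})  # the 20 amino acids


def snv_reachable(codon: str) -> set:
    """Amino acids reachable from `codon` by exactly one nucleotide substitution.

    Synonymous changes and changes to a stop codon are excluded, so the result
    is the set of missense SAVs that a single SNV at this codon can produce.
    """
    codon = codon.upper()
    native = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if aa not in ("*", native):
                reachable.add(aa)
    return reachable


def saturation_mutagenesis(protein: str, codons: list):
    """Yield (position, wild type, mutant, reachable_by_snv) for every possible
    substitution, i.e. 19 per residue, as in a complete in silico scan."""
    if len(protein) != len(codons):
        raise ValueError("need exactly one codon per residue")
    for pos, (wt, codon) in enumerate(zip(protein, codons), start=1):
        if CODON_TABLE[codon.upper()] != wt:
            raise ValueError(f"codon {codon} does not encode {wt} at position {pos}")
        reachable = snv_reachable(codon)
        for mut in STANDARD_AAS:
            if mut != wt:
                yield pos, wt, mut, mut in reachable


print(sorted(snv_reachable("TGG")))  # Trp -> ['C', 'G', 'L', 'R', 'S']
```

For a tryptophan encoded by TGG, only 5 of the 19 possible substitutions are one SNV away; a complete in silico scan like the one in the figure therefore also predicts effects for many SAVs that would require two or three nucleotide changes.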
on a dataset containing variants known to affect protein function. Using a different set of variants that affect the function of the transcription factor TP53, performance was significantly higher (0.83–0.87). Performance was even higher on a set of variants implicated in human disease (0.83–0.94). Other published rankings of methods often disagree widely, depending on the datasets used (Thusberg et al. 2011; Dong et al. 2015; Grimm et al. 2015). This high variation exemplifies the difficulty of evaluating the effect of a SAV. One reason for the variation is that the tools described above predict different types of effects, such as the effect on molecular function, on a pathway, or on the organism in general. An extreme example of the problem is that SIFT, PolyPhen-2, and SNAP2 predictions seem to agree generally on SAVs with known experimental annotations (the kind of data on which the methods are built), but differ substantially when applied to never-before-seen data, such as the natural sequence variation observed among 60 000 individuals (Mahlich et al. 2017) or random SAVs (Reeb et al. 2016). Another issue at play is ascertainment bias: well-studied proteins are more likely to be included in training data, which leads to overestimation of how well the methods generalize (Grimm et al. 2015). Given these issues and the overall estimated performance of these tools, they are best used to generate hypotheses for further testing (Miosge et al. 2015).
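The performance measure behind the ranges quoted above is not named in this excerpt. Assuming it is the area under the ROC curve (AUC), a common summary for variant effect predictors, the Python sketch below shows how such a value is computed and why the same predictor can score very differently on different benchmarks. It uses the Mann–Whitney formulation: the probability that a randomly chosen effect variant scores higher than a randomly chosen neutral one, with ties counting one half. The function name, scores, and labels are hypothetical and do not come from any of the cited studies.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for variants with a known effect, 0 for neutral variants.
    Returns the probability that a random effect variant outscores a random
    neutral variant, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one effect and one neutral variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from one predictor on two different benchmark sets.
labels = [1, 1, 1, 0, 0, 0]
benchmark_a = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
benchmark_b = [0.9, 0.5, 0.35, 0.7, 0.6, 0.2]
print(round(roc_auc(benchmark_a, labels), 3))  # 0.889
print(round(roc_auc(benchmark_b, labels), 3))  # 0.556
```

The predictor itself is unchanged between the two calls; only the benchmark differs. The pairwise loop is quadratic and chosen for clarity; a rank-based computation scales better to large variant sets.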
Summary
Seminal discoveries made in the 1960s by Anfinsen and others clearly established that the sequence of a protein determines its structure and, ultimately, its function. Because protein sequences are relatively simple to obtain experimentally, a large enterprise devoted to predicting structure and function from sequence has emerged. Structure prediction has matured tremendously, to the point that some aspects may be considered solved, at least to the extent that current experimental data allow (Hopf et al. 2012). Despite these substantial advances, the general problem of predicting protein function from sequence has not been solved. The 1D prediction methods presented in this chapter (secondary structure, transmembrane regions, solvent accessibility, and disorder) are important as input to higher-level prediction methods. Fortunately, given the wide range of prediction methods available, many of which are discussed in this chapter, it is possible to annotate protein sequences with a wealth of information, even without any prior knowledge. These predictions will still contain errors, but once users understand the strengths and weaknesses of each method, these tools can be extremely useful for filtering the deluge of sequence data generated today and, ideally, for generating hypotheses that can be tested experimentally. Because the data these methods depend upon may themselves contain errors, it is important to identify the primary evidence behind a protein function annotation: whether it was predicted by best-in-class tools, mapped by high-throughput experiments, or carefully gathered by experts from detailed experiments. No existing resource informs users which of these situations applies. Thus, the proteins you are interested in are typically best analyzed by applying the most appropriate prediction tools for each particular question, together with the most reliable database annotations.
Internet Resources
Essential databases and prediction evaluations
CAFA biofunctionprediction.org/cafa
CAGI genomeinterpretation.org
CASP predictioncenter.org
CATH www.cathdb.info
InterPro www.ebi.ac.uk/interpro
neXtProt www.nextprot.org
PDB www.wwpdb.org
Pfam pfam.xfam.org
SCOP2 scop2.mrc-lmb.cam.ac.uk
UniProtKB www.uniprot.org
Prediction of protein structure
BETAWARE biocomp.unibo.it/savojard/betawarecl
BOCTOPUS2 boctopus.bioinfo.se
PolyPhobius phobius.sbc.su.se/poly.html
POODLE cblab.my-pharm.ac.jp/poodle
PrDOS prdos.hgc.jp/cgi-bin/top.cgi
Proteus wks80920.ccis.ualberta.ca/proteus
Proteus2 www.proteus2.ca/proteus2
PSIPRED, MEMSAT-SVM, and DISOPRED3 bioinf.cs.ucl.ac.uk/psipred
RaptorX raptorx.uchicago.edu/StructurePropertyPred/predict
ReProf, TMSEG, and Meta-Disorder predictprotein.org
SPIDER3 sparks-lab.org/server/SPIDER3
SSpro5, ACCpro5 scratch.proteomics.ics.uci.edu
Prediction of protein function
CADD cadd.gs.washington.edu
DeepLoc www.cbs.dtu.dk/services/DeepLoc
DomCut www.bork.embl-heidelberg.de/~suyama/domcut
DomPred, FFPred3.0, COGIC bioinf.cs.ucl.ac.uk/psipred
DOMpro scratch.proteomics.ics.uci.edu
DP-Bind lcg.rit.albany.edu/dp-bind
FunFams www.cathdb.info/search/by_sequence
HomPPI ailab1.ist.psu.edu/PSHOMPPIv1.3
HomPRIP-NB ailab1.ist.psu.edu/HomPRIP-NB/index.html
LocTree3 rostlab.org/services/loctree3
MultiLoc2 abi-services.informatik.uni-tuebingen.de/multiloc2/webloc.cgi
PolyPhen-2 genetics.bwh.harvard.edu/pph2
Pprint crdd.osdd.net/raghava/pprint
PROVEAN provean.jcvi.org/index.php
PSIVER mizuguchilab.org/PSIVER
RNABindRPlus ailab1.ist.psu.edu/RNABindRPlus
ScoobyDomain www.ibi.vu.nl/programs/scoobywww
SIFT sift.bii.a-star.edu.sg
SNAP2 rostlab.org/services/snap2web
SomeNA, Metastudent, Ofran and Rost PPI predictor www.predictprotein.org
Further Reading
Keskin, O., Tuncbag, N., and Gursoy, A. (2016). Predicting protein–protein interactions from the molecular to the proteome level. Chem. Rev. 116: 4884–4909. Keskin et al. give an expansive overview of protein binding in all its facets, covering protein–protein and protein–nucleic acid binding on the protein and residue level, as well as additional topics not covered in this chapter, such as docking and other prediction algorithms based on protein structure instead of sequence.
Moult, J., Fidelis, K., Kryshtafovych, A. et al. (2016). Critical assessment of methods of protein structure prediction: progress and new directions in round XI. Proteins 84 (Suppl 1): 4–14. This is the most recent evaluation of the CASP experiment, an independent assessment of all major aspects of protein structure prediction. Similarly, readers interested in function prediction should consult CAFA (Jiang et al. 2016), and those interested in variant effect prediction should consult CAGI (see Internet Resources).
References
Adzhubei, I.A., Schmidt, S., Peshkin, L. et al. (2010). A method and server for predicting damaging missense mutations. Nat. Methods 7: 248–249.
Ahmad, S. and Sarai, A. (2005). PSSM-based prediction of DNA binding sites in proteins. BMC Bioinf. 6: 33.
Akiva, E., Brown, S., Almonacid, D.E. et al. (2014). The structure-function linkage database. Nucleic Acids Res. 42: 521–530.
Allis, C.D. and Jenuwein, T. (2016). The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 17: 487–500.
Almagro Armenteros, J.J., Sønderby, C.K., Sønderby, S.K. et al. (2017). DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 33: 3387–3395.
Aloy, P., Stark, A., Hadley, C., and Russell, R.B. (2003). Predictions without templates: new folds, secondary structure, and contacts in CASP5. Proteins Struct. Funct. Genet. 53 (Suppl 6): 436–456.
Altschul, S.F. and Gish, W. (1996). Local alignment statistics. Methods Enzymol. 266: 460–480.
Andreeva, A., Howorth, D., Chothia, C. et al. (2014). SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res. 42: 310–314.
Anfinsen, C.B. (1973). Principles that govern the folding of protein chains. Science 181: 223–230.
Ashkenazi, S., Snir, R., and Ofran, Y. (2012). Assessing the relationship between conservation of function and conservation of sequence using photosynthetic proteins. Bioinformatics 28: 3203–3210.
Attwood, T.K., Coletta, A., Muirhead, G. et al. (2012). The PRINTS database: a fine-grained protein sequence annotation and analysis resource – its status in 2012. Database 2012: 1–9.
Auton, A., Abecasis, G.R., Altshuler, D.M. et al. (2015). A global reference for human genetic variation. Nature 526: 68–74.
Bairoch, A. and Boeckmann, B. (1994). The SWISS-PROT protein sequence data bank: current status. Nucleic Acids Res. 22: 3578–3580.
Berman, H.M., Westbrook, J., Feng, Z. et al. (2000). The protein data bank. Nucleic Acids Res. 28: 235–242.
Bernhofer, M., Kloppmann, E., Reeb, J., and Rost, B. (2016). TMSEG: novel prediction of transmembrane helices. Proteins 84: 1706–1716.
Blum, T., Briesemeister, S., and Kohlbacher, O. (2009). MultiLoc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction. BMC Bioinf. 10: 274.
Boutet, E., Lieberherr, D., Tognolli, M. et al. (2016). UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view. Methods Mol. Biol. 1374: 23–54.
Bru, C., Courcelle, E., Carrère, S. et al. (2005). The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res. 33: 212–215.
Bryson, K., Cozzetto, D., and Jones, D.T. (2007). Computer-assisted protein domain boundary prediction using the DomPred server. Curr. Protein Pept. Sci. 8: 181–188.
Buchan, D.W.A., Minneci, F., Nugent, T.C.O. et al. (2013). Scalable web services for the PSIPRED protein analysis workbench. Nucleic Acids Res. 41: 349–357.
Chen, X.W. and Jeong, J.C. (2009). Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics 25: 585–591.
Chen, P. and Li, J. (2010). Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinf. 11: 402.
Chen, C.P., Kernytsky, A., and Rost, B. (2002). Transmembrane helix predictions revisited. Protein Sci. 11: 2774–2791.
Cheng, J., Sweredoski, M.J., and Baldi, P. (2006). DOMpro: protein domain prediction using profiles, secondary structure, relative solvent accessibility, and recursive neural networks. Data Min. Knowl. Discovery 13: 1–10.
Choi, Y., Sims, G.E., Murphy, S. et al. (2012). Predicting the functional effect of amino acid substitutions and indels. PLoS One 7(10): e46688.
Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13(2): 222–245.
Claros, M.G. and von Heijne, G. (1994). TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci. 10: 685–686.
Coleman, J.L.J., Ngo, T., and Smith, N.J. (2017). The G protein-coupled receptor N-terminus and receptor signalling: N-tering a new era. Cell. Signalling 33: 1–9.
Cozzetto, D., Buchan, D.W.A., Bryson, K., and Jones, D.T. (2013). Protein function prediction by massive integration of evolutionary analyses and multiple data sources. BMC Bioinf. 14: S1.
Cozzetto, D., Minneci, F., Currant, H., and Jones, D.T. (2016). FFPred 3: feature-based function prediction for all gene ontology domains. Sci. Rep. 6: 31865.
Crick, F.H. (1958). On protein synthesis. Symp. Soc. Exp. Biol. 12: 138–163.
Cukuroglu, E., Gursoy, A., Nussinov, R., and Keskin, O. (2014). Non-redundant unique interface structures as templates for modeling protein interactions. PLoS One 9: e86738.
Das, S., Lee, D., Sillitoe, I. et al. (2015). Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics 31: 3460–3467.
Deng, X., Gumm, J., Karki, S. et al. (2015). An overview of practical applications of protein disorder prediction and drive for faster, more accurate predictions. Int. J. Mol. Sci. 16: 15384–15404.
Dong, Q., Wang, X., Lin, L., and Xu, Z. (2006). Domain boundary prediction based on profile domain linker propensity index. Comput. Biol. Chem. 30: 127–133.
Dong, C., Wei, P., Jian, X. et al. (2015). Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum. Mol. Genet. 24: 2125–2137.
Eddy, S.R. (2011). Accelerated profile HMM searches. PLoS Comput. Biol. 7: e1002195.
Elbarbary, R.A., Lucas, B.A., and Maquat, L.E. (2016). Retrotransposons as regulators of gene expression. Science 351: aac7247.
Esmaielbeiki, R., Krawczyk, K., Knapp, B. et al. (2016). Progress and challenges in predicting protein interfaces. Briefings Bioinf. 17: 117–131.
Eyrich, V., Martí-Renom, M.A., Przybylski, D. et al. (2001). EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics 17: 1242–1243.
Ezkurdia, L., Grana, O., Izarzugaza, J.M.G., and Tress, M.L. (2009). Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8. Proteins Struct. Funct. Bioinf. 77: 196–209.
Fagerberg, L., Jonasson, K., and von Heijne, G. (2010). Prediction of the human membrane proteome. Proteomics 10: 1141–1149.
Fariselli, P., Savojardo, C., Martelli, P.L., and Casadio, R. (2009). Grammatical-restrained hidden conditional random fields for bioinformatics applications. Algorithms Mol. Biol. 4: 13.
Fidelis, K., Rost, B., and Zemla, A. (1999). A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins 34: 220–223.
Finn, R.D., Bateman, A., Clements, J. et al. (2014a). Pfam: the protein families database. Nucleic Acids Res. 42: 222–230.
Finn, R.D., Miller, B.L., Clements, J., and Bateman, A. (2014b). iPfam: a database of protein family and domain interactions found in the Protein Data Bank. Nucleic Acids Res. 42: 364–373.
Finn, R.D., Attwood, T.K., Babbitt, P.C. et al. (2016). InterPro in 2017 – beyond protein family and domain annotations. Nucleic Acids Res. 45: gkw1107.
Fischer, T.B., Arunachalam, K.V., Bailey, D. et al. (2003). The binding interference database (BID): a compilation of amino acid hot spots in protein interfaces. Bioinformatics 19: 1453–1454.
Foster, L.J., de Hoog, C.L., Zhang, Y. et al. (2006). A mammalian organelle map by protein correlation profiling. Cell 125(1): 187–199.
Frishman, D. and Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins Struct. Funct. Genet. 23(4): 566–579.
Fukuchi, S., Amemiya, T., Sakamoto, S. et al. (2014). IDEAL in 2014 illustrates interaction networks composed of intrinsically disordered proteins and their binding partners. Nucleic Acids Res. 42: 320–325.
Galperin, M.Y., Makarova, K.S., Wolf, Y.I., and Koonin, E.V. (2015). Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43: D261–D269.
Gardy, J.L. and Brinkman, F.S. (2006). Methods for predicting bacterial protein subcellular localization. Nat. Rev. Microbiol. 4(10): 741–751.
Garnier, J., Osguthorpe, D.J., and Robson, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120: 97–120.
Garnier, J., Gibrat, J.-F., and Robson, B. (1996). GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol. 266: 540–553.
Garrow, A.G., Agnew, A., and Westhead, D.R. (2005). TMB-Hunt: a web server to screen sequence sets for transmembrane beta-barrel proteins. Nucleic Acids Res. 33 (Suppl 2): 188–192.
Gaudet, P., Michel, P.A., Zahn-Zabal, M. et al. (2017). The neXtProt knowledgebase on human proteins: 2017 update. Nucleic Acids Res. 45(D1): D177–D182.
Gene Ontology Consortium (2000). Gene ontology: tool for the unification of biology. Nat. Genet. 25: 25–29.
Gene Ontology Consortium (2015). Gene Ontology Consortium: going forward. Nucleic Acids Res. 43: D1049–D1056.
Goldberg, T., Hamp, T., and Rost, B. (2012). LocTree2 predicts localization for all domains of life. Bioinformatics 28: i458–i465.
Goldberg, T., Hecht, M., Hamp, T. et al. (2014). LocTree3 prediction of localization. Nucleic Acids Res. 42 (Web Server issue): 1–6.
Goodwin, S., McPherson, J.D., and McCombie, W.R. (2016). Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17: 333–351.
Gouw, M., Michael, S., Samano-Sanchez, H. et al. (2018). The eukaryotic linear motif resource – 2018 update. Nucleic Acids Res. 46(D1): D428–D434.
Graessel, A., Hauck, S.M., von Toerne, C. et al. (2015). A combined omics approach to generate the surface atlas of human naive CD4+ T cells during early T-cell receptor activation. Mol. Cell. Proteomics 14(8): 2085–2102.
Greene, L.H., Lewis, T.E., Addou, S. et al. (2007). The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res. 35: 291–297.
Grimm, D.G., Azencott, C.A., Aicheler, F. et al. (2015). The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Hum. Mutat. 36: 513–523.
Habchi, J., Tompa, P., Longhi, S., and Uversky, V.N. (2014). Introducing protein intrinsic disorder. Chem. Rev. 114: 6561–6588.
Haft, D.H., Selengut, J.D., Richter, R.A. et al. (2013). TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 41: 387–395.
Hamp, T. and Rost, B. (2012). Alternative protein-protein interfaces are frequent exceptions. PLoS Comput. Biol. 8(8): e1002623.
Hamp, T., Kassner, R., Seemayer, S. et al. (2013). Homology-based inference sets the bar high for protein function prediction. BMC Bioinf. 14 (Suppl 3): S7.
Hayat, S., Peters, C., Shu, N. et al. (2016). Inclusion of dyad-repeat pattern improves topology prediction of transmembrane β-barrel proteins. Bioinformatics 32: 1571–1573.
Hecht, M., Bromberg, Y., and Rost, B. (2013). News from the protein mutability landscape. J. Mol. Biol. 425(21): 3937–3948.
Hecht, M., Bromberg, Y., and Rost, B. (2015). Better prediction of functional effects for sequence variants. BMC Genomics 16 (Suppl 8): S1.
Heffernan, R., Paliwal, K., Lyons, J. et al. (2015). Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci. Rep. 5: 11476.
Heffernan, R., Yang, Y., Paliwal, K., and Zhou, Y. (2017). Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 33(18): 2842–2849.
von Heijne, G. (2006). Membrane-protein topology. Nat. Rev. Mol. Cell Biol. 7: 909–918.
Heinig, M. and Frishman, D. (2004). STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res. 32 (Web Server issue): 500–502.
Hirose, S., Shimizu, K., Kanai, S. et al. (2007). POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 23: 2046–2053.
Hirose, S., Shimizu, K., and Noguchi, T. (2010). POODLE-I: disordered region prediction by integrating POODLE series and structural information predictors based on a workflow approach. In Silico Biol. 10: 185–191.
Hönigschmid, P. (2012). Improvement of DNA- and RNA-protein binding prediction. Diploma thesis. TUM, Technical University of Munich.
Hopf, T.A., Colwell, L.J., Sheridan, R. et al. (2012). Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149: 1607–1621.
Hopf, T.A., Schärfe, C.P.I., Rodrigues, J.P.G.L.M. et al. (2014). Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3: e03430.
Horton, P., Park, K.J., Obayashi, T. et al. (2007). WoLF PSORT: protein localization predictor. Nucleic Acids Res. 35 (Web Server issue): W585–W587.
Hu, Y., Lehrach, H., and Janitz, M. (2009). Comparative analysis of an experimental subcellular protein localization assay and in silico prediction methods. J. Mol. Histol. 40(5–6): 343–352.
Hubbard, S.J. and Thornton, J.M. (1993). NACCESS. Department of Biochemistry and Molecular Biology, University College London.
Huerta-Cepas, J., Szklarczyk, D., Forslund, K. et al. (2016). eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44: D286–D293.
Huh, W.K., Falvo, J.V., Gerke, L.C. et al. (2003). Global analysis of protein localization in budding yeast. Nature 425(6959): 686–691.
Hwang, S., Guo, Z., and Kuznetsov, I.B. (2007). DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 23: 634–636.
Ishida, T. and Kinoshita, K. (2007). PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 35: 460–464.
Ishida, T. and Kinoshita, K. (2008). Prediction of disordered regions in proteins based on the meta approach. Bioinformatics 24(11): 1344–1348.
Jacoby, E., Bouhelal, R., Gerspacher, M., and Seuwen, K. (2006). The 7TM G-protein-coupled receptor target family. ChemMedChem 1: 761–782.
Jensen, L.J. and Bateman, A. (2011). The rise and fall of supervised machine learning techniques. Bioinformatics 27: 3331–3332.
Jia, Y. and Liu, X.-Y. (2006). From surface self-assembly to crystallization: prediction of protein crystallization conditions. J. Phys. Chem. B 110: 6949–6955.
Jiang, Y., Oron, T.R., Clark, W.T. et al. (2016). An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 17(1): 184.
Jones, D.T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195–202.
Jones, D.T. and Cozzetto, D. (2015). DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31: 857–863.
Jones, P., Binns, D., Chang, H.Y. et al. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236–1240.
Joo, K., Lee, S.J., and Lee, J. (2012). SANN: solvent accessibility prediction of proteins by nearest neighbor method. Proteins 80(7): 1791–1797.
Kabsch, W. and Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
Kajan, L., Yachdav, G., Vicedo, E. et al. (2013). Cloud prediction of protein structure and function with PredictProtein for Debian. Biomed. Res. Int. 2013: 398968.
Käll, L., Krogh, A., and Sonnhammer, E.L.L. (2004). A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338: 1027–1036.
Käll, L., Krogh, A., and Sonnhammer, E.L.L. (2005). An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics 21: i251.
Keskin, O., Tuncbag, N., and Gursoy, A. (2016). Predicting protein-protein interactions from the molecular to the proteome level. Chem. Rev. 116: 4884–4909.
Kessel, A. and Ben-Tal, N. (2011). Introduction to Proteins, 438–440. London, UK: CRC Press.
Kihara, D. (2005). The effect of long-range interactions on the secondary structure formation of proteins. Protein Sci. 14: 1955–1963.
Kinch, L.N., Li, W., Monastyrskyy, B. et al. (2016). Evaluation of free modeling targets in CASP11 and ROLL. Proteins 84 (Suppl 1): 51–66.
Kircher, M., Witten, D.M., Jain, P. et al. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46: 310–315.
Klimke, W., Agarwala, R., Badretdin, A. et al. (2009). The National Center for Biotechnology Information’s protein clusters database. Nucleic Acids Res. 37: 216–223.
Kloppmann, E., Punta, M., and Rost, B. (2012). Structural genomics plucks high-hanging membrane proteins. Curr. Opin. Struct. Biol. 22: 326–332.
Köhler, S., Vasilevsky, N.A., Engelstad, M. et al. (2016). The human phenotype ontology in 2017. Nucleic Acids Res. 45: gkw1039.
Krissinel, E. and Henrick, K. (2007). Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372: 774–797.
Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305: 567–580.
Kumar, M., Gromiha, M.M., and Raghava, G.P.S. (2008). Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins 71: 189–194.
Kumar, P., Henikoff, S., and Ng, P.C. (2009). Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4: 1073–1081.
Kyte, J. and Doolittle, R.F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132.
Lam, S.D., Dawson, N.L., Das, S. et al. (2016). Gene3D: expanding the utility of domain assignments. Nucleic Acids Res. 44: D404–D409.
de Las Rivas, J. and Fontanillo, C. (2010). Protein-protein interactions essentials: key concepts to building and analyzing interactome networks. PLoS Comput. Biol. 6: 1–8.
Lee, B. and Richards, F.M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55(3): 379–400.
van der Lee, R., Buljan, M., Lang, B. et al. (2014). Classification of intrinsically disordered regions and proteins. Chem. Rev. 114: 6589–6631.
Letunic, I., Doerks, T., and Bork, P. (2015). SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 43: D257–D260.
Liu, J. and Rost, B. (2001). Comparing function and structure between entire proteomes. Protein Sci. 10: 1970–1979.
Liu, J. and Rost, B. (2003). Domains, motifs and clusters in the protein universe. Curr. Opin. Chem. Biol. 7: 5–11.
Liu, J. and Rost, B. (2004). CHOP proteins into structural domain-like fragments. Proteins Struct. Funct. Genet. 55: 678–688.
Lobanov, M.Y. and Galzitskaya, O.V. (2015). How common is disorder? Occurrence of disordered residues in four domains of life. Int. J. Mol. Sci. 16: 19490–19507.
Lobley, A. (2010). Human Protein Function Prediction: application of machine learning for integration of heterogeneous data sources. PhD thesis. University College London, London, UK.
Magnan, C.N. and Baldi, P. (2014). SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics 30: 2592–2597.
Mahlich, Y., Reeb, J., Schelling, M. et al. (2017). Common sequence variants affect molecular function more than rare variants. Sci. Rep. 7: 1608.
Marks, D.S., Colwell, L.J., Sheridan, R. et al. (2011). Protein 3D structure computed from evolutionary sequence variation. PLoS One 6: e28766.
Marks, D.S., Hopf, T.A., and Sander, C. (2012). Protein structure prediction from sequence variation. Nat. Biotechnol. 30: 1072–1080.
Martinez, D.A. and Nelson, M.A. (2010). The next generation becomes the now generation. PLoS Genet. 6: e1000906.
Mi, H., Poudel, S., Muruganujan, A. et al. (2016). PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res. 44: D336–D342.
Miosge, L.A., Field, M.A., Sontani, Y. et al. (2015). Comparison of predicted and actual consequences of missense mutations. Proc. Natl. Acad. Sci. USA 112: E5189–E5198.
Mirabello, C. and Pollastri, G. (2013). Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics 29(16): 2056–2058.
Monastyrskyy, B., Kryshtafovych, A., Moult, J. et al. (2014). Assessment of protein disorder region predictions in CASP10. Proteins Struct. Funct. Bioinf. 82: 127–137.
Montgomerie, S., Sundararaj, S., Gallin, W.J., and Wishart, D.S. (2006). Improving the accuracy of protein secondary structure prediction using structural alignment. BMC Bioinf. 7: 301.
Montgomerie, S., Cruz, J.A., Shrivastava, S. et al. (2008). PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation. Nucleic Acids Res. 36 (Web Server issue): 202–209.
Mooney, C., Cessieux, A., Shields, D.C., and Pollastri, G. (2013). SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids 45(2): 291–299.
Morrow, J.K. and Zhang, S. (2012). Computational prediction of protein hot spot residues. Curr. Pharm. Des. 18: 1255–1265.
Mosca, R., Céol, A., Stein, A. et al. (2014). 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 42: 374–379.
Moult, J., Pedersen, J.T., Judson, R., and Fidelis, K. (1995). A large-scale experiment to assess protein structure prediction methods. Proteins Struct. Funct. Genet. 23: ii–iv.
Murakami, Y. and Mizuguchi, K. (2010). Applying the Naive Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics 26: 1841–1848.
Nair, R. and Rost, B. (2002). Inferring sub-cellular localisation through automated lexical analysis. Bioinformatics 18 (Suppl 1): S78–S86.
Necci, M., Piovesan, D., Dosztányi, Z., and Tosatto, S.C.E. (2017). MobiDB-lite: fast and highly specific consensus prediction of intrinsic disorder in proteins. Bioinformatics 33: btx015.
Ng, P.C. and Henikoff, S. (2003). SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31: 3812–3814.
Nugent, T. and Jones, D.T. (2009). Transmembrane protein topology prediction using support vector machines. BMC Bioinf. 10: 159.
Nugent, T. and Jones, D.T. (2012). Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis. Proc. Natl. Acad. Sci. USA 109: E1540–E1547.
Oates, M.E., Romero, P., Ishida, T. et al. (2013). D2P2: database of disordered protein predictions. Nucleic Acids Res. 41: 508–516.
Oates, M.E., Stahlhacke, J., Vavoulis, D.V. et al. (2015). The SUPERFAMILY 1.75 database in 2014: a doubling of data. Nucleic Acids Res. 43: D227–D233.
O’Donovan, C., Martin, M.J., Gattiker, A. et al. (2002). High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Briefings Bioinf. 3: 275–284.
Ofran, Y. and Rost, B. (2003a). Analysing six types of protein-protein interfaces. J. Mol. Biol. 325: 377–387.
Ofran, Y. and Rost, B. (2003b). Predicted protein-protein interaction sites from local sequence information. FEBS Lett. 544: 236–239.
Ofran, Y. and Rost, B. (2007). ISIS: interaction sites identified from sequence. Bioinformatics 23(2): e13–e16.
Overington, J., Al-Lazikani, B., and Hopkins, A.L. (2006). How many drug targets are there? Nat. Rev. Drug Discov. 5: 993–996.
Pang, C.I., Lin, K., Wouters, M.A. et al. (2008). Identifying foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids Res. 36: 578–588.
Pawson, T. and Nash, P. (2003). Assembly of cell regulatory systems through protein interaction domains. Science 300(5618): 445–452.
Pedruzzi, I., Rivoire, C., Auchincloss, A.H. et al. (2013). HAMAP in 2013, new developments in the protein family classification and annotation system. Nucleic Acids Res. 41: 584–589.
Pedruzzi, I., Rivoire, C., Auchincloss, A.H. et al. (2015). HAMAP in 2015: updates to the protein family classification and annotation system. Nucleic Acids Res. 43: D1064–D1070.
Piovesan,D.,Tabaro,F.,Mi ˇceti´c,I.etal.(2016).DisProt7.0:amajorupdateofthedatabaseof
disorderedproteins. NucleicAcidsRes. 45:gkw1056.
Pollastri,G.,Przybylski,D.,Rost,B.,andBaldi,P.(2002).Improvingthepredictionofprotein
secondarystructureinthreeandeightclassesusingrecurrentneuralnetworksandprofiles.
ProteinsStruct.Funct.Bioinf. 47:228–235.
Punta,M.andRost,B.(2008).Neuralnetworkspredictproteinstructureandfunction. Methods
Mol.Biol. 458:203–230.
Radivojac,P.,Clark,W.T.,Oron,T.R.etal.(2013).Alarge-scaleevaluationofcomputational
proteinfunctionprediction. Nat.Methods. 10:221–227.
Ramilowski,J.A.,Goldberg,T.,Harshbarger,J.etal.(2015).Adraftnetworkof
ligand-receptor-mediatedmulticellularsignallinginhuman. Nat.Commun. 6:7866.
Rao,V.S.,Srinivas,K.,Sujini,G.N.,andKumar,G.N.S.(2014).Protein-proteininteraction
detection:methodsandanalysis. Int.J.Proteomics. 2014:1–12.
Reeb,J.,Kloppmann,E.,Bernhofer,M.,andRost,B.(2014).Evaluationoftransmembranehelix
predictionsin2014. ProteinsStruct.Funct.Bioinf. 83:473–484.
Reeb,J.,Hecht,M.,Mahlich,Y.etal.(2016).Predictedmoleculareffectsofsequencevariantslink
tosystemlevelofdisease. PLoSComput.Biol. 12(8):e1005047.
Remmert,M.,Biegert,A.,Hauser,A.,andSöding,J.(2012).HHblits:lightning-fastiterative
proteinsequencesearchingbyHMM-HMMalignment. Nat.Methods. 9(2):173–175.
Res,I.,Mihalek,I.,andLichtarge,O.(2005).Anevolutionbasedclassifierforpredictionofprotein
interfaceswithoutusingproteinstructures. Bioinformatics.21(10):2496–2501.
Rezácová,P.,Borek,D.,Moy,S.F.etal.(2008).Crystalstructureandputativefunctionofsmall
Toprimdomain-containingproteinfromBacillusstearothermophilus. Proteins.70:311–319.
Rose,P.W.,Prli ´c,A.,Altunkaya,A.etal.(2017).TheRCSBproteindatabank:integrativeviewof
protein,geneand3Dstructuralinformation. NucleicAcidsRes. 45(D1):D271–D281.
Rost,B.(1996).PHD:predictingone-dimensionalproteinstructurebyprofilebasedneural
networks.MethodsEnzymol. 266:525–539.
Rost,B.(2001).Proteinsecondarystructurepredictioncontinuestorise. J.Struct.Biol. 134:
204–218.
Rost, B. (2002). Enzyme function less conserved than anticipated. J. Mol. Biol. 318:595–608.
Rost, B. and Sander, C. (1993). Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl. Acad. Sci. USA 90:7558–7562.
Rost, B. and Sander, C. (1994a). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins Struct. Funct. Bioinf. 19:55–72.
Rost, B. and Sander, C. (1994b). Conservation and prediction of solvent accessibility in protein families. Proteins Struct. Funct. Genet. 20(3):216–226.
Rost, B., Yachdav, G., and Liu, J. (2004). The PredictProtein server. Nucleic Acids Res. 32(Suppl 2):W321–W326.
Rychlewski, L. and Fischer, D. (2005). LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci. 14(1):240–245.
Savojardo, C., Fariselli, P., and Casadio, R. (2013). BETAWARE: a machine-learning tool to detect and predict transmembrane beta-barrel proteins in prokaryotes. Bioinformatics 29:504–505.
Schaarschmidt, J., Monastyrskyy, B., Kryshtafovych, A., and Bonvin, A.M.J.J. (2018). Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age. Proteins 86(Suppl 1):51–66.
Schlessinger, A., Schaefer, C., Vicedo, E. et al. (2011). Protein disorder – a breakthrough invention of evolution? Curr. Opin. Struct. Biol. 21:412–418.
Schrödinger LLC (2015). The PyMOL Molecular Graphics System, Version 1.9.
Shimizu, K., Hirose, S., and Noguchi, T. (2007). POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix. Bioinformatics 23:2337–2338.
Shoemaker, B.A., Zhang, D., Tyagi, M. et al. (2012). IBIS (inferred biomolecular interaction server) reports, predicts and integrates multiple types of conserved interactions for proteins. Nucleic Acids Res. 40:834–840.
Sigrist, C.J.A., De Castro, E., Cerutti, L. et al. (2013). New and continuing developments at PROSITE. Nucleic Acids Res. 41:344–347.
Šikić, M., Tomić, S., and Vlahoviček, K. (2009). Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. PLoS Comput. Biol. 5(1):e1000278.
Sillitoe, I., Cuff, A.L., Dessailly, B.H. et al. (2013). New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res. 41:490–498.
Sillitoe, I., Lewis, T.E., Cuff, A. et al. (2015). CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 43:D376–D381.
Söding, J. (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics 21:951–960.
Stevens, T.J. and Arkin, I.T. (2000). Do more complex organisms have a greater proportion of membrane proteins in their genomes? Proteins 39:417–420.
Suyama, M. and Ohara, O. (2003). DomCut: prediction of inter-domain linker regions in amino acid sequences. Bioinformatics 19:673–674.
Szent-Györgyi, A.G. and Cohen, C. (1957). Role of proline in polypeptide chain configuration of proteins. Science 126:697.
Thorn, K.S. and Bogan, A.A. (2001). ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics 17:284–285.
Thusberg, J., Olatubosun, A., and Vihinen, M. (2011). Performance of mutation pathogenicity prediction methods on missense variants. Hum. Mutat. 32:358–368.
Tien, M.Z., Meyer, A.G., Sydykova, D.K. et al. (2013). Maximum allowed solvent accessibilities of residues in proteins. PLoS One 8(11):e80635.
Tompa, P., Davey, N.E., Gibson, T.J., and Babu, M.M. (2014). A million peptide motifs for the molecular biologist. Mol. Cell 55(2):161–169.
Touw, W.G., Baakman, C., Black, J. et al. (2015). A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43(D1):D364–D368.
Tsirigos, K.D., Elofsson, A., and Bagos, P.G. (2016). PRED-TMBB2: improved topology prediction and detection of beta-barrel outer membrane proteins. Bioinformatics 32(17):i665–i671.
Tuncbag, N., Kar, G., Keskin, O. et al. (2009). A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Briefings Bioinf. 10:217–232.
UniProt Consortium (2016). UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45:1–12.
Vicedo, E., Schlessinger, A., and Rost, B. (2015). Environmental pressure may change the composition protein disorder in prokaryotes. PLoS One 10:1–21.
Viklund, H., Granseth, E., and Elofsson, A. (2006). Structural classification and prediction of reentrant regions in alpha-helical transmembrane proteins: application to complete genomes. J. Mol. Biol. 361:591–603.
Von Heijne, G. (1992). Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J. Mol. Biol. 225:487–494.
Von Heijne, G. and Gavel, Y. (1988). Topogenic signals in integral membrane proteins. Eur. J. Biochem. 174:671–678.
Walia, R.R., Xue, L.C., Wilkins, K. et al. (2014). RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins. PLoS One 9(5):e97725.
Wang, B., Chen, P., Huang, D.S. et al. (2006). Predicting protein interaction sites from residue spatial sequence profile and evolution rate. FEBS Lett. 580:380–384.
Wang, S., Li, W., Liu, S., and Xu, J. (2016a). RaptorX-property: a web server for protein structure property prediction. Nucleic Acids Res. 44(W1):W430–W435.
Wang, S., Peng, J., Ma, J., and Xu, J. (2016b). Protein secondary structure prediction using deep convolutional neural fields. Sci. Rep. 6:18962.
Wang, S., Sun, S., Li, Z. et al. (2017). Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13(1):e1005324.
Wright, P.E. and Dyson, H.J. (2014). Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 16:18–29.
Wu, C.H., Nikolskaya, A., Huang, H. et al. (2004). PIRSF: family classification system at the protein information resource. Nucleic Acids Res. 32:D112–D114.
Xue, L.C., Dobbs, D., and Honavar, V. (2011). HomPPI: a class of sequence homology based protein-protein interface prediction methods. BMC Bioinf. 12:244.
Yachdav, G., Kloppmann, E., Kajan, L. et al. (2014). PredictProtein – an open resource for online prediction of protein structural and functional features. Nucleic Acids Res. 42:W337–W343.
Yan, J., Friedrich, S., and Kurgan, L. (2016). A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues. Briefings Bioinf. 17:88–105.
Yang, Y., Gao, J., Wang, J. et al. (2016a). Sixty-five years of the long march in protein secondary structure prediction: the final stretch. Briefings Bioinf. 19(3):482–494.
Yang, J., Jin, Q.Y., Zhang, B., and Shen, H.B. (2016b). R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter. Bioinformatics 32:2435–2443.
Zhang, H., Zhang, T., Chen, K. et al. (2011). Critical assessment of high-throughput standalone methods for secondary structure prediction. Briefings Bioinf. 12(6):672–688.
Zhao, H., Yang, Y., and Zhou, Y. (2013). Prediction of RNA binding proteins comes of age from low resolution to high resolution. Mol. Biosyst. 9:2417–2425.