Chapter 7 Predictive Methods Using Protein Sequences
7.4 Summary, Internet Resources, Further Reading, and References
Summary
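The seminal work of Anfinsen and colleagues in the 1960s made it clear that a protein's sequence determines its structure, and that structure ultimately determines its function. Because protein sequences are relatively easy to obtain experimentally, a large body of research has grown up around the question of how to predict structure and function from sequence.
On the structure prediction side, the field is highly mature, and at least some subproblems are close to being solved within the limits of the available experimental data. By contrast, the more general problem of predicting protein function directly from sequence alone remains unsolved.
The one-dimensional prediction methods discussed in this chapter (for example, prediction of secondary structure, transmembrane regions, solvent accessibility, and disordered regions) remain very important, because they often serve as input to higher-level models for structure and function prediction. Fortunately, although every tool makes errors, researchers can now draw on many complementary methods to annotate a protein sequence with rich structural and functional clues, even without prior experimental knowledge.
These predictions must, however, be interpreted in light of each method's limits. Users need to understand the strengths and weaknesses of each class of tool in order to sift through today's vast amounts of sequence data and to generate experimentally testable biological hypotheses. In protein function prediction in particular, users should go back to the primary evidence wherever possible: it may come from inference by best-in-class tools, from the mapping of high-throughput experiments, or from database annotations curated by experts on the basis of detailed experiments. No single resource currently gives users a perfect account of how reliable each of these lines of evidence is. In practice, therefore, the analysis of a protein of interest should usually still rest on a combined judgment that draws on prediction tools chosen for the specific question and on the most reliable database annotations.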
Internet Resources
Core databases and prediction assessments
Protein structure prediction
Protein function prediction
Further Reading
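- Keskin, O., Tuncbag, N., and Gursoy, A. (2016). Predicting protein-protein interactions from the molecular to the proteome level. Chem. Rev. 116:4884–4909. This review provides a broad, systematic survey of the protein binding problem. It covers both protein-protein and protein-nucleic acid binding, discusses prediction at both the protein level and the residue level, and adds several topics not covered in detail in this chapter, such as docking and other prediction approaches based on protein structure rather than sequence.
- Moult, J., Fidelis, K., Kryshtafovych, A. et al. (2016). Critical assessment of methods of protein structure prediction: progress and new directions in round XI. Proteins 84(Suppl 1):4–14. This is a relatively recent summary assessment of the CASP experiment, outlining independent test results across several core areas of protein structure prediction. Readers interested in function prediction should also follow CAFA (Jiang et al. 2016) and, for the prediction of variant effects, the assessments carried out by CAGI (see Internet Resources above).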
References
The reference entries below are reproduced as in the original English edition of the book:
Adzhubei, I.A., Schmidt, S., Peshkin, L. et al. (2010). A method and server for predicting damaging missense mutations. Nat. Methods. 7:248–249.
Ahmad, S. and Sarai, A. (2005). PSSM-based prediction of DNA binding sites in proteins. BMC Bioinf. 6:33.
Akiva, E., Brown, S., Almonacid, D.E. et al. (2014). The structure-function linkage database. Nucleic Acids Res. 42:521–530.
Allis, C.D. and Jenuwein, T. (2016). The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 17:487–500.
Almagro Armenteros, J.J., Sønderby, C.K., Sønderby, S.K. et al. (2017). DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics. 33:3387–3395.
Aloy, P., Stark, A., Hadley, C., and Russell, R.B. (2003). Predictions without templates: new folds, secondary structure, and contacts in CASP5. Proteins Struct. Funct. Genet. 53 (Suppl 6):436–456.
Altschul, S.F. and Gish, W. (1996). Local alignment statistics. Methods Enzymol. 266:460–480.
Andreeva, A., Howorth, D., Chothia, C. et al. (2014). SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res. 42:310–314.
Anfinsen, C.B. (1973). Principles that govern the folding of protein chains. Science. 181:223–230.
Ashkenazi, S., Snir, R., and Ofran, Y. (2012). Assessing the relationship between conservation of function and conservation of sequence using photosynthetic proteins. Bioinformatics. 28:3203–3210.
Attwood, T.K., Coletta, A., Muirhead, G. et al. (2012). The PRINTS database: a fine-grained protein sequence annotation and analysis resource-its status in 2012. Database. 2012:1–9.
Auton, A., Abecasis, G.R., Altshuler, D.M. et al. (2015). A global reference for human genetic variation. Nature. 526:68–74.
Bairoch, A. and Boeckmann, B. (1994). The SWISS-PROT protein sequence databank: current status. Nucleic Acids Res. 22:3578–3580.
Berman, H.M., Westbrook, J., Feng, Z. et al. (2000). The protein data bank. Nucleic Acids Res. 28:235–242.
Bernhofer, M., Kloppmann, E., Reeb, J., and Rost, B. (2016). TMSEG: novel prediction of transmembrane helices. Proteins 84:1706–1716.
Blum, T., Briesemeister, S., and Kohlbacher, O. (2009). MultiLoc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction. BMC Bioinf. 10:274.
Boutet, E., Lieberherr, D., Tognolli, M. et al. (2016). UniProtKB/Swiss-Prot, the manually annotated section of the UniProt knowledgeBase: how to use the entry view. Methods Mol. Biol. 1374:23–54.
Bru, C., Courcelle, E., Carrère, S. et al. (2005). The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res. 33:212–215.
Bryson, K., Cozzetto, D., and Jones, D.T. (2007). Computer-assisted protein domain boundary prediction using the DomPred server. Curr. Protein Pept. Sci. 8:181–188.
Buchan, D.W.A., Minneci, F., Nugent, T.C.O. et al. (2013). Scalable web services for the PSIPRED protein analysis workbench. Nucleic Acids Res. 41:349–357.
Chen, X.W. and Jeong, J.C. (2009). Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics. 25:585–591.
Chen, P. and Li, J. (2010). Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinf. 11:402.
Chen, C.P., Kernytsky, A., and Rost, B. (2002). Transmembrane helix predictions revisited. Protein Sci. 11:2774–2791.
Cheng, J., Sweredoski, M.J., and Baldi, P. (2006). DOMpro: protein domain prediction using profiles, secondary structure, relative solvent accessibility, and recursive neural networks. Data Min. Knowl. Discovery 13:1–10.
Choi, Y., Sims, G.E., Murphy, S. et al. (2012). Predicting the functional effect of amino acid substitutions and indels. PLoS One 7(10):e46688.
Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry. 13(2):222–245.
Claros, M.G. and Von Heijne, G. (1994). TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci. 10:685–686.
Coleman, J.L.J., Ngo, T., and Smith, N.J. (2017). The G protein-coupled receptor N-terminus and receptor signalling: N-tering a new era. Cell. Signalling. 33:1–9.
Cozzetto, D., Buchan, D.W.A., Bryson, K., and Jones, D.T. (2013). Protein function prediction by massive integration of evolutionary analyses and multiple data sources. BMC Bioinf. 14:S1.
Cozzetto, D., Minneci, F., Currant, H., and Jones, D.T. (2016). FFPred3: feature-based function prediction for all gene ontology domains. Sci. Rep. 6:31865.
Crick, F.H. (1958). On protein synthesis. Symp. Soc. Exp. Biol. 12:138–163.
Cukuroglu, E., Gursoy, A., Nussinov, R., and Keskin, O. (2014). Non-redundant unique interface structures as templates for modeling protein interactions. PLoS One. 9:e86738.
Das, S., Lee, D., Sillitoe, I. et al. (2015). Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics. 31:3460–3467.
Deng, X., Gumm, J., Karki, S. et al. (2015). An overview of practical applications of protein disorder prediction and drive for faster, more accurate predictions. Int. J. Mol. Sci. 16:15384–15404.
Dong, Q., Wang, X., Lin, L., and Xu, Z. (2006). Domain boundary prediction based on profile domain linker propensity index. Comput. Biol. Chem. 30:127–133.
Dong, C., Wei, P., Jian, X. et al. (2015). Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum. Mol. Genet. 24:2125–2137.
Eddy, S.R. (2011). Accelerated profile HMM searches. PLoS Comput. Biol. 7:e1002195.
Elbarbary, R.A., Lucas, B.A., and Maquat, L.E. (2016). Retrotransposons as regulators of gene expression. Science. 351:aac7247.
Esmaielbeiki, R., Krawczyk, K., Knapp, B. et al. (2016). Progress and challenges in predicting protein interfaces. Briefings Bioinf. 17:117–131.
Eyrich, V., Martí-Renom, M.A., Przybylski, D. et al. (2001). EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics. 17:1242–1243.
Ezkurdia, L., Grana, O., Izarzugaza, J.M.G., and Tress, M.L. (2009). Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8. Proteins Struct. Funct. Bioinf. 77:196–209.
Fagerberg, L., Jonasson, K., and Heijne, G.V. (2010). Prediction of the human membrane proteome. Proteomics. 10:1141–1149.
Fariselli, P., Savojardo, C., Martelli, P.L., and Casadio, R. (2009). Grammatical-restrained hidden conditional random fields for bioinformatics applications. Algorithms Mol. Biol. 4:13.
Fidelis, K., Rost, B., and Zemla, A. (1999). A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins. 223:220–223.
Finn, R.D., Bateman, A., Clements, J. et al. (2014a). Pfam: the protein families database. Nucleic Acids Res. 42:222–230.
Finn, R.D., Miller, B.L., Clements, J., and Bateman, A. (2014b). IPfam: a database of protein family and domain interactions found in the protein data Bank. Nucleic Acids Res. 42:364–373.
Finn, R.D., Attwood, T.K., Babbitt, P.C. et al. (2016). InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 45:gkw1107.
Fischer, T.B., Arunachalam, K.V., Bailey, D. et al. (2003). The binding interface database (BID): a compilation of amino acid hotspots in protein interfaces. Bioinformatics. 19:1453–1454.
Foster, L.J., de Hoog, C.L., Zhang, Y. et al. (2006). A mammalian organelle map by protein correlation profiling. Cell. 125(1):187–199.
Frishman, D. and Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins Struct. Funct. Genet. 23(4):566–579.
Fukuchi, S., Amemiya, T., Sakamoto, S. et al. (2014). IDEAL in 2014 illustrates interaction networks composed of intrinsically disordered proteins and their binding partners. Nucleic Acids Res. 42:320–325.
Galperin, M.Y., Makarova, K.S., Wolf, Y.I., and Koonin, E.V. (2015). Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43:D261–D269.
Gardy, J.L. and Brinkman, F.S. (2006). Methods for predicting bacterial protein subcellular localization. Nat. Rev. Microbiol. 4(10):741–751.
Garnier, J., Osguthorpe, D.J., and Robson, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120:97–120.
Garnier, J., Gibrat, J.-F., and Robson, B. (1996). GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol. 266:540–553.
Garrow, A.G., Agnew, A., and Westhead, D.R. (2005). TMB-Hunt: a web server to screen sequence sets for transmembrane beta-barrel proteins. Nucleic Acids Res. 33(Suppl 2):188–192.
Gaudet, P., Michel, P.A., Zahn-Zabal, M. et al. (2017). The neXtProt knowledgebase on human proteins: 2017 update. Nucleic Acids Res. 45(D1):D177–D182.
Gene Ontology Consortium (2000). Gene ontology: tool for the unification of biology. Nat. Genet. 25:25–29.
Gene Ontology Consortium (2015). Gene Ontology Consortium: going forward. Nucleic Acids Res. 43:D1049–D1056.
Goldberg, T., Hamp, T., and Rost, B. (2012). LocTree2 predicts localization for all domains of life. Bioinformatics. 28:i458–i465.
Goldberg, T., Hecht, M., Hamp, T. et al. (2014). LocTree3 prediction of localization. Nucleic Acids Res. 42(Web Server issue):1–6.
Goodwin, S., McPherson, J.D., and McCombie, W.R. (2016). Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17:333–351.
Gouw, M., Michael, S., Samano-Sanchez, H. et al. (2018). The eukaryotic linear motif resource–2018 update. Nucleic Acids Res. 46(D1):D428–D434.
Graessel, A., Hauck, S.M., von Toerne, C. et al. (2015). A combined omics approach to generate the surface atlas of human naive CD4+ T cells during early T-cell receptor activation. Mol. Cell. Proteomics. 14(8):2085–2102.
Greene, L.H., Lewis, T.E., Addou, S. et al. (2007). The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res. 35:291–297.
Grimm, D.G., Azencott, C.A., Aicheler, F. et al. (2015). The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Hum. Mutat. 36:513–523.
Habchi, J., Tompa, P., Longhi, S., and Uversky, V.N. (2014). Introducing protein intrinsic disorder. Chem. Rev. 114:6561–6588.
Haft, D.H., Selengut, J.D., Richter, R.A. et al. (2013). TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 41:387–395.
Hamp, T. and Rost, B. (2012). Alternative protein-protein interfaces are frequent exceptions. PLoS Comput. Biol. 8(8):e1002623.
Hamp, T., Kassner, R., Seemayer, S. et al. (2013). Homology-based inference sets the bar high for protein function prediction. BMC Bioinf. 14(Suppl 3):S7.
Hayat, S., Peters, C., Shu, N. et al. (2016). Inclusion of dyad-repeat pattern improves topology prediction of transmembrane β-barrel proteins. Bioinformatics. 32:1571–1573.
Hecht, M., Bromberg, Y., and Rost, B. (2013). News from the protein mutability landscape. J. Mol. Biol. 425(21):3937–3948.
Hecht, M., Bromberg, Y., and Rost, B. (2015). Better prediction of functional effects for sequence variants. BMC Genomics. 16(Suppl 8):S1.
Heffernan, R., Paliwal, K., Lyons, J. et al. (2015). Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci. Rep. 5:11476.
Heffernan, R., Yang, Y., Paliwal, K., and Zhou, Y. (2017). Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics. 33(18):2842–2849.
von Heijne, G. (2006). Membrane-protein topology. Nat. Rev. Mol. Cell Biol. 7:909–918.
Heinig, M. and Frishman, D. (2004). STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res. 32(Web Server issue):500–502.
Hirose, S., Shimizu, K., Kanai, S. et al. (2007). POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics. 23:2046–2053.
Hirose, S., Shimizu, K., and Noguchi, T. (2010). POODLE-I: disordered region prediction by integrating POODLE series and structural information predictors based on a workflow approach. In Silico Biol. 10:185–191.
Hönigschmid, P. (2012). Improvement of DNA- and RNA-protein binding prediction. Diploma thesis. TUM–Technical University of Munich.
Hopf, T.A., Colwell, L.J., Sheridan, R. et al. (2012). Three-dimensional structures of membrane proteins from genomic sequencing. Cell. 149:1607–1621.
Hopf, T.A., Schärfe, C.P.I., Rodrigues, J.P.G.L.M. et al. (2014). Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife. 3:e03430.
Horton, P., Park, K.J., Obayashi, T. et al. (2007). WoLF PSORT: protein localization predictor. Nucleic Acids Res. 35(Web Server issue):W585–W587.
Hu, Y., Lehrach, H., and Janitz, M. (2009). Comparative analysis of an experimental subcellular protein localization assay and in silico prediction methods. J. Mol. Histol. 40(5–6):343–352.
Hubbard, S.J. and Thornton, J.M. (1993). NACCESS. Department of Biochemistry and Molecular Biology. University College London.
Huerta-Cepas, J., Szklarczyk, D., Forslund, K. et al. (2016). EGGNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44:D286–D293.
Huh, W.K., Falvo, J.V., Gerke, L.C. et al. (2003). Global analysis of protein localization in budding yeast. Nature. 425(6959):686–691.
Hwang, S., Guo, Z., and Kuznetsov, I.B. (2007). DP-bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics. 23:634–636.
Ishida, T. and Kinoshita, K. (2007). PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 35:460–464.
Ishida, T. and Kinoshita, K. (2008). Prediction of disordered regions in proteins based on the meta approach. Bioinformatics. 24(11):1344–1348.
Jacoby, E., Bouhelal, R., Gerspacher, M., and Seuwen, K. (2006). The 7TM G-protein-coupled receptor target family. ChemMedChem. 1:761–782.
Jensen, L.J. and Bateman, A. (2011). The rise and fall of supervised machine learning techniques. Bioinformatics. 27:3331–3332.
Jia, Y. and Liu, X.-Y. (2006). From surface self-assembly to crystallization: prediction of protein crystallization conditions. J. Phys. Chem. B. 110:6949–6955.
Jiang, Y., Oron, T.R., Clark, W.T. et al. (2016). An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 17(1):184.
Jones, D.T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292:195–202.
Jones, D.T. and Cozzetto, D. (2015). DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics. 31:857–863.
Jones, P., Binns, D., Chang, H.Y. et al. (2014). InterProScan5: genome-scale protein function classification. Bioinformatics. 30:1236–1240.
Joo, K., Lee, S.J., and Lee, J. (2012). SANN: solvent accessibility prediction of proteins by nearest neighbor method. Proteins 80(7):1791–1797.
Kabsch, W. and Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 22:2577–2637.
Kajan, L., Yachdav, G., Vicedo, E. et al. (2013). Cloud prediction of protein structure and function with PredictProtein for Debian. Biomed. Res. Int. 2013:398968.
Käll, L., Krogh, A., and Sonnhammer, E.L.L. (2004). A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338:1027–1036.
Käll, L., Krogh, A., and Sonnhammer, E.L.L. (2005). An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics. 21:i251.
Keskin, O., Tuncbag, N., and Gursoy, A. (2016). Predicting protein-protein interactions from the molecular to the proteome level. Chem. Rev. 116:4884–4909.
Kessel, A. and Ben-Tal, N. (2011). Introduction to Proteins, 438–440. London, UK: CRC Press.
Kihara, D. (2005). The effect of long-range interaction on the secondary structure formation of proteins. Protein Sci. 14:1955–1963.
Kinch, L.N., Li, W., Monastyrskyy, B. et al. (2016). Evaluation of free modeling targets in CASP11 and ROLL. Proteins 84(Suppl 1):51–66.
Kircher, M., Witten, D.M., Jain, P. et al. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46:310–315.
Klimke, W., Agarwala, R., Badretdin, A. et al. (2009). The National Center for Biotechnology Information’s protein clusters database. Nucleic Acids Res. 37:216–223.
Kloppmann, E., Punta, M., and Rost, B. (2012). Structural genomics plucks high-hanging membrane proteins. Curr. Opin. Struct. Biol. 22:326–332.
Köhler, S., Vasilevsky, N.A., Engelstad, M. et al. (2016). The human phenotype ontology in 2017. Nucleic Acids Res. 45:gkw1039.
Krissinel, E. and Henrick, K. (2007). Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372:774–797.
Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305:567–580.
Kumar, M., Gromiha, M.M., and Raghava, G.P.S. (2008). Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins. 71:189–194.
Kumar, P., Henikoff, S., and Ng, P.C. (2009). Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4:1073–1081.
Kyte, J. and Doolittle, R.F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105–132.
Lam, S.D., Dawson, N.L., Das, S. et al. (2016). Gene3D: expanding the utility of domain assignments. Nucleic Acids Res. 44:D404–D409.
de Las Rivas, J. and Fontanillo, C. (2010). Protein-protein interaction essentials: key concepts to building and analyzing interactome networks. PLoS Comput. Biol. 6:1–8.
Lee, B. and Richards, F.M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55(3):379–400.
van der Lee, R., Buljan, M., Lang, B. et al. (2014). Classification of intrinsically disordered regions and proteins. Chem. Rev. 114:6589–6631.
Letunic, I., Doerks, T., and Bork, P. (2015). SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 43:D257–D260.
Liu, J. and Rost, B. (2001). Comparing function and structure between entire proteomes. Protein Sci. 10:1970–1979.
Liu, J. and Rost, B. (2003). Domains, motifs and clusters in the protein universe. Curr. Opin. Chem. Biol. 7:5–11.
Liu, J. and Rost, B. (2004). CHOP proteins into structural domain-like fragments. Proteins Struct. Funct. Genet. 55:678–688.
Lobanov, M.Y. and Galzitskaya, O.V. (2015). How common is disorder? Occurrence of disordered residues in four domains of life. Int. J. Mol. Sci. 16:19490–19507.
Lobley, A. (2010). Human Protein Function Prediction: application of machine learning for integration of heterogeneous data sources. PhD thesis. University College London, London, UK.
Magnan, C.N. and Baldi, P. (2014). SSpro/ACCpro5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics. 30:2592–2597.
Mahlich, Y., Reeb, J., Schelling, M. et al. (2017). Common sequence variants affect molecular function more than rare variants. Sci. Rep. 7:1608.
Marks, D.S., Colwell, L.J., Sheridan, R. et al. (2011). Protein 3D structure computed from evolutionary sequence variation. PLoS One 6:e28766.
Marks, D.S., Hopf, T.A., and Sander, C. (2012). Protein structure prediction from sequence variation. Nat. Biotechnol. 30:1072–1080.
Martinez, D.A. and Nelson, M.A. (2010). The next generation becomes the now generation. PLoS Genet. 6:e1000906.
Mi, H., Poudel, S., Muruganujan, A. et al. (2016). PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res. 44:D336–D342.
Miosge, L.A., Field, M.A., Sontani, Y. et al. (2015). Comparison of predicted and actual consequences of missense mutations. Proc. Natl. Acad. Sci. USA. 112:E5189–E5198.
Mirabello, C. and Pollastri, G. (2013). Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics. 29(16):2056–2058.
Monastyrskyy, B., Kryshtafovych, A., Moult, J. et al. (2014). Assessment of protein disorder region predictions in CASP10. Proteins Struct. Funct. Bioinf. 82:127–137.
Montgomerie, S., Sundararaj, S., Gallin, W.J., and Wishart, D.S. (2006). Improving the accuracy of protein secondary structure prediction using structural alignment. BMC Bioinf. 7:301.
Montgomerie, S., Cruz, J.A., Shrivastava, S. et al. (2008). PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation. Nucleic Acids Res. 36(Web Server issue):202–209.
Mooney, C., Cessieux, A., Shields, D.C., and Pollastri, G. (2013). SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids. 45(2):291–299.
Morrow, J.K. and Zhang, S. (2012). Computational prediction of protein hotspot residues. Curr. Pharm. Des. 18:1255–1265.
Mosca, R., Céol, A., Stein, A. et al. (2014). 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 42:374–379.
Moult, J., Pedersen, J.T., Judson, R., and Fidelis, K. (1995). A large-scale experiment to assess protein structure prediction methods. Proteins Struct. Funct. Genet. 23:ii–iv.
Murakami, Y. and Mizuguchi, K. (2010). Applying the Naive Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics. 26:1841–1848.
Nair, R. and Rost, B. (2002). Inferring sub-cellular localisation through automated lexical analysis. Bioinformatics. 18(Suppl 1):S78–S86.
Necci, M., Piovesan, D., Dosztányi, Z., and Tosatto, S.C.E. (2017). MobiDB-lite: fast and highly specific consensus prediction of intrinsic disorder in proteins. Bioinformatics. 33:btx015.
Ng, P.C. and Henikoff, S. (2003). SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31:3812–3814.
Nugent, T. and Jones, D.T. (2009). Transmembrane protein topology prediction using support vector machines. BMC Bioinf. 10:159.
Nugent, T. and Jones, D.T. (2012). Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis. Proc. Natl. Acad. Sci. USA 109:E1540–E1547.
Oates, M.E., Romero, P., Ishida, T. et al. (2013). D2P2: database of disordered protein predictions. Nucleic Acids Res. 41:508–516.
Oates, M.E., Stahlhacke, J., Vavoulis, D.V. et al. (2015). The SUPERFAMILY 1.75 database in 2014: a doubling of data. Nucleic Acids Res. 43:D227–D233.
O’Donovan, C., Martin, M.J., Gattiker, A. et al. (2002). High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Briefings Bioinf. 3:275–284.
Ofran, Y. and Rost, B. (2003a). Analysing six types of protein-protein interfaces. J. Mol. Biol. 325:377–387.
Ofran, Y. and Rost, B. (2003b). Predicted protein-protein interaction sites from local sequence information. FEBS Lett. 544:236–239.
Ofran, Y. and Rost, B. (2007). ISIS: interaction sites identified from sequence. Bioinformatics. 23(2):e13–e16.
Overington, J., Al-Lazikani, B., and Hopkins, A.L. (2006). How many drug targets are there? Nat. Rev. Drug Discov. 5:993–996.
Pang, C.I., Lin, K., Wouters, M.A. et al. (2008). Identifying foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids Res. 36:578–588.
Pawson, T. and Nash, P. (2003). Assembly of cell regulatory systems through protein interaction domains. Science. 300(5618):445–452.
Pedruzzi, I., Rivoire, C., Auchincloss, A.H. et al. (2013). HAMAP in 2013, new developments in the protein family classification and annotation system. Nucleic Acids Res. 41:584–589.
Pedruzzi, I., Rivoire, C., Auchincloss, A.H. et al. (2015). HAMAP in 2015: updates to the protein family classification and annotation system. Nucleic Acids Res. 43:D1064–D1070.
Piovesan, D., Tabaro, F., Mičetić, I. et al. (2016). DisProt 7.0: a major update of the database of disordered proteins. Nucleic Acids Res. 45:gkw1056.
Pollastri, G., Przybylski, D., Rost, B., and Baldi, P. (2002). Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins Struct. Funct. Bioinf. 47:228–235.
Punta, M. and Rost, B. (2008). Neural networks predict protein structure and function. Methods Mol. Biol. 458:203–230.
Radivojac, P., Clark, W.T., Oron, T.R. et al. (2013). A large-scale evaluation of computational protein function prediction. Nat. Methods. 10:221–227.
Ramilowski, J.A., Goldberg, T., Harshbarger, J. et al. (2015). A draft network of ligand-receptor-mediated multicellular signalling in human. Nat. Commun. 6:7866.
Rao, V.S., Srinivas, K., Sujini, G.N., and Kumar, G.N.S. (2014). Protein-protein interaction detection: methods and analysis. Int. J. Proteomics. 2014:1–12.
Reeb, J., Kloppmann, E., Bernhofer, M., and Rost, B. (2014). Evaluation of transmembrane helix predictions in 2014. Proteins Struct. Funct. Bioinf. 83:473–484.
Reeb, J., Hecht, M., Mahlich, Y. et al. (2016). Predicted molecular effects of sequence variants link to system level of disease. PLoS Comput. Biol. 12(8):e1005047.
Remmert, M., Biegert, A., Hauser, A., and Söding, J. (2012). HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods. 9(2):173–175.
Res, I., Mihalek, I., and Lichtarge, O. (2005). An evolution based classifier for prediction of protein interfaces without using protein structures. Bioinformatics. 21(10):2496–2501.
Rezácová, P., Borek, D., Moy, S.F. et al. (2008). Crystal structure and putative function of small Toprim domain-containing protein from Bacillus stearothermophilus. Proteins. 70:311–319.
Rose, P.W., Prlić, A., Altunkaya, A. et al. (2017). The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 45(D1):D271–D281.
Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol. 266:525–539.
Rost, B. (2001). Protein secondary structure prediction continues to rise. J. Struct. Biol. 134:204–218.
Rost, B. (2002). Enzyme function less conserved than anticipated. J. Mol. Biol. 318:595–608.
Rost, B. and Sander, C. (1993). Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl. Acad. Sci. USA 90:7558–7562.
Rost, B. and Sander, C. (1994a). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins Struct. Funct. Bioinf. 19:55–72.
Rost, B. and Sander, C. (1994b). Conservation and prediction of solvent accessibility in protein families. Proteins Struct. Funct. Genet. 20(3):216–226.
Rost, B., Yachdav, G., and Liu, J. (2004). The PredictProtein server. Nucleic Acids Res. 32(Suppl 2):W321–W326.
Rychlewski, L. and Fischer, D. (2005). LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci. 14(1):240–245.
Savojardo, C., Fariselli, P., and Casadio, R. (2013). BETAWARE: a machine-learning tool to detect and predict transmembrane beta-barrel proteins in prokaryotes. Bioinformatics. 29:504–505.
Schaarschmidt, J., Monastyrskyy, B., Kryshtafovych, A., and Bonvin, A.M.J.J. (2018). Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age. Proteins 86(Suppl 1):51–66.
Schlessinger, A., Schaefer, C., Vicedo, E. et al. (2011). Protein disorder–a breakthrough invention of evolution? Curr. Opin. Struct. Biol. 21:412–418.
Schrodinger LLC. (2015). The PyMOL Molecular Graphics System, Version 1.9.
Shimizu, K., Hirose, S., and Noguchi, T. (2007). POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix. Bioinformatics. 23:2337–2338.
Shoemaker, B.A., Zhang, D., Tyagi, M. et al. (2012). IBIS (inferred biomolecular interaction server) reports, predicts and integrates multiple types of conserved interactions for proteins. Nucleic Acids Res. 40:834–840.
Sigrist, C.J.A., De Castro, E., Cerutti, L. et al. (2013). New and continuing developments at PROSITE. Nucleic Acids Res. 41:344–347.
Šikić, M., Tomić, S., and Vlahoviček, K. (2009). Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. PLoS Comput. Biol. 5(1):e1000278.
Sillitoe, I., Cuff, A.L., Dessailly, B.H. et al. (2013). New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res. 41:490–498.
Sillitoe, I., Lewis, T.E., Cuff, A. et al. (2015). CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 43:D376–D381.
Söding, J. (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics. 21:951–960.
Stevens, T.J. and Arkin, I.T. (2000). Do more complex organisms have a greater proportion of membrane proteins in their genomes? Proteins 39:417–420.
Suyama, M. and Ohara, O. (2003). DomCut: prediction of inter-domain linker regions in amino acid sequences. Bioinformatics. 19:673–674.
Szent-Györgyi, A.G. and Cohen, C. (1957). Role of proline in polypeptide chain configuration of proteins. Science. 126:697.
Thorn, K.S. and Bogan, A.A. (2001). ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics. 17:284–285.
Thusberg, J., Olatubosun, A., and Vihinen, M. (2011). Performance of mutation pathogenicity prediction methods on missense variants. Hum. Mutat. 32:358–368.
Tien, M.Z., Meyer, A.G., Sydykova, D.K. et al. (2013). Maximum allowed solvent accessibilities of residues in proteins. PLoS One. 8(11):e80635.
Tompa, P., Davey, N.E., Gibson, T.J., and Babu, M.M. (2014). A million peptide motifs for the molecular biologist. Mol. Cell. 55(2):161–169.
Touw, W.G., Baakman, C., Black, J. et al. (2015). A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43(D1):D364–D368.
Tsirigos, K.D., Elofsson, A., and Bagos, P.G. (2016). PRED-TMBB2: improved topology prediction and detection of beta-barrel outer membrane proteins. Bioinformatics. 32(17):i665–i671.
Tuncbag, N., Kar, G., Keskin, O. et al. (2009). A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Briefings Bioinf. 10:217–232.
UniProt Consortium (2016). UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45:1–12.
Vicedo, E., Schlessinger, A., and Rost, B. (2015). Environmental pressure may change the composition protein disorder in prokaryotes. PLoS One. 10:1–21.
Viklund, H., Granseth, E., and Elofsson, A. (2006). Structural classification and prediction of reentrant regions in alpha-helical transmembrane proteins: application to complete genomes. J. Mol. Biol. 361:591–603.
Von Heijne, G. (1992). Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J. Mol. Biol. 225:487–494.
Von Heijne, G. and Gavel, Y. (1988). Topogenic signals in integral membrane proteins. Eur. J. Biochem. 174:671–678.
Walia, R.R., Xue, L.C., Wilkins, K. et al. (2014). RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins. PLoS One 9(5):e97725.
Wang, B., Chen, P., Huang, D.S. et al. (2006). Predicting protein interaction sites from residue spatial sequence profile and evolution rate. FEBS Lett. 580:380–384.
Wang, S., Li, W., Liu, S., and Xu, J. (2016a). RaptorX-property: a web server for protein structure property prediction. Nucleic Acids Res. 44(W1):W430–W435.
Wang, S., Peng, J., Ma, J., and Xu, J. (2016b). Protein secondary structure prediction using deep convolutional neural fields. Sci. Rep. 6:18962.
Wang, S., Sun, S., Li, Z. et al. (2017). Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13(1):e1005324.
Wright, P.E. and Dyson, H.J. (2014). Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 16:18–29.
Wu, C.H., Nikolskaya, A., Huang, H. et al. (2004). PIRSF: family classification system at the protein information resource. Nucleic Acids Res. 32:D112–D114.
Xue, L.C., Dobbs, D., and Honavar, V. (2011). HomPPI: a class of sequence homology based protein-protein interface prediction methods. BMC Bioinf. 12:244.
Yachdav, G., Kloppmann, E., Kajan, L. et al. (2014). PredictProtein–an open resource for online prediction of protein structural and functional features. Nucleic Acids Res. 42:W337–W343.
Yan, J., Friedrich, S., and Kurgan, L. (2016). A comprehensive comparative review of sequence based predictors of DNA- and RNA-binding residues. Briefings Bioinf. 17:88–105.
Yang, Y., Gao, J., Wang, J. et al. (2016a). Sixty-five years of the long march in protein secondary structure prediction: the final stretch. Briefings Bioinf. 19(3):482–494.
Yang, J., Jin, Q.Y., Zhang, B., and Shen, H.B. (2016b). R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter. Bioinformatics. 32:2435–2443.
Zhang, H., Zhang, T., Chen, K. et al. (2011). Critical assessment of high-throughput standalone methods for secondary structure prediction. Briefings Bioinf. 12(6):672–688.
Zhao, H., Yang, Y., and Zhou, Y. (2013). Prediction of RNA binding proteins comes of age from low resolution to high resolution. Mol. Biosyst. 9:2417–2425.