Ch12 Protein Structure Prediction and Analysis / Summary + Internet Resources + Further Reading + References
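Summary
Many of the concepts and ideas used in bioinformatics today, such as sequence comparison, structure/sequence visualization, structure prediction, electronic databases, and evolutionary analysis, can be traced back to structural biology and to the structural biologists who developed many of the early bioinformatics tools. Without these contributions, bioinformatics would not be what it is today. In recent years the tables have begun to turn: structural biologists now look to bioinformaticians for help with emerging problems such as pattern discovery, remote structure comparison, and large-scale distributed data management. This give and take between structural biologists and bioinformaticians is vital to sustaining both fields, and the exchange of expertise and insight will no doubt continue for a long time to come. It is hoped that this chapter has shown how at least some of these interactions evolved, and how structural bioinformatics remains an integral part of gaining a deeper understanding of the "engines" of life: proteins and enzymes.
Internet Resources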
BioMagResBank: www.bmrb.wisc.edu
CASP: predictioncenter.org
CATH/Gene3D: www.cathdb.info
CE: source.rcsb.org/jfatcatserver/ceHome.jsp
CPHModels: www.cbs.dtu.dk/services/CPHmodels
Dali: ekhidna2.biocenter.helsinki.fi/dali/
DeepView: spdbv.vital-it.ch
DSSP: www.cmbi.ru.nl/dssp.html
FATCAT: fatcat.sanfordburnham.org
HHpred: toolkit.tuebingen.mpg.de/#/tools/hhpred
iCn3D: www.ncbi.nlm.nih.gov/Structure/icn3d/full.html
I-TASSER: zhanglab.ccmb.med.umich.edu/I-TASSER/
Jmol: jmol.sourceforge.net
JSmol: jmol.sourceforge.net
LOMETS: zhanglab.ccmb.med.umich.edu/LOMETS
LOOPP: cbsu.tc.cornell.edu/software/loopp
MMDB: www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
MODELLER: salilab.org/modeller
ModWeb: modbase.compbio.ucsf.edu/modweb
MolProbity: molprobity.biochem.duke.edu
MUSTER: zhanglab.ccmb.med.umich.edu/MUSTER
NGL Viewer: proteinformatics.charite.de/ngl/html/ngl.html
PANAV: panav.wishartlab.com
PDBe: www.ebi.ac.uk/pdbe
PDBeFOLD: www.ebi.ac.uk/msd-srv/ssm
PDBj: pdbj.org
Phyre2: www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index
Proteopedia: proteopedia.org/wiki/index.php/Main_Page
PROTEUS2: www.proteus2.ca/proteus2
PyMOL: www.pymol.org
RaptorX: raptorx.uchicago.edu
RasMol: www.openrasmol.org
RCSB-PDB: www.rcsb.org/pdb/home/home.do
Robetta: robetta.bakerlab.org
Rosetta@home: boinc.bakerlab.org
RosettaCommons: www.rosettacommons.org
RosettaDesign: rosettadesign.med.unc.edu
ROSIE: rosie.rosettacommons.org
SCOP: scop.mrc-lmb.cam.ac.uk/scop
SCOPe: scop.berkeley.edu
SHIFTX2: www.shiftx2.ca
STING Millennium: sms.cbi.cnptia.embrapa.br/SMS/STINGm
SuperPose: wishart.biology.ualberta.ca/SuperPose
SWISS-MODEL: swissmodel.expasy.org
TargetDB: sbkb.org
TM-align: cssb.biology.gatech.edu/skolnick/webservice/TM-align/index.shtml
TopMatch: topmatch.services.came.sbg.ac.at
TopSearch: topsearch.services.came.sbg.ac.at
VADAR: vadar.wishartlab.com
VAST+: www.ncbi.nlm.nih.gov/Structure/vastplus/vastplus.cgi
WebMol: bioinformatics.mpimp-golm.mpg.de/group-members/mpi-mp-group/dirk-walther/webmol-1
WHAT_CHECK: swift.cmbi.umcn.nl/gv/whatcheck/
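Further Reading
Branden, C. and Tooze, J. (1999). Introduction to Protein Structure, 2e. New York, NY: Garland Science Publishing. An excellent, readable reference with outstanding coverage and beautiful color illustrations. It covers the field well; although it was published nearly 20 years ago, almost every practicing structural biologist owns either the first or the second edition.
Kelley, L.A. and Sternberg, M.J.E. (2009). Protein structure prediction on the web: a case study using the Phyre server. Nat. Protoc. 4: 363–371. A very detailed and practical account of how to use the Phyre structure prediction server and how the server works. The article also provides excellent background material on protein structure prediction and a balanced assessment of the strengths and limitations of structure prediction.
Lesk, A.M. (2000). Introduction to Protein Architecture: The Structural Biology of Proteins. Oxford, UK: Oxford University Press. Another excellent book by Dr. Lesk. It is beautifully illustrated and accessible to readers of all backgrounds, and it offers many interesting problems and web-based exercises.
Rhodes, G. (2006). Crystallography Made Crystal Clear: A Guide for Users of Macromolecular Models, 3e. Cambridge, MA: Academic Press. An excellent introduction to protein X-ray crystallography for non-crystallographers. It explains many complex concepts in a clear, approachable way, and it includes a set of very readable chapters on NMR structure analysis, the use of homology models, and protein structure visualization.
References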
Bai, X.C., McMullan, G., and Scheres, S.H. (2015). How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 40: 49–57.
Bates, P.A., Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. (2001). Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins (Suppl 5): 39–46.
Bernstein, F.C., Koetzle, T.F., Williams, G.J.B. et al. (1977). The Protein Data Bank. J. Mol. Biol. 112: 535–542.
Bonneau, R., Tsai, J., Ruczinski, I. et al. (2001). Rosetta in CASP4: progress in ab initio protein structure prediction. Proteins (Suppl 5): 119–126.
Borrell, B. (2009). Fraud rocks protein community. Nature 462: 970.
Bowie, J.U., Luthy, R., and Eisenberg, D. (1991). A method to identify protein sequences that fold into a known 3-dimensional structure. Science 253: 164–170.
Bryant, S.H. and Lawrence, C.E. (1993). An empirical energy function for threading a protein sequence through a folding motif. Proteins 16 (1): 92–112.
Brylinski, M. and Lingam, D. (2012). eThread: a highly optimized machine learning-based approach to meta-threading and the modeling of protein tertiary structures. PLoS One 7: e50200.
Cavanagh, J., Fairbrother, W.J., Palmer, A.G. III, et al. (2006). Protein NMR Spectroscopy: Principles and Practice, 2e. Cambridge, MA: Academic Press.
Chandonia, J.M., Fox, N.K., and Brenner, S.E. (2017). SCOPe: manual curation and artifact removal in the structural classification of proteins – extended database. J. Mol. Biol. 429: 348–355.
Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13: 222–245.
Corey, R.B. and Pauling, L. (1953). Molecular models of amino acids, peptides, and proteins. Rev. Sci. Instrum. 24: 621–627.
Davis, I.W., Leaver-Fay, A., Chen, V.B. et al. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35 (Web Server issue): W375–W383.
Dietmann, S., Park, J., Notredame, C. et al. (2001). A fully automatic evolutionary classification of protein folds: Dali domain dictionary version 3. Nucleic Acids Res. 29: 55–57.
Doreleijers, J.F., Sousa da Silva, A.W., Krieger, E. et al. (2012). CING: an integrated residue-based structure validation program suite. J. Biomol. NMR 54: 267–283.
Drenth, J. (2006). Principles of Protein X-Ray Crystallography, 3e. New York, NY: Springer.
Gibson, K.D. and Scheraga, H.A. (1967). Minimization of polypeptide energy I. Preliminary structures of bovine pancreatic ribonuclease s-peptide. Proc. Natl. Acad. Sci. U.S.A. 58: 420–427.
Hagen, J.B. (2000). The origins of bioinformatics. Nat. Rev. Genet. 1: 231–236.
Hall, S.R., Allen, F.H., and Brown, I.D. (1991). The crystallographic information file (CIF): a new standard archive file for crystallography. Acta Crystallogr. Sect. A: Found. Crystallogr. 47: 655–685.
Han, B., Liu, Y., Ginzinger, S.W., and Wishart, D.S. (2011). SHIFTX2: significantly improved protein chemical shift prediction. J. Biomol. NMR 50: 43–57.
Hanson, R.M., Prilusky, J., Renjian, Z. et al. (2013). JSmol and the next-generation web-based representation of 3D molecular structure as applied to Proteopedia. Isr. J. Chem. 53: 207–216.
Herráez, A. (2006). Biomolecules in the computer: Jmol to the rescue. Biochem. Mol. Biol. Educ. 34: 255–261.
Higa, R.H., Togawa, R.C., Montagner, A.J. et al. (2004). STING Millennium suite: integrated software for extensive analyses of 3D structures of proteins and their complexes. BMC Bioinf. 5: 107.
Hodis, E., Prilusky, J., Martz, E. et al. (2008). Proteopedia – a scientific “wiki” bridging the rift between three-dimensional structure and function of biomacromolecules. Genome Biol. 9: R121.
Hooft, R.W., Vriend, G., Sander, C., and Abola, E.E. (1996). Errors in protein structures. Nature 381: 272.
Kabsch, W. and Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
Källberg, M., Margaryan, G., Wang, S. et al. (2014). RaptorX server: a resource for template-based protein structure modeling. Methods Mol. Biol. 1137: 17–27.
Kaplan, W. and Littlejohn, T.G. (2001). Swiss-PDB viewer (Deep View). Briefings Bioinf. 2: 195–197.
Kelley, L.A., Mezulis, S., Yates, C.M. et al. (2015). The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10: 845–858.
Kendrew, J.C., Bodo, G., Dintzis, H.M. et al. (1958). A three dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181: 662–666.
Kim, D.E., Chivian, D., and Baker, D. (2004). Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res. 32 (Web Server issue): W526–W531.
Klepeis, J.L., Lindorff-Larsen, K., Dror, R.O., and Shaw, D.E. (2009). Long-timescale molecular dynamics simulations of protein structure and function. Curr. Opin. Struct. Biol. 19: 120–127.
Krissinel, E. and Henrick, K. (2004). Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. Sect. D: Biol. Crystallogr. 60: 2256–2268.
Kuntal, B.K., Aparoy, P., and Reddanna, P. (2010). EasyModeller: a graphical interface to MODELLER. BMC Res. Notes 3: 226.
Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26: 283–291.
Levitt, M. (2007). Growth of novel protein structural data. Proc. Natl. Acad. Sci. U.S.A. 104: 3183–3188.
Levitt, M. and Chothia, C. (1976). Structural patterns in globular proteins. Nature 261: 552–558.
Lindorff-Larsen, K., Piana, S., Dror, R.O., and Shaw, D.E. (2011). How fast-folding proteins fold. Science 334: 517–520.
Liu, Y. and Kuhlman, B. (2006). RosettaDesign server for protein design. Nucleic Acids Res. 34 (Web Server issue): W235–W238.
Lüthy, R., Bowie, J.U., and Eisenberg, D. (1992). Assessment of protein models with three-dimensional profiles. Nature 356: 83–85.
Lyskov, S., Chou, F.C., Conchúir, S.Ó. et al. (2013). Serverification of molecular modeling applications: the Rosetta online server that includes everyone (ROSIE). PLoS One 8: e63906.
Madej, T., Boguski, M.S., and Bryant, S.H. (1995). Threading analysis suggests that the obese gene product may be a helical cytokine. FEBS Lett. 373: 13–18.
Madej, T., Lanczycki, C.J., Zhang, D. et al. (2014). MMDB and VAST+: tracking structural similarities between macromolecular complexes. Nucleic Acids Res. 42 (Database issue): D297–D303.
Maiti, R., Van Domselaar, G.H., Zhang, H., and Wishart, D.S. (2004). SuperPose: a simple server for sophisticated structural superposition. Nucleic Acids Res. 32 (Web Server issue): W590–W594.
Marks, D.S., Colwell, L.J., Sheridan, R. et al. (2011). Protein 3D structure computed from evolutionary sequence variation. PLoS One 6 (12): e28766.
Marti-Renom, M.A., Stuart, A.C., Fiser, A. et al. (2000). Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29: 291–325.
Martz, E. (2002). Protein explorer: easy yet powerful macromolecular visualization. Trends Biochem. Sci. 27: 107–109.
McRee, D.E. (1999). Practical Protein Crystallography, 2e. Cambridge, MA: Academic Press.
Montgomerie, S., Cruz, J.A., Shrivastava, S. et al. (2008). PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation. Nucleic Acids Res. 36 (Web Server issue): W202–W209.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
NCBI Resource Coordinators (2017). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 45 (D1): D12–D17.
Nielsen, M., Lundegaard, C., Lund, O., and Petersen, T.N. (2010). CPHmodels-3.0 – remote homology modeling using structure-guided sequence profiles. Nucleic Acids Res. 38 (Web Server issue): W576–W581.
Pearl, F.M.G., Lee, D., Bray, J.E. et al. (2000). Assigning genomic sequences to CATH. Nucleic Acids Res. 28: 277–282.
Pieper, U., Webb, B.M., Dong, G.Q. et al. (2014). ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 42 (Database issue): D336–D346.
Prlic, A., Bliven, S., Rose, P.W. et al. (2010). Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics 26: 2983–2985.
Ramachandran, G.N., Ramakrishnan, C., and Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7: 95–99.
Read, R.J., Adams, P.D., Arendall, W.B. 3rd, et al. (2011). A new generation of crystallographic validation tools for the Protein Data Bank. Structure 19: 1395–1412.
Richards, F.M. (1977). Areas, volumes, packing and protein structure. Annu. Rev. Biophys. Bioeng. 6: 151–176.
Richardson, J.S. (1981). The anatomy and taxonomy of protein structure. Adv. Protein Chem. 34: 167–339.
Rose, A.S. and Hildebrand, P.W. (2015). NGL viewer: a web application for molecular visualization. Nucleic Acids Res. 43 (Web Server issue): W576–W579.
Sali, A. (1998). 100,000 protein structures for the biologist. Nat. Struct. Biol. 5: 1029–1032.
Sayle, R.A. and Milner-White, E.J. (1995). RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20: 374–376.
Schaeffer, R.D. and Daggett, V. (2011). Protein folds and protein folding. Protein Eng. Des. Sel. 24: 11–19.
Schwede, T., Kopp, J., Guex, N., and Peitsch, M.C. (2003). SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 31: 3381–3385.
Sheffler, W. and Baker, D. (2010). RosettaHoles2: a volumetric packing measure for protein structure refinement and validation. Protein Sci. 19: 1991–1995.
Shindyalov, I.N. and Bourne, P.E. (2001). A database and tools for 3-D protein structure comparison and alignment using the combinatorial extension (CE) algorithm. Nucleic Acids Res. 29: 228–229.
Sippl, M.J. and Wiederstein, M. (2008). A note on difficult structure alignment problems. Bioinformatics 24: 426–427.
Söding, J., Biegert, A., and Lupas, A.N. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33 (Web Server issue): W244–W248.
Vaguine, A.A., Richelle, J., and Wodak, S.J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr. Sect. D: Biol. Crystallogr. 55: 191–205.
Vallat, B.K., Pillardy, J., Májek, P. et al. (2009). Building and assessing atomic models of proteins from structural templates: learning and benchmarks. Proteins 76: 930–945.
Varadi, M., Kosol, S., Lebrun, P. et al. (2014). pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res. 42 (Database issue): D326–D335.
Walther, D. (1997). WebMol – a Java based PDB viewer. Trends Biochem. Sci. 22: 274–275.
Wang, B., Wang, Y., and Wishart, D.S. (2010). A probabilistic approach for validating protein NMR chemical shift assignments. J. Biomol. NMR 47: 85–99.
Westbrook, J.D., Feng, Z., Chen, L. et al. (2003). The Protein Data Bank and structural genomics. Nucleic Acids Res. 31: 489–491.
Westbrook, J.D., Ito, N., Nakamura, H. et al. (2005). PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics 21: 988–992.
Wiederstein, M., Gruber, M., Frank, K. et al. (2014). Structure-based characterization of multiprotein complexes. Structure 22: 1063–1070.
Willard, L., Ranjan, A., Zhang, H. et al. (2003). VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Res. 31: 3316–3319.
Wu, S. and Zhang, Y. (2007). LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res. 35: 3375–3382.
Wu, S. and Zhang, Y. (2008). MUSTER: improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 72: 547–556.
Yang, J. and Zhang, Y. (2015). I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Res. 43 (Web Server issue): W174–W181.
Ye, Y. and Godzik, A. (2004). FATCAT: a web server for flexible structure comparison and structure similarity searching. Nucleic Acids Res. 32 (Web Server issue): W582–W585.
Young, J.Y., Westbrook, J.D., Feng, Z. et al. (2017). OneDep: unified wwPDB system for deposition, biocuration, and validation of macromolecular structures in the PDB archive. Structure 25: 536–545.
Zhang, Y. and Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33 (7): 2302–2309.