Ch11 Proteomics and Protein Identification by Mass Spectrometry / Summary + Acknowledgments + Internet Resources + Further Reading + References
Summary
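As in other areas of bioinformatics, proteomic analysis relies on complex data-analysis pipelines and algorithms. To obtain the best results, the quality of the submitted data and the parameters chosen must be considered carefully. There is no “one-size-fits-all” solution that works perfectly in every situation, and most software tools are designed for a specific task. The source and quality of the MS data are equally critical, which underscores the need to understand the biological question under study thoroughly before starting any analysis. Depending on the type of MS instrument used, the quality and type of data generated, and the kind of experimental characterization being performed, key database search tool parameters must be set carefully before optimal performance can be achieved (see Table 11.2).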
Table 11.2 Standard search parameters when using sequence database search engines.
| Parameter | SEQUEST | X! Tandem | MaxQuant |
|---|---|---|---|
| Enzyme | Trypsin | Trypsin | Trypsin |
| Number of missed cleavages | 2 | 2 | 2 |
| Peptide mass tolerance | 0.5 Da | 0.4 Da | 4.5 ppm |
| Maximum number of modifications per peptide | 3 | 10 | 5 |
| Fixed modifications | Carbamidomethylation | Carbamidomethylation | Carbamidomethylation |
| Variable modifications | Oxidation, acetylation | Oxidation, acetylation | Oxidation, acetylation |
| Parent mass type | Monoisotopic mass | Monoisotopic mass | Monoisotopic mass |
| Fragment mass type | Monoisotopic mass | Monoisotopic mass | Monoisotopic mass |
| Minimum peptide length | 6 | 6 | 7 |
| Maximum peptide length | 40 | 50 | 25 |
| False discovery rate | 0.01 | 0.01 | 0.01 |
| Precursor mass tolerance | 10 ppm | −2.0 to 4.0 Da | 6 ppm |
| Fragment ion method | CID | CID | CID |
CID, collision-induced dissociation.
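Table 11.2 mixes absolute (Da) and relative (ppm) mass tolerances, which are not directly comparable because a ppm window scales with m/z. The following minimal sketch shows the conversion; the function names and the example m/z of 1000 are illustrative assumptions and are not taken from any of the search engines listed.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Return the absolute half-width (in m/z units) of a ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


def within_tolerance(observed: float, theoretical: float,
                     tol: float, unit: str = "ppm") -> bool:
    """Check whether an observed m/z matches a theoretical m/z within a tolerance.

    `unit` is either "ppm" (relative) or "Da" (absolute); both names are
    conventions chosen for this sketch.
    """
    if unit == "ppm":
        limit = ppm_to_da(theoretical, tol)
    elif unit == "Da":
        limit = tol
    else:
        raise ValueError(f"unknown tolerance unit: {unit!r}")
    return abs(observed - theoretical) <= limit


if __name__ == "__main__":
    # A 4.5 ppm window at m/z 1000 is far narrower than a 0.5 Da window.
    print(ppm_to_da(1000.0, 4.5))
    print(within_tolerance(1000.3, 1000.0, 4.5, "ppm"))
    print(within_tolerance(1000.3, 1000.0, 0.5, "Da"))
```

At m/z 1000, a 4.5 ppm tolerance corresponds to roughly 0.0045 in m/z, about two orders of magnitude tighter than 0.5 Da, so the same spectrum can pass one setting and fail the other.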
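Important factors to consider in every proteomics experiment and analysis include:
- correct calibration of the MS instrument (e.g. using known standards);
- understanding the mass resolution and mass accuracy expected of the instrument;
- specifying the appropriate proteolytic cleavage rules for the protease used to digest the proteins;
- documenting the MS data acquisition (instrument) settings, such as:
  - the ionization and fragmentation methods used, and the ion series identified in each spectrum;
  - the precursor and fragment ion masses, scan ranges, and matching tolerances;
  - the presence of stable isotopes or multiple charge states;
- defining variable or predefined post-translational modifications (e.g. phosphorylation) or chemical modifications (e.g. acetylation);
- the presence of contaminant species, such as trypsin autolysis products, keratin, and other experimental artifacts;
- selecting the reference protein sequence database to search;
- processing and measuring the signal-to-noise ratio of each spectrum.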
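To make the effect of the cleavage rules, the missed-cleavage allowance, and the peptide length limits concrete, the sketch below performs a simplified in silico tryptic digestion. It assumes the common rule of thumb that trypsin cleaves C-terminal to K or R except when the next residue is P; real search engines implement their own, more detailed rules. The function name, the default values (borrowed from the SEQUEST column of Table 11.2), and the example sequence are illustrative assumptions. Splitting on a zero-width pattern requires Python 3.7 or later.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 2,
                   min_len: int = 6, max_len: int = 40) -> list:
    """Return candidate tryptic peptides for a protein sequence.

    Simplified rule (an assumption of this sketch): cleave after K or R
    unless the following residue is P.
    """
    # Zero-width split: position preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional adjacent fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence chosen only to exercise the K/R and RP cases.
    for pep in tryptic_digest("GASPVTKLLNDEQRPMFWYKHHHHHH", missed_cleavages=1):
        print(pep)
```

Raising `missed_cleavages` or widening the length window enlarges the set of candidate peptides, which is one way these parameters change the size of the search space.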
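A thorough understanding of how these parameters affect the search space, and ultimately the quality of the results, is essential.
In general, there are two ways to ensure the quality of the results. The first is to choose the best parameter settings, which can be done by systematically varying the search parameters until satisfactory results are obtained. For example, extending the initial MS scan range from 375–1500 m/z to 400–1800 m/z can improve peptide coverage and signal-to-noise ratio; expanding the search space by including orthologs from closely related but better-annotated species can also yield more informative results. The other strategy for ensuring high-quality search results is to integrate the results of multiple programs, maximizing coverage while minimizing false positives. Because different search engines use different scoring schemes and take different features of the input data into account, one algorithm may detect features that another misses (Kwon et al. 2011).
Overall, two main factors determine the success of the bioinformatic analysis in an LC-MS/MS study: knowing the nature of the data itself, and keeping in mind that protein identification is only the first step in any proteomic analysis workflow. We believe this chapter offers some useful guidance on both points.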
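The idea of integrating results from several search engines can be illustrated with a deliberately simple vote-based sketch. This is not the probabilistic model of MSblender (Kwon et al. 2011); it only counts how many engines report the same peptide-spectrum match. The function name, the `min_engines` parameter, and the example identifiers are hypothetical.

```python
from collections import Counter


def combine_identifications(results: dict, min_engines: int = 2):
    """Combine peptide-spectrum matches (PSMs) reported by several engines.

    `results` maps an engine name to an iterable of (spectrum_id, peptide)
    tuples. Returns (union, consensus): every PSM seen by any engine, and
    the subset reported by at least `min_engines` engines.
    """
    votes = Counter()
    for psms in results.values():
        # Deduplicate within an engine so each engine votes at most once per PSM.
        votes.update(set(psms))
    union = set(votes)
    consensus = {psm for psm, n in votes.items() if n >= min_engines}
    return union, consensus


if __name__ == "__main__":
    # Hypothetical PSM lists for illustration only.
    results = {
        "engine_a": [("scan1", "GASPVTK"), ("scan2", "LLNDEQR")],
        "engine_b": [("scan1", "GASPVTK"), ("scan3", "HHHHHHK")],
    }
    union, consensus = combine_identifications(results)
    print(sorted(union))
    print(sorted(consensus))
```

The union favors coverage, while the consensus favors specificity; a real integration would also need to reconcile the engines' different scores and control the false discovery rate on the combined list.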
Acknowledgments
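The authors thank the members of the Emili Lab (University of Toronto, Toronto, Canada; Boston University, Boston, MA, USA) for their constructive comments and for their help in compiling supporting information. We also thank Carl White and Ruth Isserlin (University of Toronto), and Indranil Paul and Benjamin Blum (Boston University), for sharing their expertise, sage advice, and critical insights, all of which greatly improved this chapter.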
Internet Resources
Further Reading
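Nature Milestones in Mass Spectrometry (www.nature.com/milestones/milemassspec) is a collaborative effort involving five journals from the Nature Publishing Group. Each milestone article is written by a Nature Publishing Group editor, focuses on a key technological development in mass spectrometry, and is built around a single breakthrough. Each article highlights the main papers that made the advance possible, as well as the applications that grew out of it.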
References
Aebersold, R. and Mann, M. (2003). Mass spectrometry-based proteomics. Nature 422 (6928): 198–207.
Bauer, C., Cramer, R., and Schuchhardt, J. (2011). Evaluation of peak-picking algorithms for protein mass spectrometry. Methods Mol. Biol. 696: 341–352.
Butterfield, D.A., Boyd-Kimball, D., and Castegna, A. (2003). Proteomics in Alzheimer’s disease: insights into potential mechanisms of neurodegeneration. J. Neurochem. 86 (6): 1313–1327.
Cox, J., Neuhauser, N., Michalski, A. et al. (2011). Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10 (4): 1794–1805.
Craig, R. and Beavis, R.C. (2004). TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20 (9): 1466–1467.
Craig, R., Cortens, J.P., and Beavis, R.C. (2004). Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 3 (6): 1234–1242.
Deutsch, E.W., Csordas, A., Sun, Z. et al. (2017). The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 45 (D1): D1100–D1106.
Djuric, S.W., Hutchins, C.W., and Talaty, N.N. (2016). Current status and future prospects for enabling chemistry technology in the drug discovery process. F1000Research 5: 2426.
Eng, J., McCormack, A., and Yates, J. (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5: 976–989.
Farrah, T., Deutsch, E.W., Hoopmann, M.R. et al. (2013). The state of the human proteome in 2012 as viewed through PeptideAtlas. J. Proteome Res. 12 (1): 162–171.
Fenn, J.B., Mann, M., Meng, C.K. et al. (1989). Electrospray ionization for mass spectrometry of large biomolecules. Science 246 (4926): 64–71.
Fenyö, D. (1999). The biopolymer markup language. Bioinformatics 15 (4): 339–340.
Fenyö, D., Eriksson, J., and Beavis, R. (2010). Mass spectrometric protein identification using the global proteome machine. In: Computational Biology (ed. D. Fenyö), 189–202. Totowa, NJ: Humana Press.
Filiou, M.D., Martins-de-Souza, D., Guest, P.C. et al. (2012). To label or not to label: applications of quantitative proteomics in neuroscience research. Proteomics 12 (4–5): 736–747.
Gaudet, P., Michel, P.A., Zahn-Zabal, M. et al. (2017). The neXtProt knowledgebase on human proteins: 2017 update. Nucleic Acids Res. 45 (D1): D177–D182.
Gavin, A.C., Bosche, M., Krause, R. et al. (2002). Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415 (6868): 141–147.
Gerber, S.A., Rush, J., Stemman, O. et al. (2003). Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl. Acad. Sci. U.S.A. 100 (12): 6940–6945.
Gramolini, A.O., Kislinger, T., Alikhani-Koopaei, R. et al. (2008). Comparative proteomics profiling of a phospholamban mutant mouse model of dilated cardiomyopathy reveals progressive intracellular stress responses. Mol. Cell. Proteomics 7 (3): 519–533.
Gygi, S.P., Rist, B., Gerber, S.A. et al. (1999). Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17 (10): 994–999.
Halligan, B.D., Geiger, J.F., Vallejos, A.K. et al. (2009). Low cost, scalable proteomics data analysis using Amazon’s cloud computing services and open source search algorithms. J. Proteome Res. 8 (6): 3148–3153.
Henzel, W.J., Billeci, T.M., Stults, J.T. et al. (1993). Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc. Natl. Acad. Sci. U.S.A. 90 (11): 5011–5015.
Hsu, J.-L., Huang, S.-Y., Chow, N.-H., and Chen, S.-H. (2003). Stable-isotope dimethyl labeling for quantitative proteomics. Anal. Chem. 75 (24): 6843–6852.
Huang, K.-Y., Lee, T.-Y., Kao, H.-J. et al. (2019). dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Res. 47 (D1): D298–D308.
Jennings, K.R. (1968). Collision-induced decompositions of aromatic molecular ions. Int. J. Mass Spectrom. Ion Phys. 1 (3): 227–235.
Karas, M. and Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal. Chem. 60 (20): 2299–2301.
Kelleher, N.L., Lin, H.Y., Valaskovic, G.A. et al. (1999). Top down versus bottom up protein characterization by tandem high-resolution mass spectrometry. J. Am. Chem. Soc. 121 (4): 806–812.
Kislinger, T., Rahman, K., Radulovic, D. et al. (2003). PRISM, a generic large scale proteomic investigation strategy for mammals. Mol. Cell. Proteomics 2 (2): 96–106.
Krogan, N.J., Cagney, G., Yu, H. et al. (2006). Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440 (7084): 637–643.
Kwon, T., Choi, H., Vogel, C. et al. (2011). MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines. J. Proteome Res. 10 (7): 2949–2958.
Little, D.P., Speir, J.P., Senko, M.W. et al. (1994). Infrared multiphoton dissociation of large multiply charged ions for biomolecule sequencing. Anal. Chem. 66 (18