Chapter 9 Molecular Evolution and Phylogenetic Analysis
9.8 Future Challenges, Internet Resources, and References
Scope: from the Future Challenges heading on PDF page 292 through PDF page 298; combines Future Challenges, Internet Resources, and References.
---
Future Challenges
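Phylogenetic analysis is a powerful tool for answering many kinds of biological questions. A phylogenetic tree, however, is an inferred and dynamic construct: it depends on the method used, the sequence regions included or excluded, the sampling of taxa, the parameter settings, the rooting, and other factors. It may sound paradoxical, but the most important factor in reconstructing phylogenetic relationships is not which inference method is used but the quality of the original data. The importance of data selection and of the sequence alignment step cannot be overstated. Even the most sophisticated phylogenetic inference methods cannot automatically correct biased or erroneous input data. Researchers should therefore always examine both the raw data and the results from as many angles as possible, and confirm that the conclusions make general biological sense.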
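As the cost of DNA sequencing continues to fall while its speed, read length, and accuracy continue to improve, our capacity to manually curate, analyze, store, and share sequence data must grow in step. New tools and integrated platforms for phylogenetic and other bioinformatic analyses keep emerging, because scientists keep developing new uses and applications for sequence information. In the era of "big data," the obstacle facing phylogenetics and bioinformatics is no longer whether data can be generated, but whether there are enough skilled people to perform the analyses and enough infrastructure to support the computation (Muir et al. 2016). Analysts and bioinformaticians who can carry out phylogenetic analyses of genes, genomes, proteins, and other molecular and systems-level information will therefore remain in high demand.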
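In addition, the accuracy, sensitivity, and specificity of tools and algorithms (see Box 5.4) must be assessed systematically and quantitatively in order to establish their respective strengths and limitations. Only then can the research community determine which tools and algorithms are best suited to a particular task, and how results obtained with different methods should be compared and integrated.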
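As a rough illustration of what a quantitative assessment could look like, the sketch below computes accuracy, sensitivity, and specificity from the four counts obtained when a tool's calls are compared with a trusted benchmark. It assumes the standard confusion-matrix definitions, which may be worded differently in Box 5.4. The class name `ConfusionCounts`, the helper functions, and the example counts are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for benchmark comparison counts."""
    tp: int  # true positives: correct positive calls
    fp: int  # false positives: incorrect positive calls
    tn: int  # true negatives: correct negative calls
    fn: int  # false negatives: missed positives


def _ratio(numerator: int, denominator: int) -> float:
    """Divide two counts, refusing an undefined (zero-denominator) metric."""
    if denominator == 0:
        raise ValueError("metric is undefined: denominator is zero")
    return numerator / denominator


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all calls that are correct: (TP + TN) / total."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of real positives that are detected: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of real negatives that are rejected: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
    print(f"accuracy    = {accuracy(counts):.2f}")     # 0.85
    print(f"sensitivity = {sensitivity(counts):.2f}")  # 0.80
    print(f"specificity = {specificity(counts):.2f}")  # 0.90
```

Reporting all three values side by side for each tool on the same benchmark is one simple way to make results from different methods comparable. A single summary number can hide a trade-off between missed positives and false alarms.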
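Looking ahead, integrative bioinformatic and phylogenetic analyses of the vast amount of available data will offer new routes to understanding the world and will help us learn how to adapt to a changing environment. Both the development of phylogenetics and the evolution of life on Earth can be summed up by a widely circulated variation on a Charles Darwin quotation, which is engraved in the stone floor of the California Academy of Sciences headquarters: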
> “It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.”
---
Internet Resources
---
References
The reference entries below are kept in the original English of the source book:
Achtman, M., Wain, J., Weill, F.X. et al., and the S. Enterica MLST Study Group (2012). Multilocus sequence typing as a replacement for serotyping in Salmonella enterica. PLoS Pathog. 8:e1002776.
Allegre, C.J. and Schneider, S.H. (2005). Evolution of Earth [online]. Sci. Am. 293.
Anderson, S., Bankier, A.T., Barrell, B.G. et al. (1981). Sequence and organization of the human mitochondrial genome. Nature 290(5806):457-465.
Archibald, J.A. (2014). Aristotle's Ladder, Darwin's Tree: The Evolution of Visual Metaphors for Biological Order. New York, NY: Columbia University Press.
Argimón, S., Abudahab, K., Goater, R.J.G. et al. (2016). Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Microb. Genom. 2. https://doi.org/10.1099/mgen.0.000093.
Beerenwinkel, N., Greenman, C.D., and Lagergren, J. (2016). Computational cancer biology: an evolutionary perspective. PLoS Comput. Biol. 12(2):e1004717.
Bergsten, J. (2005). A review of long-branch attraction. Cladistics 21:163-193.
Bouvier, A. and Wadhwa, M. (2010). The age of the solar system redefined by the oldest Pb-Pb age of a meteoritic inclusion. Nat. Geosci. 3:637-641.
Brown, W.M., George, M. Jr., and Wilson, A.C. (1979). Rapid evolution of animal mitochondrial DNA. Proc. Natl. Acad. Sci. USA 76:1967-1971.
Chenna, R., Sugawara, H., Koike, T. et al. (2003). Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31:3497-3500.
Croxen, M.A., Macdonald, K.A., Walker, M. et al. (2017). Multi-provincial Salmonellosis outbreak related to newly hatched chicks and poults: a genomics perspective. PLoS Curr. 9:9.
D'Costa, V.M., Griffiths, E., and Wright, G.D. (2007). Expanding the soil antibiotic resistome: exploring environmental diversity. Curr. Opin. Microbiol. 10:481-489.
Darwin, C. (1859). On the Origin of Species. London, UK: John Murray.
Dayhoff, M.O., Schwartz, R.M., and Orcutt, B.C. (1978). A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure (ed. M.O. Dayhoff), 345-362. Washington, DC: National Biomedical Research Foundation.
Dodd, M.S., Papineau, D., Grenne, T. et al. (2017). Evidence for early life in Earth's oldest hydrothermal vent precipitates. Nature 543:60-64.
Doolittle, W.F. (2000). Uprooting the tree of life. Sci. Am. 282:90-95.
Dutta, A. and Chaudhuri, K. (2010). Analysis of tRNA composition and folding in psychrophilic, mesophilic and thermophilic genomes: indications for thermal adaptation. FEMS Microbiol. Lett. 305:100-108.
Efron, B. (1979). Bootstrapping methods: another look at the jackknife. Ann. Stat. 7:1-26.
Feil, E.J., Li, B.C., Aanensen, D.M. et al. (2004). eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data. J. Bacteriol. 186:1518-1530.
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.
Felsenstein, J. (1985). Confidence intervals on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Fitch, W.M. and Margoliash, E. (1967). Construction of phylogenetic trees. Science 155:279-284.
Gadagkar, S.R., Rosenberg, M.S., and Kumar, S. (2005). Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J. Exp. Zool. B Mol. Dev. Evol. 304:64-74.
Galtier, N. and Lobry, J.R. (1997). Relationships between genomic G + C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 44:632-636.
Gardy, J.L., Johnston, J.C., Ho Sui, S.J. et al. (2011). Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. N. Engl. J. Med. 364:730-739.
Gerner-Smidt, P., Hise, K., Kincaid, J. et al., and the Pulsenet Taskforce (2006). PulseNet USA: a five-year update. Foodborne Pathog. Dis. 3:9-19.
Griffiths, A.J.F., Miller, J.H., Suzuki, D.T. et al. (eds.) (2000). How DNA changes affect phenotype. In: An Introduction to Genetic Analysis, 7e. New York, NY: W.H. Freeman.
Gupta, R.S. and Griffiths, E. (2002). Critical issues in bacterial phylogeny. Theor. Popul. Biol. 61:423-434.
Handelsman, J. (2004). Metagenomics: application of genomics to uncultured microorganisms. Microbiol. Mol. Biol. Rev. 68:669-685.
Hasegawa, M., Kishino, H., and Yano, T. (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
Hedden, P. (2003). The genes of the Green Revolution. Trends Genet. 19(1):5-9.
Henikoff, S. and Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89:10915-10919.
Hopkins, A.L. and Groom, C.R. (2002). The druggable genome. Nat. Rev. Drug Discov. 1:727-730.
Huelsenbeck, J.P. (1995). Performance of phylogenetic methods in simulation. Syst. Biol. 44:17-48.
Huelsenbeck, J.P., Larget, B., Miller, R.E., and Ronquist, F. (2002). Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51:673-688.
Hughey, R., Krogh, A., Barrett, C., and Grate, L. (1996). SAM: sequence alignment and modelling software. University of California, Santa Cruz, Baskin Center for Computer Engineering and Information Sciences.
Jukes, T.H. and Cantor, C.R. (1969). Evolution of protein molecules. In: Mammalian Protein Metabolism (ed. H.N. Munro), 21-123. New York, NY: Academic Press.
Kage, U., Kumar, A., Dhokane, D. et al. (2016). Functional molecular markers for crop improvement. Crit. Rev. Biotechnol. 36(5):917-930.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
Kundu, S. and Ghosh, S.K. (2015). Trend of different molecular markers in the last decades for studying human migrations. Gene 556(2):81-90.
Lawson, F.S., Charlebois, R.L., and Dillon, J.A. (1996). Phylogenetic analysis of carbamoyl phosphate synthetase genes: complex evolutionary history includes an internal duplication within a gene which can root the tree of life. Mol. Biol. Evol. 13:970-977.
Linnaeus, C. (1735). Systema Naturae (trans. M.S.J. Engel-Ledeboer and H. Engel. 1964. Nieuwkoop B de Graff, Amsterdam). Leyden, Netherlands: Johann Willem Groot.
Liu, Y., He, Z., Appels, R., and Xia, X. (2012). Functional markers in wheat: current status and future prospects. Theor. Appl. Genet. 125:1-10.
Locey, K.J. and Lennon, J.T. (2016). Scaling laws predict global microbial diversity. Proc. Natl. Acad. Sci. USA 113:5970-5975.
Margos, G., Gatewood, A.G., Aanensen, D.M. et al. (2008). MLST of housekeeping genes captures geographic population structure and suggests a European origin of Borrelia burgdorferi. Proc. Natl. Acad. Sci. USA 105:8730-8735.
Mora, C., Tittensor, D.P., Adl, S. et al. (2011). How many species are there on Earth and in the ocean? PLoS Biol. 9:e1001127.
Moran-Gilad, J., Rokney, A., Danino, D. et al. (2017). Real-time genomic investigation underlying the public health response to a Shiga toxin-producing Escherichia coli O26:H11 outbreak in a nursery. Epidemiol. Infect. 145(14):2998-3006.
Muir, P., Li, S., Lou, S. et al. (2016). The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biol. 17:53.
NCBI Resource Coordinators (2016). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 44(Database issue):D7-D19.
Needleman, S.B. and Wunsch, C.D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443-453.
Parks, D.H., Porter, M., Churcher, S. et al. (2009). GenGIS: a geospatial information system for genomic data. Genome Res. 19:1896-1904.
Parks, D.H., Mankowski, T., Zangooei, S. et al. (2013). GenGIS2: geospatial analysis of traditional and genetic biodiversity, with new gradient algorithms and an extensible plugin framework. PLoS One 8:e69885.
Planck Collaboration (2015). Planck 2015 results. XIII. Cosmological parameters. Astron. Astrophys. 594:A13.
Robinson, E.R., Walker, T.M., and Pallen, M.J. (2013). Genomics and outbreak investigation: from sequence to consequence. Genome Med. 5:36.
Ruggiero, M.A., Gordon, D.P., Orrell, T.M. et al. (2015). A higher level classification of all living organisms. PLoS One 10:e0119248.
Rzhetsky, A. and Nei, M. (1992). Statistical properties of the ordinary least-squares, generalized least-squares, and minimum-evolution methods of phylogenetic inference. J. Mol. Evol. 35(4):367-375.
Sahraeian, S.M., Luo, K.R., and Brenner, S.E. (2015). SIFTER search: a web server for accurate phylogeny-based protein function prediction. Nucleic Acids Res. 43:W141-W147.
Saitou, N. and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
Sakharkar, M.K., Sakharkar, K.R., and Pervaiz, S. (2007). Druggability of human disease genes. Int. J. Biochem. Cell Biol. 39(6):1156-1164.
Salipante, S.J. and Hall, B.G. (2011). Inadequacies of minimum spanning trees in molecular epidemiology. J. Clin. Microbiol. 49:3568-3575.
Schmedes, S.E., Sajantila, A., and Budowle, B. (2016). Expansion of microbial forensics. J. Clin. Microbiol. 54:1964-1974.
Schmitt, M. (2003). Willi Hennig and the rise of cladistics. In: The New Panorama of Animal Evolution (eds. A. Legakis, S. Sfenthourakis, R. Polymeni and M. Thessalou-Legaki), 369-379. Moscow, Russia: Pensoft Publishers.
Schwartz, R. and Schäffer, A.A. (2017). The evolution of tumour phylogenetics: principles and practice. Nat. Rev. Genet. 18(4):213-229.
Searls, D.B. (2003). Pharmacophylogenomics: genes, evolution and drug targets. Nat. Rev. Drug Discov. 2:613-623.
Singapore Zika Study Group (2017). Outbreak of Zika virus infection in Singapore: an epidemiological, entomological, virological, and clinical analysis. Lancet Infect. Dis. 17:813-821.
Sokal, R. and Michener, C. (1958). A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 38:1409-1438.
Strimmer, K. and von Haeseler, A. (1996). Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.
Sturk-Andreaggi, K., Peck, M.A., Boysen, C. et al. (2017). AQME: a forensic mitochondrial DNA analysis tool for next-generation sequencing data. Forensic Sci. Int. Genet. 31:189-197.
Swofford, D.L., Olsen, G.J., Waddell, P.J., and Hillis, D.M. (1996). Phylogenetic inference. In: Molecular Systematics (eds. D.M. Hillis, C. Moritz and B.K. Mable), 407-514. Sunderland, MA: Sinauer Associates.
Tamura, K. (1992). Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Mol. Biol. Evol. 9:678-687.
Tamura, K. and Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512-526.
Tang, P., Croxen, M.A., Hasan, M.R. et al. (2017). Infection control in the new age of genomic epidemiology. Am. J. Infect. Control 45:170-179.
Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures on Mathematics in the Life Sciences 17:57-86.
Theurey, P. and Pizzo, P. (2018). The aging mitochondria. Genes 9(1):22.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994). CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
Wacey, D., Kilburn, M.R., Saunders, M. et al. (2011). Microfossils of sulphur-metabolizing cells in 3.4-billion-year-old rocks of Western Australia. Nat. Geosci. 4:698-702.
Weiss, M.C., Sousa, F.L., Mrnjavac, N. et al. (2016). The physiology and habitat of the last universal common ancestor. Nat. Microbiol. 1:16116.
Whittaker, R.H. (1969). New concepts of kingdoms of organisms. Science 163:150-160.
Wilde, S.A., Valley, J.W., Peck, W.H., and Graham, C.M. (2001). Evidence from detrital zircons for the existence of continental crust and oceans on the Earth 4.4 Gyr ago. Nature 409:175-178.
Woese, C.R. and Fox, G.E. (1977). Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl. Acad. Sci. USA 74:5088-5090.
Woese, C.R., Kandler, O., and Wheelis, M.L. (1990). Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. USA 87:4576-4579.
Yang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314.
Yang, R. and Keim, P. (2012). Microbial forensics: a powerful tool for pursuing bioterrorism perpetrators and the need for an international database. J. Bioterr. Biodef. S3:007.
Yang, Z. and Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14:717-724.
Yang, I.S., Lee, H.Y., Yang, W.I., and Shin, K.J. (2013). mtDNA profiler: a web application for the nomenclature and comparison of human mitochondrial DNA sequences. J. Forensic Sci. 58(4):972-980.
Yoshida, C.E., Kruczkiewicz, P., Laing, C.R. et al. (2016). The Salmonella In Silico Typing Resource (SISTR): an open web-accessible tool for rapidly typing and subtyping draft Salmonella genome assemblies. PLoS One 11(1):e0147101.
Zuckerkandl, E. and Pauling, L. (1965). Molecules as documents of evolutionary history. J. Theor. Biol. 8:357-366.