Genome annotation has evolved considerably over the past two decades. These changes have been driven in part by significant improvements in computational techniques for gene prediction and in part by a large expansion in the number of sequenced and annotated genomes from an ever-growing range of species. The availability of improved gene prediction tools, along with greatly expanded databases of well-annotated genes, proteins, and genomes, has moved genome annotation away from pure gene prediction toward a more integrated approach that combines multiple lines of evidence to locate, identify, and functionally annotate genes. When these predictions are combined with experimental data such as RNA-seq data or protein sequence data (from structural proteomics or expression-based proteomics), it is possible to obtain remarkably accurate and impressively complete annotations. This blending of evidence is the basis for many newly developed semi-automated or automated genome annotation pipelines and for many of the newer genome browsers and editors. However, not all genome annotation efforts yield the same quantity or quality of information. Prokaryotic genome annotation is faster, easier, and much more accurate than eukaryotic genome annotation. Indeed, prokaryotic genome annotation is essentially a “solved problem,” while eukaryotic genome annotation must still be considered a “work in progress.”
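As a rough illustration of what combining multiple lines of evidence can mean in practice, the following Python sketch ranks candidate gene intervals by how much weighted evidence overlaps them. It is a toy example under stated assumptions, not the scoring scheme of any particular pipeline: the `Evidence` class, the source names, the `WEIGHTS` values, and the coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str  # hypothetical labels: "ab_initio", "rna_seq", "protein"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


# Assumed weights: experimental evidence counts more than ab initio prediction.
WEIGHTS = {"ab_initio": 1.0, "protein": 2.0, "rna_seq": 3.0}


def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def support_score(candidate, evidence):
    """Sum evidence weights, each scaled by the fraction of the candidate covered."""
    start, end = candidate
    length = end - start + 1
    score = 0.0
    for ev in evidence:
        covered = overlap(start, end, ev.start, ev.end)
        score += WEIGHTS[ev.source] * covered / length
    return score


# Made-up evidence for two candidate loci on one strand of one sequence.
evidence = [
    Evidence("ab_initio", 100, 400),
    Evidence("rna_seq", 120, 400),
    Evidence("protein", 100, 250),
    Evidence("ab_initio", 900, 1000),
]
candidates = [(100, 400), (900, 1000)]

# Candidates supported by several independent sources rank first.
ranked = sorted(candidates, key=lambda c: support_score(c, evidence), reverse=True)
for cand in ranked:
    print(cand, round(support_score(cand, evidence), 3))
```

In this toy data set the first locus is supported by all three evidence types, whereas the second rests on an ab initio prediction alone, so it ranks lower. Real pipelines additionally reconcile exon boundaries, strand, and reading frame, which this sketch ignores.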
<!-- END LEGACY FRAGMENT: 18_Summary -->
<!-- BEGIN SUPPLEMENTAL BACK MATTER: PDF pages 166-173 -->
Acknowledgments
The author thanks Andy Baxevanis and Roderic Guigó for their helpful comments and the use of material from prior editions of this book.
Internet Resources
Ab Initio Prokaryotic Gene Predictors
EasyGene (server) www.cbs.dtu.dk/services/EasyGene
GeneMark.hmm (server) opal.biology.gatech.edu/GeneMark/gmhmmp.cgi
GeneMarkS (server) opal.biology.gatech.edu/GeneMark/genemarks.cgi
GLIMMER (program) www.cs.jhu.edu/~genomics/Glimmer
Prodigal (program) github.com/hyattpd/Prodigal
Ab Initio Eukaryotic Gene Predictors
GeneID (server) genome.crg.es/geneid.html
GeneMark-ES (program) opal.biology.gatech.edu/GeneMark
GeneZilla (program) www.genezilla.org
GenomeScan (server) hollywood.mit.edu/genomescan.html
GENSCAN (server) hollywood.mit.edu/GENSCAN.html
HMMgene (server) www.cbs.dtu.dk/services/HMMgene
SNAP (program) korflab.ucdavis.edu/software.html
Hybrid/Extrinsic Eukaryotic Genome Finders
AUGUSTUS (server) bioinf.uni-greifswald.de/augustus
AUGUSTUS-PPX (program) bioinf.uni-greifswald.de/augustus
CONTRAST (program) contra.stanford.edu/contrast
GeneID (server) genome.crg.es/software/geneid
GeneWise (server) www.ebi.ac.uk/Tools/psa/genewise
GenomeThreader (program) genomethreader.org
GSNAP (program) research-pub.gene.com/gmap
mGENE (program) www.mgene.org
Mugsy-Annotator (program) mugsy.sourceforge.net
SGP-2 (program) genome.crg.es/software/sgp2
STAR (program) code.google.com/archive/p/rna-star
Transomics (program) linux1.softberry.com/berry.phtml?topic=transomics
tRNA and rRNA Finders
Rfam (server) rfam.xfam.org
RNAmmer (server) www.cbs.dtu.dk/services/RNAmmer
RNAMotif (program) casegroup.rutgers.edu/casegr-sh-2.5.html
tRNAdb (server) trnadb.bioinf.uni-leipzig.de/DataOutput/Welcome
tRNADB-CE (server) trna.ie.niigata-u.ac.jp/cgi-bin/trnadb/index.cgi
tRNAfinder (server) ei4web.yz.yamagata-u.ac.jp/~kinouchi/tRNAfinder
tRNAscan-SE (server) lowelab.ucsc.edu/tRNAscan-SE
Phage-Finding Tools
Phage_Finder (program) phage-finder.sourceforge.net
PHAST (server) phast.wishartlab.com
PHASTER (server) phaster.ca
Repeat Finding/Masking Tools
Dfam (server) www.dfam.org
LTR_FINDER (server) tlife.fudan.edu.cn/tlife/ltr_finder
LTRharvest (program) genometools.org/index.html
MITE-Hunter (program) target.iplantcollaborative.org/mite_hunter.html
Repbase (server) www.girinst.org/repbase
RepeatMasker (program) www.repeatmasker.org
RepeatScout (program) bix.ucsd.edu/repeatscout
RetroPred (program) www.juit.ac.in/attachments/RetroPred/home.html
Prokaryotic Genome Annotation Pipelines
BASys (server) www.basys.ca
Prokka (program) www.vicbioinformatics.com/software.prokka.shtml
RAST (server/program) rast.nmpdr.org
Eukaryotic Genome Annotation Pipelines
BRAKER1 (program) bioinf.uni-greifswald.de/bioinf/braker
EVM (program) evidencemodeler.github.io
JIGSAW (program) www.cbcb.umd.edu/software/jigsaw
MAKER2 (program) www.yandell-lab.org/software/maker.html
PASA (program) github.com/PASApipeline/PASApipeline/wiki
Genome Browsers and/or Editors
Artemis (program) www.sanger.ac.uk/science/tools/artemis
Ensembl (program) uswest.ensembl.org/downloads.html
GenomeView (program) genomeview.org
JBrowse (program) jbrowse.org
UCSC Genome Browser hgdownload.cse.ucsc.edu/downloads.html
WebApollo (program) genomearchitect.github.io
Further Reading
Hoff, K.J. and Stanke, M. (2015). Current methods for automated annotation of protein-coding genes. Curr. Opin. Insect Sci. 7, 8–14. A well-written and up-to-date summary of some of the latest developments in genome annotation with some very practical advice about which annotation tools should be used.
Nielsen, P. and Krogh, A. (2005). Large-scale prokaryotic gene prediction and comparison to genome annotation. Bioinformatics. 21, 4322–4329. A very readable assessment of prokaryotic gene prediction and genome annotation.
Yandell, M. and Ence, D. (2012). A beginner’s guide to eukaryotic genome annotation. Nat. Rev. Genet. 13, 329–342. A nice, easy-to-read introduction to the processes involved in eukaryotic genome annotation along with useful descriptions of the available computational tools and best practices.
Yoon, B. (2009). Hidden Markov models and their applications in biological sequence analysis. Curr. Genomics 10, 402–415. A comprehensive tutorial on HMMs that provides many useful examples and explanations of how different HMMs are constructed and used in gene prediction and gene sequence analysis.
<!-- END SUPPLEMENTAL BACK MATTER: PDF pages 166-173 -->