Automatic annotation of eukaryotic genes, pseudogenes and promoters

doi:10.1186/gb-2006-7-s1-s10

Genome Biol. 2006;7 Suppl 1(Suppl 1):S10.1-12.

doi: 10.1186/gb-2006-7-s1-s10. Epub 2006 Aug 7.

Victor Solovyev¹, Peter Kosarev, Igor Seledsov, Denis Vorobyev

PMID: 16925832
PMCID: PMC1810547
DOI: 10.1186/gb-2006-7-s1-s10

Victor Solovyev et al. Genome Biol. 2006.

Affiliation

¹ Department of Computer Science, Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK. victor@cs.rhul.ac.uk

Abstract

Background: The ENCODE gene prediction workshop (EGASP) has been organized to evaluate how well state-of-the-art automatic gene finding methods are able to reproduce the manual and experimental gene annotation of the human genome. We have used Softberry gene finding software to predict genes, pseudogenes and promoters in 44 selected ENCODE sequences representing approximately 1% (30 Mb) of the human genome. Predictions of gene finding programs were evaluated in terms of their ability to reproduce the ENCODE-HAVANA annotation.

Results: The Fgenesh++ gene prediction pipeline can identify 91% of coding nucleotides with a specificity of 90%. Our automatic pseudogene finder (PSF program) found 90% of the manually annotated pseudogenes and some new ones. The Fprom promoter prediction program identifies 80% of TATA promoter sequences with one false positive prediction per 2,000 base pairs (bp) and 50% of TATA-less promoters with one false positive prediction per 650 bp. It can be used to identify transcription start sites upstream of the annotated coding parts of genes found by gene prediction software.
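The nucleotide-level figures quoted above can be illustrated with a minimal sketch. It assumes the convention common in gene-finding evaluations, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP), that is, the fraction of predicted coding nucleotides that are annotated as coding. The abstract does not spell out its formulas, so this convention, the function name `nucleotide_accuracy`, and the toy intervals are all assumptions made for illustration.

```python
def nucleotide_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the nucleotide level.

    predicted, annotated: lists of (start, end) coding intervals,
    1-based and inclusive, on one strand of one sequence.
    Assumed convention: Sn = TP / (TP + FN), Sp = TP / (TP + FP).
    """
    # Expand intervals into sets of coding positions (fine for toy data).
    pred = {p for s, e in predicted for p in range(s, e + 1)}
    true = {p for s, e in annotated for p in range(s, e + 1)}
    tp = len(pred & true)                    # coding in both sets
    sn = tp / len(true) if true else 0.0     # share of annotation recovered
    sp = tp / len(pred) if pred else 0.0     # share of prediction confirmed
    return sn, sp


# Hypothetical toy data, not taken from the ENCODE regions.
annotated = [(101, 200), (301, 400)]   # 200 annotated coding nucleotides
predicted = [(111, 200), (301, 420)]   # 210 predicted coding nucleotides
sn, sp = nucleotide_accuracy(predicted, annotated)
print(f"Sn = {sn:.3f}, Sp = {sp:.3f}")
```

Expanding intervals into position sets keeps the sketch short; for megabase-scale sequences an interval-overlap computation would be the more practical choice.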

Conclusion: We review our software and underlying methods for identifying these three important structural and functional genome components and discuss the accuracy of predictions, recent advances and open problems in annotating genomic sequences. We have demonstrated that our methods can be effectively used for initial automatic annotation of the eukaryotic genome.

Figures

**Figure 1**
Example of a processed pseudogene. Alignment versus protein encoded by the parent gene. Identity, 83.7%; coverage of protein sequence, 93.9%; number of internal stop codons, 2; number of frameshifts, 1; K_a/K_s, 0.484.

**Figure 2**
Example of a pseudogene that has not been processed. Alignment versus protein encoded by the parent gene. Identity, 86.4%; coverage of protein sequence, 97.6%; number of internal stop codons, 3; number of frameshifts, 4; K_a/K_s, 0.594.

**Figure 3**
Pseudogene in ENm004 sequence, absent from HAVANA annotation. The alignment has a stop codon close to position 151636.

**Figure 4**
A distribution of predicted TSS relative to the start of mRNA sequences. Figures on the x-axis are centers of 100 bp intervals, for example, mark 50 corresponds to [+1,+100] interval.

**Figure 5**
A distribution of predicted TSS near the start of mRNA sequences. Figures on the x-axis are centers of 10 bp intervals, for example, mark 5 corresponds to [+1,+10] interval.
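The binning described in the captions of Figures 4 and 5 can be sketched as follows. The sketch assumes that offsets are counted without a position 0 (the first transcribed base is +1, the base before it is -1) and that upstream bins mirror the downstream ones, so that [-100,-1] is labeled -50. Only the downstream examples (mark 50 for [+1,+100], mark 5 for [+1,+10]) come from the captions; the names `bin_center` and `tss_histogram` and the sample offsets are hypothetical.

```python
from collections import Counter


def bin_center(offset, width):
    """Map a TSS offset to the x-axis mark of its bin.

    Downstream offsets 1..width share the mark width // 2,
    e.g. [+1,+100] -> 50 and [+1,+10] -> 5.
    Upstream bins are mirrored (an assumption, not stated in the captions).
    """
    if offset == 0:
        raise ValueError("offset 0 does not exist in this convention")
    index = (abs(offset) - 1) // width       # 0 for the first bin on either side
    center = index * width + width // 2
    return center if offset > 0 else -center


def tss_histogram(offsets, width):
    """Count predicted TSS positions per bin mark."""
    return Counter(bin_center(o, width) for o in offsets)


# Made-up offsets of predicted TSS relative to the mRNA start.
offsets = [3, 7, 12, 95, 101, -4]
hist = tss_histogram(offsets, 10)
for mark in sorted(hist):
    print(mark, hist[mark])
```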

Cited by

Transgenic Kalanchoë blossfeldiana, Containing Individual rol Genes and Open Reading Frames Under 35S Promoter, Exhibit Compact Habit, Reduced Plant Growth, and Altered Ethylene Tolerance in Flowers.
Favero BT, Tan Y, Lin Y, Hansen HB, Shadmani N, Xu J, He J, Müller R, Almeida A, Lütken H. Front Plant Sci. 2021 May 7;12:672023. doi: 10.3389/fpls.2021.672023. eCollection 2021. PMID: 34025708.
Genomes of parasitic nematodes (Meloidogyne hapla, Meloidogyne incognita, Ascaris suum and Brugia malayi) have a reduced complement of small RNA interference pathway genes: knockdown can reduce host infectivity of M. incognita.
Iqbal S, Fosu-Nyarko J, Jones MG. Funct Integr Genomics. 2016 Jul;16(4):441-57. doi: 10.1007/s10142-016-0495-y. Epub 2016 Apr 28. PMID: 27126863.
Genomic, Transcriptomic, and Proteomic Analysis Provide Insights Into the Cold Adaptation Mechanism of the Obligate Psychrophilic Fungus Mrakia psychrophila.
Su Y, Jiang X, Wu W, Wang M, Hamid MI, Xiang M, Liu X. G3 (Bethesda). 2016 Nov 8;6(11):3603-3613. doi: 10.1534/g3.116.033308. PMID: 27633791.
Molecular Cloning and Functional Analysis of Gene Clusters for the Biosynthesis of Indole-Diterpenes in Penicillium crustosum and P. janthinellum.
Nicholson MJ, Eaton CJ, Stärkel C, Tapper BA, Cox MP, Scott B. Toxins (Basel). 2015 Jul 23;7(8):2701-22. doi: 10.3390/toxins7082701. PMID: 26213965.
Gene Expression Patterns for Proteins With Lectin Domains in Flax Stem Tissues Are Related to Deposition of Distinct Cell Wall Types.
Petrova N, Nazipova A, Gorshkov O, Mokshina N, Patova O, Gorshkova T. Front Plant Sci. 2021 Apr 26;12:634594. doi: 10.3389/fpls.2021.634594. eCollection 2021. PMID: 33995436.
