Assessment of protein coding measures
- PMID: 1480466
- PMCID: PMC334555
- DOI: 10.1093/nar/20.24.6441
Abstract
A number of methods for recognizing protein coding genes in DNA sequence have been published over the last 13 years, and new, more comprehensive algorithms, drawing on the repertoire of existing techniques, continue to be developed. To optimize continued development, it is valuable to systematically review and evaluate published techniques. At the core of most gene recognition algorithms is one or more coding measures--functions which produce, given any sample window of sequence, a number or vector intended to measure the degree to which a sample sequence resembles a window of 'typical' exonic DNA. In this paper we review and synthesize the underlying coding measures from published algorithms. A standardized benchmark is described, and each of the measures is evaluated according to this benchmark. Our main conclusion is that a very simple and obvious measure--counting oligomers--is more effective than any of the more sophisticated measures. Different measures contain different information. However, there is a great deal of redundancy in the current suite of measures. We show that in future development of gene recognition algorithms, attention can probably be limited to six of the twenty or so measures proposed to date.
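To make the idea of an oligomer-counting coding measure concrete, here is a minimal Python sketch. It is not the paper's benchmark or any published implementation. The function names (`kmer_freqs`, `oligomer_score`), the add-one pseudocount, and the use of a mean log-likelihood ratio against a non-coding background are all assumptions made for illustration, and the sketch ignores reading frame entirely.

```python
import math
from collections import Counter
from itertools import product


def kmer_freqs(seqs, k, pseudocount=1.0):
    """Smoothed relative frequency of every k-mer over ACGT in the training sequences.

    Hypothetical helper: the pseudocount keeps unseen k-mers from having zero frequency.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in all_kmers) + pseudocount * len(all_kmers)
    return {m: (counts[m] + pseudocount) / total for m in all_kmers}


def oligomer_score(window, coding, noncoding, k):
    """Mean log-likelihood ratio of the window's k-mers (assumed scoring form).

    Positive values mean the window's k-mers are more typical of the coding
    training set than of the non-coding one. K-mers containing characters
    outside ACGT (for example N) are skipped.
    """
    window = window.upper()
    terms = []
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        if kmer in coding:
            terms.append(math.log(coding[kmer] / noncoding[kmer]))
    return sum(terms) / len(terms) if terms else 0.0
```

In use, one would build the two frequency tables once from labeled training sequences (for instance with `k=6` for hexamers) and then call `oligomer_score` on each sliding window of a query sequence. A frame-aware variant would keep separate tables per codon position; that refinement is left out here to keep the sketch short.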
Similar articles
- Locating protein coding regions in human DNA using a decision tree algorithm. J Comput Biol. 1995 Fall;2(3):473-85. doi: 10.1089/cmb.1995.2.473. PMID: 8521276
- A new fourier transform approach for protein coding measure based on the format of the Z curve. Bioinformatics. 1998;14(8):685-90. doi: 10.1093/bioinformatics/14.8.685. PMID: 9789094
- A Fourier characteristic of coding sequences: origins and a non-Fourier approximation. J Comput Biol. 2005 Nov;12(9):1153-65. doi: 10.1089/cmb.2005.12.1153. PMID: 16305326
- Prediction of function in DNA sequence analysis. J Comput Biol. 1995 Spring;2(1):87-115. doi: 10.1089/cmb.1995.2.87. PMID: 7497122 Review.
- Engineering Aspects of Olfaction. In: Persaud KC, Marco S, Gutiérrez-Gálvez A, editors. Neuromorphic Olfaction. Boca Raton (FL): CRC Press/Taylor & Francis; 2013. Chapter 1. PMID: 26042329 Free Books & Documents. Review.
Cited by
- LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature. PLoS One. 2016 May 26;11(5):e0154567. doi: 10.1371/journal.pone.0154567. eCollection 2016. PMID: 27228152 Free PMC article.
- Ab initio gene finding in Drosophila genomic DNA. Genome Res. 2000 Apr;10(4):516-22. doi: 10.1101/gr.10.4.516. PMID: 10779491 Free PMC article.
- Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence. Genome Res. 1999 Nov;9(11):1116-27. doi: 10.1101/gr.9.11.1116. PMID: 10568751 Free PMC article.
- Evaluation of gene-finding programs on mammalian sequences. Genome Res. 2001 May;11(5):817-32. doi: 10.1101/gr.147901. PMID: 11337477 Free PMC article.
- LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information and physicochemical property. Brief Bioinform. 2019 Nov 27;20(6):2009-2027. doi: 10.1093/bib/bby065. PMID: 30084867 Free PMC article. Review.