The prediction of exons through an analysis of spliceable open reading frames
- PMID: 1321415
- PMCID: PMC312502
- DOI: 10.1093/nar/20.13.3453
Abstract
We have developed a computer program, SORFIND, which predicts internal exons from naive genomic sequence data and runs on any IBM-compatible 80286 (or higher) computer. The algorithm searches a sequence for 'spliceable open reading frames' (SORFs), which are open reading frames bracketed by suitable splice-recognition sequences, and then analyzes each candidate region for codon usage. Potential exons are stratified by the reliability of their prediction into confidence levels 1 to 5. The program is designed to predict internal exons longer than 60 nucleotides. In an analysis of 116 genes of a training set, 384 of 441 such exons (87.1%) are identified: 280 exons (63.5%) are matched exactly by a prediction (at both the 5' and 3' splice junctions and in the correct reading frame), and 104 (23.6%) are matched partially. In a similar analysis of 14 genes in a test set unrelated to the genes used to derive the program's parameters, 70 of 80 internal exons longer than 60 bp are identified (87.5%), 47 completely and 23 partially. SORFs that partially match true internal exons share at least one splice junction with the exon, or share both splice junctions but are interpreted in an incorrect reading frame. Specificity (the percentage of SORFs that correspond to true exons) varies from 91% at confidence level 1 to 16% at confidence level 5, with an overall specificity of 35-40%. The output displays nucleotide position, confidence level, reading-frame phase at the 5' and 3' ends, acceptor and donor sequences, and scoring statistics, and also gives an amino acid translation of the potential exon. SORFIND compares favourably with other programs currently used to predict protein-coding regions.
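The core SORF search can be illustrated with a minimal sketch. It assumes the canonical GT-AG rule stands in for the program's "suitable splice-recognition sequences": a candidate exon begins immediately after an intronic AG and ends immediately before an intronic GT, and it counts as spliceable if at least one of its three reading frames contains no stop codon. The names `find_sorfs`, `is_open` and `Sorf`, and the `max_len` cap, are hypothetical. Codon-usage scoring, the five confidence levels and the reverse strand are not reproduced, so this is not SORFIND's actual method.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_EXON_LEN = 61  # the abstract targets internal exons longer than 60 nt


@dataclass(frozen=True)
class Sorf:
    start: int  # 0-based index of the first exon base (just after the acceptor AG)
    end: int    # 0-based exclusive end of the exon (index of the donor GT)
    frame: int  # offset (0, 1 or 2) from `start` at which the open frame is read


def is_open(seq: str, frame: int) -> bool:
    """Return True if no stop codon occurs when seq is read from offset `frame`."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


def find_sorfs(genomic: str, min_len: int = MIN_EXON_LEN,
               max_len: int = 600) -> list[Sorf]:
    """Hypothetical forward-strand SORF scan using bare GT-AG splice signals.

    `max_len` is an assumed cap that keeps the acceptor x donor search bounded.
    """
    seq = genomic.upper()
    # Candidate exon starts: the base right after an 'AG' (end of the upstream intron).
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # Candidate exon ends: the position of a 'GT' (start of the downstream intron).
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    hits = []
    for a in acceptors:
        for d in donors:  # donors are in ascending order
            length = d - a
            if length < min_len:
                continue  # too short (or donor lies upstream of the acceptor)
            if length > max_len:
                break     # every later donor is even farther away
            exon = seq[a:d]
            for frame in range(3):
                if is_open(exon, frame):
                    hits.append(Sorf(a, d, frame))
    return hits
```

For example, `find_sorfs("CCAG" + "ATG" + "GCT" * 20 + "GTAAGT")` reports a 63-nt candidate starting at index 4 and ending at index 67, open in all three frames. A real predictor would replace the dinucleotide test with scored splice-site models and rank candidates by codon usage, since bare GT/AG pairs produce many spurious SORFs.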
Similar articles
- Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 1994 Dec 11;22(24):5156-63. doi: 10.1093/nar/22.24.5156. PMID: 7816600. Free PMC article.
- The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Proc Int Conf Intell Syst Mol Biol. 1994;2:354-62. PMID: 7584412.
- Pombe: a gene-finding and exon-intron structure prediction system for fission yeast. Yeast. 1998 Jun 15;14(8):701-10. doi: 10.1002/(SICI)1097-0061(19980615)14:8<701::AID-YEA247>3.0.CO;2-#. PMID: 9675815.
- Exonization of transposed elements: A challenge and opportunity for evolution. Biochimie. 2011 Nov;93(11):1928-34. doi: 10.1016/j.biochi.2011.07.014. Epub 2011 Jul 26. PMID: 21787833. Review.
- Using MZEF to find internal coding exons. Curr Protoc Bioinformatics. 2002 Aug;Chapter 4:Unit 4.2. doi: 10.1002/0471250953.bi0402s00. PMID: 18792940. Review.
Cited by
- Computational Identification of Novel Genes: Current and Future Perspectives. Bioinform Biol Insights. 2016 Aug 1;10:121-31. doi: 10.4137/BBI.S39950. eCollection 2016. PMID: 27493475. Free PMC article. Review.
- Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 1994 Dec 11;22(24):5156-63. doi: 10.1093/nar/22.24.5156. PMID: 7816600. Free PMC article.
- Construction and analysis of an hn-cDNA library derived from the p-arm of pig chromosome 12. Mamm Genome. 1996 Sep;7(9):654-6. doi: 10.1007/s003359900200. PMID: 8703117.
- Positional cloning of ZNF217 and NABC1: genes amplified at 20q13.2 and overexpressed in breast carcinoma. Proc Natl Acad Sci U S A. 1998 Jul 21;95(15):8703-8. doi: 10.1073/pnas.95.15.8703. PMID: 9671742. Free PMC article.
- A brief review of computational gene prediction methods. Genomics Proteomics Bioinformatics. 2004 Nov;2(4):216-21. doi: 10.1016/s1672-0229(04)02028-5. PMID: 15901250. Free PMC article. Review.