A dictionary-based approach for gene annotation
- PMID: 10582576
- DOI: 10.1089/106652799318364
Abstract
This paper describes a fast, fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. These dictionaries provide O(1)-time lookups of tuples (4-tuples for the OWL database and 11-tuples for the dbEST database). The tuples can be used to rapidly find, at every position in an input sequence, the longest match to the database sequences. Such matches provide useful information about common segments between exons, alternative splice sites, and the frequency of long tuples for statistical purposes. The dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
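The tuple lookup and longest-match search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the data structures used, so this is not the authors' implementation: it assumes a plain Python hash map (average O(1) lookup) as the dictionary, exact matching only, and toy sequences. The names `build_tuple_dictionary` and `longest_matches` and the sequence identifiers are hypothetical.

```python
from collections import defaultdict


def build_tuple_dictionary(sequences, k):
    """Map every k-tuple to the (sequence id, offset) pairs where it occurs.

    Assumption: a hash map stands in for the paper's dictionary.
    """
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return index


def longest_matches(query, sequences, index, k):
    """For each query position, return (length, seq_id, offset) of the longest
    exact match starting there, or None if no k-tuple seeds a match."""
    results = []
    for pos in range(len(query)):
        best = None
        seed = query[pos:pos + k]
        # A tail shorter than k characters cannot be looked up.
        hits = index.get(seed, []) if len(seed) == k else []
        for seq_id, offset in hits:
            target = sequences[seq_id]
            length = k
            # Extend the seed to the right while the characters agree.
            while (pos + length < len(query)
                   and offset + length < len(target)
                   and query[pos + length] == target[offset + length]):
                length += 1
            if best is None or length > best[0]:
                best = (length, seq_id, offset)
        results.append(best)
    return results


# Toy data for illustration only; k = 4 is used here on short DNA strings,
# whereas the abstract uses 4-tuples for OWL and 11-tuples for dbEST.
sequences = {"est1": "ACGTACGTGG", "est2": "TTACGTAC"}
index = build_tuple_dictionary(sequences, 4)
matches = longest_matches("ACGTACGA", sequences, index, 4)
```

In this sketch each seed lookup is a single hash probe, while the extension step costs time proportional to the match length; counting `len(index[t])` for a tuple `t` would give the kind of frequency data the abstract mentions.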