Improved splice site detection in Genie
- PMID: 9278062
- DOI: 10.1089/cmb.1997.4.311
Abstract
We present an improved splice site predictor for the genefinding program Genie. Genie is based on a generalized Hidden Markov Model (GHMM) that describes the grammar of a legal parse of a multi-exon gene in a DNA sequence. In Genie, probabilities are estimated for gene features by using dynamic programming to combine information from multiple content and signal sensors, including sensors that integrate matches to homologous sequences from a database. One of the hardest problems in genefinding is to determine the complete gene structure correctly. The splice site sensors are the key signal sensors that address this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. Using these novel sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. In tests on a standard set of annotated genes, Genie identified 86% of coding nucleotides correctly with a specificity of 85%, versus 80% and 84%, respectively, for the older system. In further splice site experiments, we also looked at correlations between splice site scores and intron and exon lengths, as well as at the effect of distance to the nearest splice site on false positive rates.
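The nucleotide-level figures quoted in the abstract can be illustrated with a small sketch. The abstract does not define its metrics, so this assumes the convention common in gene-finding evaluation: sensitivity is the fraction of truly coding nucleotides that are predicted as coding, and specificity is the fraction of predicted coding nucleotides that are truly coding (what other fields call precision). The function names, exon coordinates, and sequence length below are hypothetical and are not taken from the paper or from Genie.

```python
def mask_from_exons(length, exons):
    """Build a per-nucleotide coding mask from half-open (start, end) exon intervals."""
    mask = [False] * length
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask


def nucleotide_accuracy(true_coding, predicted_coding):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed definitions (gene-finding convention, not stated in the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    if len(true_coding) != len(predicted_coding):
        raise ValueError("masks must have the same length")
    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t and p)       # coding, predicted coding
    fn = sum(1 for t, p in pairs if t and not p)   # coding, missed
    fp = sum(1 for t, p in pairs if p and not t)   # non-coding, predicted coding
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical annotation and prediction on a 100-nucleotide sequence.
truth = mask_from_exons(100, [(10, 30), (50, 80)])
prediction = mask_from_exons(100, [(12, 30), (50, 85)])

sn, sp = nucleotide_accuracy(truth, prediction)
print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

In this toy case the prediction misses two coding nucleotides at the start of the first exon and overshoots the second exon by five, which shows how a shifted splice site lowers sensitivity or specificity depending on the direction of the error.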