Computational identification of protein coding potential of conserved sequence tags through cross-species evolutionary analysis

doi:10.1093/nar/gkg483

Nucleic Acids Res. 2003 Aug 1;31(15):4639-45.

Flavio Mignone¹, Giorgio Grillo, Sabino Liuni, Graziano Pesole

PMID: 12888525
PMCID: PMC169873
DOI: 10.1093/nar/gkg483

Flavio Mignone et al. Nucleic Acids Res. 2003.

Affiliation

¹ Dipartimento di Fisiologia e Biochimica Generali, Università di Milano, Via Celoria 26, 20133 Milano, Italy.

Abstract

The identification of conserved sequence tags (CSTs) through comparative genome analysis may reveal important regulatory elements involved in shaping the spatio-temporal expression of genetic information. It is well known that the most significant fraction of CSTs observed in human-mouse comparisons corresponds to protein coding exons, due to their strong evolutionary constraints. As we still do not know the complete gene inventory of the human and mouse genomes, it is of the utmost importance to establish whether detected conserved sequences are genes or not. We propose here a simple algorithm that, based on the observation of the specific evolutionary dynamics of coding sequences, efficiently discriminates between coding and non-coding CSTs. The application of this method may help in the validation of predicted genes, the prediction of alternative splicing patterns in known and unknown genes, and the definition of a dictionary of non-coding regulatory elements.

Figures

**Figure 1**
Distribution of CPSs from the CSTfinder analysis of the RANDOM set.

**Figure 2**
Results obtained from the CSTfinder analysis of the GENOME set showing the percentage of CSTs falling in different gene annotation categories.

**Figure 3**
Detailed results of CSTfinder analysis on five human–mouse homologous gene loci in the GENOME dataset belonging to different EnsEMBL gene classes. (A) Known gene; (B) novel gene; (C) EST gene; (D and E) GenScan predicted gene. Upper boxes represent identified CSTs (green, CPS > 500; red, 30 < CPS ≤ 500; black, CPS ≤ 30) with lower boxes corresponding to known or predicted exons. For each gene the EnsEMBL ID, the chromosome position and the coordinates (NCBI 30 release) are reported. The arrow highlights a CSTfinder predicted coding sequence missed by GenScan but coincident with a TwinScan predicted exon.
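
The colour scheme in the Figure 3 caption amounts to a three-way binning of the CPS. The sketch below illustrates that binning only, reading the middle band as 30 < CPS ≤ 500 and using the two thresholds quoted in the caption (30 and 500). The function name `cps_colour` is hypothetical and not part of CSTfinder, and how the CPS itself is computed is not covered here.

```python
def cps_colour(cps: float) -> str:
    """Return the Figure 3 box colour for a given CPS value.

    Thresholds (30 and 500) are taken from the figure caption;
    the function itself is an illustrative assumption.
    """
    if cps > 500:
        return "green"   # CPS > 500
    if cps > 30:
        return "red"     # 30 < CPS <= 500
    return "black"       # CPS <= 30


# Example: bin a handful of made-up CPS values, including both boundaries.
colours = [cps_colour(s) for s in (12.0, 30.0, 31.0, 500.0, 750.0)]
print(colours)
```

Both boundary values fall in the lower band (30 maps to black, 500 maps to red), which matches the "≤" signs in the caption.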

Cited by

Zhu QH, Wang MB. Molecular Functions of Long Non-Coding RNAs in Plants. Genes (Basel). 2012 Mar 8;3(1):176-90. doi: 10.3390/genes3010176. PMID: 24704849. Free PMC article.
Lin MF, Deoras AN, Rasmussen MD, Kellis M. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Comput Biol. 2008 Apr 18;4(4):e1000067. doi: 10.1371/journal.pcbi.1000067. PMID: 18421375. Free PMC article.
Boccia A, Petrillo M, di Bernardo D, Guffanti A, Mignone F, Confalonieri S, Luzi L, Pesole G, Paolella G, Ballabio A, Banfi S. DG-CST (Disease Gene Conserved Sequence Tags), a database of human-mouse conserved elements associated to disease genes. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D505-10. doi: 10.1093/nar/gki011. PMID: 15608249. Free PMC article.
Creanza TM, Horner DS, D'Addabbo A, Maglietta R, Mignone F, Ancona N, Pesole G. Statistical assessment of discriminative features for protein-coding and non coding cross-species conserved sequence elements. BMC Bioinformatics. 2009 Jun 16;10 Suppl 6(Suppl 6):S2. doi: 10.1186/1471-2105-10-S6-S2. PMID: 19534745. Free PMC article.
Dinger ME, Pang KC, Mercer TR, Mattick JS. Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol. 2008 Nov;4(11):e1000176. doi: 10.1371/journal.pcbi.1000176. Epub 2008 Nov 28. PMID: 19043537. Free PMC article. Review.

Grants and funding

GP0101Y01/TI_/Telethon/Italy
