PLoS One. 2012;7(11):e50653.
doi: 10.1371/journal.pone.0050653. Epub 2012 Nov 28.

The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes

Marion Ouedraogo et al. PLoS One. 2012.

Abstract

Background: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose.

Methodology: Nine species were included in the DGD. For each species, BLAST analyses were conducted on the peptide sequences of genes mapped to the same chromosome. Groups of duplicated genes were defined from these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available.
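As an illustration of the grouping step, here is a minimal Python sketch that merges genes into groups from pairwise similarity hits restricted to the same chromosome. It is not the DGD implementation: the abstract does not give the BLAST thresholds or the exact grouping rule, so the single-linkage merging used here, the function name `group_duplicated_genes`, and the input layout (a gene-to-chromosome mapping plus a list of gene pairs that already passed some similarity cut-off) are all assumptions made for the example.

```python
from collections import defaultdict


def group_duplicated_genes(genes, hits):
    """Group genes linked by pairwise hits on the same chromosome.

    genes: dict mapping gene ID -> chromosome name (hypothetical layout).
    hits:  iterable of (gene_a, gene_b) pairs assumed to have already
           passed a similarity threshold (threshold not specified here).
    Returns a sorted list of groups, each a sorted list of gene IDs.
    """
    # Union-find: every gene starts as its own group representative.
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the representative, compressing the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in hits:
        # Ignore self-hits and hits between different chromosomes.
        if a == b or genes[a] != genes[b]:
            continue
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for gene in genes:
        groups[find(gene)].add(gene)

    # Keep only groups with at least two genes (singletons are not duplicates).
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Toy data, invented for illustration only.
toy_genes = {"g1": "chr1", "g2": "chr1", "g3": "chr1", "g4": "chr2", "g5": "chr1"}
toy_hits = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
toy_groups = group_duplicated_genes(toy_genes, toy_hits)
```

With this toy input, `g1`, `g2` and `g3` end up in one group, the cross-chromosome hit between `g3` and `g4` is ignored, and `g5` stays ungrouped. Single-linkage merging is the simplest choice; a stricter rule (for example, requiring every pair in a group to match, or limiting the genomic distance between members) would yield smaller groups.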

Conclusions: The Duplicated Genes Database provides a list of co-localised duplicated genes for several species, together with the gene co-expression level and the semantic similarity of functional annotations where available. Adding these data to the groups of duplicated genes provides biological information that can prove useful in gene expression analyses. The Duplicated Genes Database can be freely accessed through the DGD website at http://dgd.genouest.org.
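To make the co-expression value concrete, the following Python sketch computes the Pearson correlation for every pair of genes in one group. The abstract only states that Pearson correlations between expression data were computed per group, so the pairwise layout, the function names `pearson` and `pairwise_correlations`, and the expression dictionary (gene ID mapped to a list of values across the same samples) are assumptions for illustration, not the DGD code.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns None when either sequence has zero variance, because the
    coefficient is undefined in that case.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(expression, group):
    """Correlation for every unordered pair of genes in a group.

    expression: dict mapping gene ID -> list of expression values measured
                on the same samples (hypothetical layout).
    group:      iterable of gene IDs belonging to one duplicated-gene group.
    """
    return {
        (a, b): pearson(expression[a], expression[b])
        for a, b in combinations(sorted(group), 2)
    }


# Toy expression profiles, invented for illustration only.
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
toy_corr = pairwise_correlations(toy_expression, ["g1", "g2", "g3"])
```

In this toy case `g1` and `g2` are perfectly correlated and `g3` is perfectly anti-correlated with both. Deciding which correlations count as significant would need an additional test that is not sketched here.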

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. DGD workflow.
Description of the DGD database development process, from sequence similarity analyses and integration of gene annotation data from NCBI, Ensembl and HGNC websites to the integration and computation of functional data from GEO (Gene Expression Omnibus) and GOA (Gene Ontology Annotation).
Figure 2. Distribution of the number of groups of duplicated genes according to number of duplicated genes.
BTA: Bos taurus; CAF: Canis familiaris; DER: Danio rerio; ECA: Equus caballus; GGA: Gallus gallus; HSA: Homo sapiens; MMU: Mus musculus; RNO: Rattus norvegicus and SSC: Sus scrofa.
Figure 3. Proportion of significant correlations.
Boxplots of significant correlations of expression for duplicated genes (blue), non-duplicated genes (orange) and randomly-selected genes (yellow). (A) Correlations for all groups of genes. Means with a different letter are significantly different according to Student’s R t-tests at p<0.05 (n = 3320, 2760 and 13605, respectively). (B) Correlations according to the number of genes within groups. For every group size, the means of each type of group are significantly different (p<0.05).
Figure 4. Distribution of semantic similarities.
(A) Distribution of GO biological process semantic similarities in duplicated gene groups (blue) vs. randomly-selected gene groups (yellow). Means with a different letter are significantly different according to Student’s R t-tests at p<0.05. (B) Details of the same distribution with groups pooled by size. The mean of each duplicated group is significantly different from the mean of each randomly-selected genes group (p<0.05). Note: no data were available for the group with 11 genes.

Grants and funding

This work was funded by INRA, Agrocampus Ouest and the Brittany Region. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.