Comparative Study
BMC Bioinformatics. 2007 Mar 9;8:86.
doi: 10.1186/1471-2105-8-86.

Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

Russell L Marsden et al. BMC Bioinformatics.

Abstract

Background: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), target selection focuses primarily on structurally characterising protein families that so far lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and into the effort that may be required to achieve comprehensive structural coverage across all protein families.

Results: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and NewFam domain families. We consider what proportion of structurally uncharacterised families is accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst also pursuing additional targets from large, structurally characterised protein superfamilies.

Conclusion: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.
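
The coverage measure referred to in the Results can be illustrated with a minimal sketch: rank families by size, largest first, and accumulate the fraction of domain sequences they account for. This is not the authors' pipeline; the function name `cumulative_coverage` and the toy family sizes are hypothetical, chosen only to show the calculation.

```python
from itertools import accumulate


def cumulative_coverage(family_sizes, total_sequences):
    """Cumulative fraction of domain sequences covered, adding families largest-first."""
    if total_sequences <= 0:
        raise ValueError("total_sequences must be positive")
    ranked = sorted(family_sizes, reverse=True)
    return [running / total_sequences for running in accumulate(ranked)]


# Hypothetical toy data: number of domain sequences per family.
structural_families = [300, 500, 120]      # families with a solved structure
uncharacterised_families = [20, 50, 10]    # families lacking a structure

total = sum(structural_families) + sum(uncharacterised_families)

# Coverage from families that already have a structural representative.
current_curve = cumulative_coverage(structural_families, total)

# Additional coverage gained if every uncharacterised family were solved.
additional = sum(uncharacterised_families) / total

print(current_curve)
print(additional)
```

On real data the family sizes would come from the CATH, Pfam-A and NewFam assignments; the two quantities computed here play the role of current and additional coverage.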


Figures

Figure 1
Coarse-grained structural coverage of domain sequences in Swiss-Prot-TrEMBL. Families are ranked in order of size, largest to smallest. The black line represents coverage of domain sequences by 2486 CATH and Pfam-A_struc families, whilst the grey line represents additional coverage that would be achieved by solving a structure for structurally uncharacterised Pfam-A and NewFam domain families.
Figure 2
a) The percentage of newly solved Pfam-A families, per year, classified as a new fold, new superfamily, or old superfamily in the CATH domain classification. Values are calculated as the percentage of first-solved structures that have been classified in the CATH database. b) The underlying data used in this calculation. The calculated numbers of new folds and superfamilies are similar when using either the CATH or SCOP domain classifications.
Figure 3
Coarse-grained structural coverage of seven model genomes. The benefit of solving a structure for 1571 structurally uncharacterised families in E. coli across the remaining six genomes is shown in the inset table.
Figure 4
Size distribution of the largest 2500 structurally uncharacterised Pfam-A and NewFam domain families (clear bars) with the proportion of these largest families lacking a prokaryotic sequence (light grey infill on white bars). The size distribution of the 2500 largest Pfam-A and NewFam families with at least five prokaryotic sequences is also shown (black bars).
Figure 5
Fine-grained structural coverage of domain sequences in Swiss-Prot-TrEMBL. Subfamilies are ranked in order of size, largest to smallest.
Figure 6
a) Comparison of fine-grained structural coverage of subfamilies: subfamilies with a solved structure (black line), non-structural subfamilies within structural families (dark grey line), and non-structural subfamilies within non-structural families (light grey line). Also shown is the comparative coarse-grained coverage of the largest non-structural families (thin grey line). b) Number of unique prokaryotic organisms represented in structurally uncharacterised families (Pfam-A) compared to the number found within structurally uncharacterised subfamilies within CATH and Pfam-A_struc families.
Figure 7
The number of subfamilies in structural families against the number of those subfamilies with a solved structure. Our structural and functional understanding of many of the most diverse domain families would benefit from the structural characterisation of additional family members.
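
The selection criterion mentioned in the Figure 4 caption (the largest families with at least five prokaryotic sequences) can be sketched as a simple filter-then-rank step. This is an illustrative assumption about how such a filter might be written, not the authors' code; the `Family` record, the `select_targets` function and the example families are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    name: str                   # hypothetical family identifier
    size: int                   # total number of domain sequences
    prokaryotic_sequences: int  # how many of those come from prokaryotes


def select_targets(families, min_prokaryotic=5, top_n=2500):
    """Keep families with enough prokaryotic sequences, then take the largest top_n."""
    eligible = [f for f in families if f.prokaryotic_sequences >= min_prokaryotic]
    return sorted(eligible, key=lambda f: f.size, reverse=True)[:top_n]


# Hypothetical toy families, for illustration only.
families = [
    Family("famA", size=900, prokaryotic_sequences=0),
    Family("famB", size=400, prokaryotic_sequences=12),
    Family("famC", size=650, prokaryotic_sequences=5),
    Family("famD", size=100, prokaryotic_sequences=4),
]

targets = select_targets(families, min_prokaryotic=5, top_n=2)
print([f.name for f in targets])
```

The defaults of five prokaryotic sequences and 2500 families mirror the figures quoted in the caption; both are parameters so that other thresholds can be explored.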
