Nucleic Acids Res. 2007 Jan;35(Database issue):D291-7.
doi: 10.1093/nar/gkl959. Epub 2006 Nov 29.

The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

Lesley H Greene et al. Nucleic Acids Res. 2007 Jan.

Abstract

We report the latest release (version 3.0) of the CATH protein domain database (http://www.cathdb.info). The number of structural domains classified in CATH has increased by 20%, to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the growing number of diverse structural homologues being determined by structural genomics initiatives, more sensitive methods have been developed for identifying domain boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now driven by an integrated pipeline that links these automated procedures with validation steps. Validation has been made easier by information-rich web pages that summarise comparison scores and provide relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and of several domain characteristics is presented for version 3.0. We also report an update of the CATH Dictionary of Homologous Structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well-populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which projects CATH structural data onto approximately 2 million sequences from completed genomes and UniProt.

Figures

Figure 1
Annual decrease in the percentage of new structures classified in CATH that are observed to possess a novel fold. The raw data for the years 1972–2005 were fitted to a single exponential equation by nonlinear regression using SigmaPlot (SPSS, version 9.0); the fit is shown as a solid black line. The inset shows a close-up of the raw data for new topologies over the years 1980–2005. For comparison, the number of structural domains solved each year, deposited in the PDB and classified in CATH, is shown as a dashed line.
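The caption does not state which single exponential form was fitted in SigmaPlot. As a minimal sketch, assuming the form y = a·exp(−b·x) with x counted in years since 1972 and all y values positive, nonlinear least squares can be done with Gauss–Newton iterations seeded from a log-linear fit. The function name `fit_single_exponential` is hypothetical, and this is not a reconstruction of the authors' analysis.

```python
import math

def fit_single_exponential(xs, ys, iterations=50):
    """Fit y = a * exp(-b * x) by nonlinear least squares (Gauss-Newton).

    Assumed model form; requires every y > 0 so ln(y) can seed the search.
    """
    n = len(xs)
    # Seed: ordinary least squares on ln(y) = ln(a) - b * x.
    ly = [math.log(y) for y in ys]
    mx, my = sum(xs) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (l - my) for x, l in zip(xs, ly))
    slope = sxy / sxx
    a, b = math.exp(my - slope * mx), -slope

    def sse(a, b):
        # Residual sum of squares on the original (non-log) scale.
        return sum((y - a * math.exp(-b * x)) ** 2 for x, y in zip(xs, ys))

    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) delta = J^T r.
        p = q = s = u = v = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-b * x)
            r = y - a * e
            ja, jb = e, -a * x * e  # partial derivatives w.r.t. a and b
            p += ja * ja
            q += ja * jb
            s += jb * jb
            u += ja * r
            v += jb * r
        det = p * s - q * q
        if det == 0:
            break
        da = (u * s - q * v) / det  # Cramer's rule for the 2x2 system
        db = (p * v - q * u) / det
        # Halve the step until the residual sum of squares does not increase.
        step, current = 1.0, sse(a, b)
        while step > 1e-10 and sse(a + step * da, b + step * db) > current:
            step /= 2
        if step <= 1e-10:
            break  # no improving step found: treat as converged
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < 1e-12:
            break
    return a, b
```

Applied to yearly percentages, the call would look like `fit_single_exponential([yr - 1972 for yr in years], percentages)`; shifting the year origin keeps the exponent small and the normal equations well conditioned.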
Figure 2
Annual proportion of protein structures deposited in the PDB that are classified in CATH, rejected or pending classification. The colour scheme distinguishes categories of PDB chains: black, not accepted under the CATH criteria; red, unprocessed chains; dark green, cumulative count of all chains processed in CATH release 2.6; light green, cumulative count of all chains processed in CATH release 3.0.
Figure 3
Flow diagram of the CATH classification pipeline. This schematic illustrates the processes involved in classifying newly determined structures in CATH. The CATH update workflow, from new chain to assigned domain, is split into two main processes: DomChop, in which chains are divided into domains, and HomCheck, in which domains are classified into homologous families. Grey boxes denote production of metadata, red boxes algorithms, blue boxes workflow decisions and yellow boxes manual processes. Abbreviations and terms are defined as follows: NW, Needleman–Wunsch (23) sequence alignment algorithm; HMM, hidden Markov model (11); ChopClose, a program that determines domain boundaries based on sequence identity with domains in CATH (Lewis T.E. et al., unpublished); DomChop, manual validation of domain boundary assignment; HomCheck, manual validation of homology assignment; CATHEDRAL (4), structure comparison program.
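The pipeline above relies on Needleman–Wunsch (NW) global alignment, and ChopClose assigns boundaries from sequence identity with CATH domains. A minimal Python sketch of NW with a percent-identity helper follows. The scoring scheme (match +1, mismatch −1, linear gap −2) and the identity definition (identical aligned residues over the shorter ungapped sequence) are illustrative assumptions, not the parameters used in the CATH pipeline, and `needleman_wunsch` and `percent_identity` are hypothetical names.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (assumed scores).

    Returns (score, aligned_s, aligned_t), with '-' marking gaps.
    """
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for residues s[i-1] and t[j-1].
        return match if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append('-'); i -= 1
        else:
            a.append('-'); b.append(t[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(a)), ''.join(reversed(b))

def percent_identity(aligned_s, aligned_t):
    """Identical aligned residues as a percentage of the shorter ungapped sequence."""
    same = sum(1 for x, y in zip(aligned_s, aligned_t) if x == y and x != '-')
    shorter = min(len(aligned_s.replace('-', '')), len(aligned_t.replace('-', '')))
    return 100.0 * same / shorter if shorter else 0.0
```

For example, aligning `"ACGT"` with `"AGT"` under these scores gives `(1, "ACGT", "A-GT")`. Real protein comparisons would use a substitution matrix and affine gap penalties; the linear scheme here only keeps the dynamic programming easy to follow.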
Figure 4
Relationship between sequence variability, structural variability and functional diversity in CATH superfamilies. Structural variation in a CATH superfamily, measured as the number of structurally diverse subgroups (SSAP score <80 between groups), is plotted against sequence diversity, measured as the number of sequence-diverse subfamilies in the CATH-DHS (<35% sequence identity between groups). The colour of each point reflects the number of functions identified in that superfamily using GO: white (0–25), yellow (26–50), red (51–100), maroon (101–200), black (>200).
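Counting subgroups whose members score below a threshold across groups (SSAP <80, or <35% sequence identity) requires some partitioning rule. One rule that satisfies this criterion, assumed here rather than taken from the paper, is single-linkage clustering: link any pair scoring at or above the threshold and count connected components. The sketch below does this with a union-find structure and also encodes the caption's GO colour bins. It assumes one symmetric score per pair and treats missing pairs as unlinked; `count_subgroups` and `go_colour` are hypothetical names.

```python
def count_subgroups(n, pair_scores, threshold):
    """Count single-linkage groups among items 0..n-1.

    pair_scores maps (i, j) -> similarity (assumed symmetric). Items are
    linked when score >= threshold, so every scored pair that spans two
    different groups falls below the threshold.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in pair_scores.items():
        if score >= threshold:
            parent[find(i)] = find(j)  # merge the two groups
    return len({find(x) for x in range(n)})

def go_colour(function_count):
    """Map a superfamily's GO function count to the Figure 4 colour bins."""
    if function_count <= 25:
        return "white"
    if function_count <= 50:
        return "yellow"
    if function_count <= 100:
        return "red"
    if function_count <= 200:
        return "maroon"
    return "black"
```

With SSAP scores, `count_subgroups(n, ssap_scores, 80)` gives the structural axis. Passing pairwise percent identities with `threshold=35` gives the sequence axis under the same assumptions.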

References

    1. Todd A.E., Marsden R.L., Thornton J.M., Orengo C.A. Progress of structural genomics initiatives: an analysis of solved target structures. J. Mol. Biol. 2005;348:1235–1260.
    2. Chandonia J.M., Brenner S.E. The impact of structural genomics: expectations and outcomes. Science. 2006;311:347–351.
    3. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.
    4. Pearl F.M., Bennett C.F., Bray J.E., Harrison A.P., Martin N., Shepherd A., Sillitoe I., Thornton J., Orengo C.A. The CATH database: an extended protein family resource for structural and functional genomics. Nucleic Acids Res. 2003;31:452–455.
    5. Pearl F., Todd A., Sillitoe I., Dibley M., Redfern O., Lewis T., Bennett C., Marsden R., Grant A., Lee D., et al. The CATH domain structure database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res. 2005;33:247–251.