Nucleic Acids Res. 2015 Jan;43(Database issue):D213-21.
doi: 10.1093/nar/gku1243. Epub 2014 Nov 26.

The InterPro protein families database: the classification resource after 15 years

Alex Mitchell et al. Nucleic Acids Res. 2015 Jan.

Abstract

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3,993 entries (5,081 signatures) since 2012.

Figures

Figure 1.
InterPro matches for UniProtKB entry Q3JCG5 showing predicted protein family membership, domains and sites.
Figure 2.
Detailed InterPro member database match data for UniProtKB entry Q3JCG5.
Figure 3.
Number of entries provided by InterPro and its member databases per year.
Figure 4.
The InterPro Domain Architecture tool add/remove domains pop-up window. The list of domains can be refined using either the search box (A) or drop down menu (B). Domains can be added or removed from the query using plus or minus buttons (C). The number of copies of a particular domain to add to the query is indicated (D). Selecting the Apply button (E) performs the query.
Figure 5.
The InterPro Domain Architecture tool showing the results of searching with a VIT and 14-3-3 domain. Checking the ‘Order sensitivity’ option (A) means that domain order is taken into account in the results section (B). The domains can be reordered by dragging and dropping their graphical representations (C), or removed from the query by dragging them to the dustbin (D) or clicking on the [x] icon next to their name and accession (E). The InterPro accession string (F) summarizes the domain architecture composition.
Figure 6.
Growth of the manually-annotated Swiss-Prot and automatically annotated TrEMBL sections of UniProtKB over the last decade.
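
The 'Order sensitivity' option described in the Figure 5 caption can be illustrated with a minimal sketch. This is not InterPro's implementation: the function name `matches_architecture`, the use of plain domain labels instead of InterPro accessions, and the choice to treat an ordered match as a subsequence match (other domains may sit between the queried ones) are all assumptions made here for illustration.

```python
from collections import Counter
from typing import Sequence


def matches_architecture(
    protein_domains: Sequence[str],
    query: Sequence[str],
    order_sensitive: bool = False,
) -> bool:
    """Hypothetical domain architecture match (illustrative only).

    Order-insensitive: the protein must carry at least as many copies of
    each queried domain as the query requests.
    Order-sensitive: the queried domains must also occur in the same
    relative order (assumed here to be a subsequence match).
    """
    if not order_sensitive:
        have = Counter(protein_domains)
        need = Counter(query)
        return all(have[domain] >= copies for domain, copies in need.items())

    # Walk the protein's domains left to right; each queried domain must be
    # found after the position where the previous one was found.
    remaining = iter(protein_domains)
    return all(domain in remaining for domain in query)


# Domain labels taken from the Figure 5 caption; the protein is made up.
protein = ["14-3-3", "VIT"]
print(matches_architecture(protein, ["VIT", "14-3-3"]))
print(matches_architecture(protein, ["VIT", "14-3-3"], order_sensitive=True))
```

With order sensitivity off, only the domain composition (including copy number, as in Figure 4) matters; with it on, reordering the query changes which proteins match.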
