UniRef: comprehensive and non-redundant UniProt reference clusters
- PMID: 17379688
- DOI: 10.1093/bioinformatics/btm098
Abstract
Motivation: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences.
Results: The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records, giving complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at 90% and 50% sequence identity, respectively. UniRef100, UniRef90 and UniRef50 reduce database size by approximately 10, 40 and 70%, respectively, relative to the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, the member count and common taxonomy of the cluster, the accession numbers of all merged entries, and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to research areas ranging from genome annotation to proteomics data analysis.
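The three clustering levels can be illustrated with a toy Python sketch. It is not the UniRef production pipeline and makes several loudly stated assumptions: a "subfragment" is taken to mean an exact substring of a longer sequence; the longest sequence becomes the representative (real UniRef representative selection uses additional criteria); percent identity is approximated by a longest-common-subsequence alignment with free gaps, which overestimates identity compared with a properly scored alignment; and the 90% and 50% levels use greedy, longest-first incremental clustering, one common strategy rather than a confirmed description of UniRef's method. The function names and sequences are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    representative: str                          # accession of the representative sequence
    seq: str                                     # representative sequence (longest member here)
    members: list[str] = field(default_factory=list)


def build_uniref100_like(records: dict[str, str]) -> list[Cluster]:
    """Merge identical sequences and exact subfragments into one cluster.

    Assumption: a 'subfragment' is an exact substring of a longer sequence.
    Longest sequences are processed first, so each fragment sees its parent.
    """
    ordered = sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    clusters: list[Cluster] = []
    for acc, seq in ordered:
        for cluster in clusters:
            if seq in cluster.seq:  # identical sequence or contained subfragment
                cluster.members.append(acc)
                break
        else:  # no containing cluster found: this sequence seeds a new one
            clusters.append(Cluster(acc, seq, [acc]))
    return clusters


def percent_identity(a: str, b: str) -> float:
    """Crude identity: longest common subsequence / length of the shorter sequence.

    Equivalent to a global alignment with match=1 and free gaps, so it
    overestimates identity compared with a properly scored alignment.
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / min(len(a), len(b))


def greedy_cluster(clusters: list[Cluster], threshold: float) -> list[Cluster]:
    """Greedy, longest-first clustering of representatives at an identity threshold."""
    ordered = sorted(clusters, key=lambda c: (-len(c.seq), c.representative))
    seeds: list[Cluster] = []
    for c in ordered:
        for seed in seeds:
            if percent_identity(c.seq, seed.seq) >= threshold:
                seed.members.extend(c.members)  # absorb all members of the joining cluster
                break
        else:  # copy so the input clusters are left unchanged
            seeds.append(Cluster(c.representative, c.seq, list(c.members)))
    return seeds


# Hypothetical toy records (accession -> sequence).
records = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MKTAYIAKQRQISFVKSHFSRQ",  # identical to P1
    "P3": "AYIAKQRQISF",             # subfragment of P1
    "P4": "MKTAYIAKQRQISFVKSHFSRA",  # one substitution relative to P1
    "P5": "GSHMLEDPVAGTWQNNCCEERL",  # unrelated
}

clusters100 = build_uniref100_like(records)
clusters90 = greedy_cluster(clusters100, 0.90)  # both levels cluster the UniRef100-like set
clusters50 = greedy_cluster(clusters100, 0.50)
print(len(records), len(clusters100), len(clusters90), len(clusters50))  # → 5 3 2 2
```

In this toy run the identical P2 and the subfragment P3 collapse into P1's UniRef100-like cluster, and at the 90% level P4 (about 95% identical to P1) joins as well, while the unrelated P5 stays on its own at every level.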
Availability: UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref.
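Downloaded UniRef FASTA files carry the cluster summary fields described in the abstract (cluster identifier, member count, common taxonomy, representative) in each header line. The sketch below assumes the header layout `>UniRefNN_<ID> <name> n=<count> Tax=<taxon> TaxID=<id> RepID=<id>` used in current UniProt releases; this layout is an assumption that may not hold for every release, so check it against the files you actually download. The function names and the sample headers are hypothetical.

```python
import io
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed header layout: >UniRefNN_ID name n=COUNT Tax=TAXON TaxID=ID RepID=ID
HEADER_RE = re.compile(
    r"^>(?P<cluster_id>UniRef(?:100|90|50)_\S+)\s+(?P<name>.*?)"
    r"\s+n=(?P<n>\d+)\s+Tax=(?P<tax>.*?)"
    r"(?:\s+TaxID=(?P<taxid>\d+))?(?:\s+RepID=(?P<rep>\S+))?\s*$"
)


class UniRefHeader(NamedTuple):
    cluster_id: str
    name: str
    member_count: int
    common_taxon: str
    tax_id: Optional[str]   # None if the header has no TaxID field
    rep_id: Optional[str]   # None if the header has no RepID field


def parse_uniref_header(line: str) -> Optional[UniRefHeader]:
    """Return the parsed header, or None if the line does not match the assumed layout."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    return UniRefHeader(
        cluster_id=m["cluster_id"],
        name=m["name"],
        member_count=int(m["n"]),
        common_taxon=m["tax"],
        tax_id=m["taxid"],
        rep_id=m["rep"],
    )


def iter_headers(lines: Iterable[str]) -> Iterator[UniRefHeader]:
    """Yield parsed headers from FASTA lines, skipping sequence lines and unparsable headers."""
    for line in lines:
        if line.startswith(">"):
            header = parse_uniref_header(line)
            if header is not None:
                yield header


# Hypothetical two-record FASTA; for a gzip-compressed download,
# pass gzip.open(path, "rt") instead of the StringIO object.
sample = io.StringIO(
    ">UniRef90_X0EXAMPLE Hypothetical protein n=3 Tax=Bacteria TaxID=2 RepID=X0EXAMPLE_BACT\n"
    "MKTAYIAKQRQISFVKSHFSRQ\n"
    ">UniRef90_X1EXAMPLE Uncharacterized protein n=1 Tax=Homo sapiens TaxID=9606 RepID=X1EXAMPLE_HUMAN\n"
    "GSHMLEDPVAGTWQNNCCEERL\n"
)
for h in iter_headers(sample):
    print(h.cluster_id, h.member_count, h.common_taxon)
```

Because `iter_headers` accepts any iterable of text lines, the same function works on a plain file handle or a decompressed stream, and headers that do not follow the assumed layout are skipped rather than raising.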
Supplementary information: Supplementary data are available at Bioinformatics online.
Similar articles
- UniProtKB/Swiss-Prot. Methods Mol Biol. 2007;406:89-112. doi: 10.1007/978-1-59745-535-0_4. PMID: 18287689
- The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D187-91. doi: 10.1093/nar/gkj161. PMID: 16381842
- UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D115-9. doi: 10.1093/nar/gkh131. PMID: 14681372
- In silico characterization of proteins: UniProt, InterPro and Integr8. Mol Biotechnol. 2008 Feb;38(2):165-77. doi: 10.1007/s12033-007-9003-x. Epub 2007 Oct 4. PMID: 18219596. Review.
- UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics. 2023 Aug;22(8):100591. doi: 10.1016/j.mcpro.2023.100591. Epub 2023 Jun 8. PMID: 37301379. Review.
Cited by
- Phage-encoded ribosomal protein S21 expression is linked to late-stage phage replication. ISME Commun. 2022 Mar 30;2(1):31. doi: 10.1038/s43705-022-00111-w. PMID: 37938675
- Hyperexpansion of genetic diversity and metabolic capacity of extremophilic bacteria and archaea in ancient Andean lake sediments. Microbiome. 2024 Sep 17;12(1):176. doi: 10.1186/s40168-024-01878-x. PMID: 39300577
- Genomes of Endotrypanum monterogeii from Panama and Zelonia costaricensis from Brazil: Expansion of Multigene Families in Leishmaniinae Parasites That Are Close Relatives of Leishmania spp. Pathogens. 2023 Nov 30;12(12):1409. doi: 10.3390/pathogens12121409. PMID: 38133293
- Aquifer environment selects for microbial species cohorts in sediment and groundwater. ISME J. 2015 Aug;9(8):1846-56. doi: 10.1038/ismej.2015.2. Epub 2015 Feb 3. PMID: 25647349
- Diverse Microorganisms in Sediment and Groundwater Are Implicated in Extracellular Redox Processes Based on Genomic Analysis of Bioanode Communities. Front Microbiol. 2020 Jul 28;11:1694. doi: 10.3389/fmicb.2020.01694. eCollection 2020. PMID: 32849356