LOCATE: a mammalian protein subcellular localization database

Abstract

LOCATE is a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of mouse and human proteins. Over the past 2 years, the data in LOCATE have grown substantially. The database now contains high-quality localization data for 20% of the mouse proteome and general localization annotation for nearly 36% of the mouse proteome. The proteome annotated in LOCATE is from the RIKEN FANTOM Consortium Isoform Protein Sequence sets, which contain 58 128 mouse and 64 637 human protein isoforms. Other additions include computational subcellular localization predictions, automated computational classification of experimental localization image data, prediction of protein sorting signals and third-party submission of literature data. Collectively, this database provides the localization proteome for individual subcellular compartments, which will underpin future systematic investigations of these regions. It is available at http://locate.imb.uq.edu.au/

INTRODUCTION

A cell is divided into different cellular compartments, and each compartment is associated with a different range of biochemical processes; by localizing a protein to a specific compartment, or set of compartments, the cellular role of the protein can be inferred. Also critical is determining the membrane organization of individual proteins, namely their topology relative to the membrane or whether they are embedded in the lipid bilayer. Without this knowledge, the function a protein has within the cell cannot be fully elucidated. This information helps in understanding hypothetical or novel proteins and can provide a more specific organellar context in which to investigate a particular protein. Historically, these data have been difficult to produce on a large scale for higher eukaryotic organisms. However, recent advances in membrane organization prediction methods and high-throughput subcellular localization assays have made it possible to generate these datasets. We used high-throughput methods to predict the membrane organization for the entire proteome and to determine the subcellular localization of a subset of the proteome. We then developed a database, LOCATE, to organize and warehouse these data.

GROWTH OF DATABASE CONTENT

The original mouse LOCATE database ( 1 ) has been updated and extended to include a human proteome. The original database content is described in detail ( 1 ) and updated features are outlined below.

Dataset

The mouse and human FANTOM3 Isoform Protein Sequence sets (IPS8) were generated by the RIKEN FANTOM Consortium ( 2 ). This dataset comprises protein sequences based on transcript sequences generated from direct sequencing of full-length transcripts. The sequenced transcripts were clustered into transcriptional units (TUs), where a TU is a grouping of transcripts that arise from a single genomic locus. The mouse proteome contains 58 128 unique protein isoforms encoded by 29 682 TUs, while the human proteome contains 64 637 unique protein isoforms encoded by 26 583 TUs.

Membrane organization

Protein orientation with respect to the membrane was predicted by MemO, a high-throughput, automated pipeline that combines publicly available feature predictors with empirically determined annotation rules ( 3 ). This allowed us to categorize proteins into five membrane organization classes based on the presence or absence of a transmembrane domain and the presence or absence of a signal peptide ( Table 1 ). We have previously documented that an individual TU may contain protein isoforms representing more than one membrane organization class ( 4 ). The percentages of TUs with variable membrane organization within the mouse and human proteomes are 9.3 and 12.6%, respectively.
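The classification idea can be sketched in a few lines of Python. This is a minimal illustration, not the MemO pipeline itself: the five class labels, the rule that two or more transmembrane domains (TMDs) define a multi-pass class, and the toy input records are all assumptions made for this example. It also shows how a TU can be flagged as having variable membrane organization when its isoforms fall into more than one class.

```python
from collections import defaultdict


def membrane_class(has_signal_peptide: bool, n_tmd: int) -> str:
    """Assign one of five illustrative membrane organization classes.

    The labels and the ">= 2 TMDs" rule are assumptions for this sketch,
    not MemO's published annotation rules.
    """
    if n_tmd >= 2:
        return "multi-pass membrane"
    if n_tmd == 1:
        return "type I membrane" if has_signal_peptide else "type II membrane"
    return "soluble secreted" if has_signal_peptide else "soluble intracellular"


def variable_tu_fraction(isoforms):
    """Fraction of TUs whose isoforms fall into more than one class.

    `isoforms` is an iterable of (tu_id, has_signal_peptide, n_tmd) tuples.
    """
    classes_by_tu = defaultdict(set)
    for tu_id, has_sp, n_tmd in isoforms:
        classes_by_tu[tu_id].add(membrane_class(has_sp, n_tmd))
    variable = sum(1 for classes in classes_by_tu.values() if len(classes) > 1)
    return variable / len(classes_by_tu)


# Hypothetical toy data, not taken from LOCATE.
toy_isoforms = [
    ("TU1", True, 0),   # soluble secreted
    ("TU1", True, 1),   # type I membrane -> TU1 is variable
    ("TU2", False, 0),  # soluble intracellular
    ("TU2", False, 0),  # soluble intracellular
    ("TU3", False, 3),  # multi-pass membrane
    ("TU4", False, 1),  # type II membrane
]

print(variable_tu_fraction(toy_isoforms))  # 0.25 (one variable TU out of four)
```

Counting distinct classes per TU, rather than per isoform, mirrors the way the 9.3 and 12.6% figures are expressed as percentages of TUs.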

Table 1.

Distribution of membrane organization classes and high-quality localization data in LOCATE

The MemO data columns show the absolute numbers of proteins classified by MemO into each membrane organization class. The subcellular localization data columns show the number of protein isoforms that have an experimentally determined subcellular localization and the number of transcriptional units (TUs) that have a literature-mined subcellular localization, as well as the total numbers of TUs and isoforms that have any subcellular localization data. An individual TU may contain protein isoforms from more than one membrane organization class ( 4 ).

Subcellular localization

Proteins with an N-terminal myc tag were expressed in HeLa cells and their subcellular localization was detected by indirect immunofluorescence ( 5 ). Representative images were collected and analyzed to determine each protein's subcellular localization. The annotations were reviewed using automatic image classification techniques ( 6 ). To date, within the mouse proteome, experimental subcellular localization data originating from our group have been generated for 2068 protein isoforms, a five-fold increase since the initial report. In addition, we have continued to generate independent subcellular localization annotations based on primary literature review ( 1 ) for 9245 proteins (3232 TUs), which represents a 1.9-fold increase. While we consider these sources of annotation to be of high quality, they are not yet comprehensive. To provide as complete a localization description as possible for any given protein, we therefore also include localization data mined from other online databases, including LIFEdb ( 7 ), Mouse Genome Informatics ( 8 ), UniProt ( 9 ), ENSEMBL ( 10 ), and others. For mouse, 14 659 protein isoforms (7506 TUs) are annotated with subcellular localization data from these sources.

In addition, we have included subcellular localization predictions for the mouse proteome from five prediction programs as reported in Sprenger et al. ( 11 ). These predictors were selected because they can be easily applied to proteome-scale datasets and they predict localization to at least nine major subcellular locations. Although we do not place high confidence in these predictions, we believe they are worth reporting to enable individuals to consider them in combination with other localization data.

In total, we have high-quality localization data for 4786 mouse TUs and 10 883 mouse protein isoforms representing 16 and 19% of the IPS8 set, respectively. Including the data of unknown quality retrieved from external sources, we report localization data for 9603 TUs and 20 766 isoforms representing nearly 36% of the mouse proteome. Table 1 shows a breakdown of the new data by membrane organization class, source, and quality.
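The coverage percentages quoted above can be checked with simple arithmetic. The sketch below assumes that the denominators are the mouse IPS8 totals given in the Dataset section (29 682 TUs and 58 128 isoforms); the variable and function names are illustrative only.

```python
# Mouse IPS8 totals, as stated in the Dataset section.
MOUSE_TUS = 29_682
MOUSE_ISOFORMS = 58_128


def coverage(annotated: int, total: int) -> float:
    """Return the annotated share of a set as a percentage."""
    return 100.0 * annotated / total


high_quality_tu = coverage(4_786, MOUSE_TUS)         # high-quality data, TUs
high_quality_iso = coverage(10_883, MOUSE_ISOFORMS)  # high-quality data, isoforms
any_tu = coverage(9_603, MOUSE_TUS)                  # any localization data, TUs
any_iso = coverage(20_766, MOUSE_ISOFORMS)           # any localization data, isoforms

print(f"{high_quality_tu:.1f} {high_quality_iso:.1f}")  # 16.1 18.7
print(f"{any_tu:.1f} {any_iso:.1f}")                    # 32.4 35.7
```

Under these assumptions the high-quality figures round to the reported 16 and 19%, and the "nearly 36%" figure corresponds to the isoform-level count.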

To enable the broader community to contribute information to LOCATE, we have developed a submission process that accepts subcellular localization annotations from third parties, based on the published literature.

IMPROVED DATA PRESENTATION

To improve the presentation of the different types of data, we have made a number of changes and additions to the existing web pages.

Subcellular localization data

We provide data describing the observed or predicted subcellular localization of a protein from four sources: original experimental data, data mined from the primary literature, data from external databases and data from computational subcellular localization predictors. These localizations are summarized at the top of the page describing an individual protein so that the data from each of the sources can be compared. We chose not to include predictions from localization predictors in the summary, but the top hits for each of the five predictors we used are listed elsewhere on the page, along with a link to the detailed output for each predictor.

The existence of localization data from each source is also annotated on the results of a BLAST search when a search is performed on the LOCATE database itself. This gives the viewer an overview of the extent of annotation of each isoform and each transcriptional unit.

Transmembrane topology and predicted motifs and domains

The membrane organization of a protein is displayed relative to the other protein domains using the DomainDraw macromolecular feature drawing program ( 12 ). These protein schematic diagrams include Pfam (v21.0) and SCOP (v1.69) predicted domains and subcellular sorting signals based on experimentally defined motifs ( Figure 1 ). The complement of proteins carrying each individual feature can be browsed at http://locate.imb.uq.edu.au/list_motifs.shtml.

Figure 1.

Transmembrane topology and predicted motifs and domains display.

The topology of a membrane-spanning protein is of interest, especially for proteins with multiple transmembrane domains (TMDs). We provide the membrane topology as predicted by MemO, based on predicted signal peptides and TMDs. However, three of the five TMD predictors generate their own topology prediction without being informed by a signal peptide predictor. We display these topology predictions in addition to the MemO consensus topology.

LOCATION PROTEOMICS: DEFINING A SUBCELLULAR COMPARTMENT'S PROTEIN COMPLEMENT

One of the key objectives of this database is to provide the protein content of a particular region of the cell, termed the location proteome ( 13 ). Figure 2 shows the location proteomes of the major cellular compartments. We have compared the data collected from other sources with our independently annotated primary literature subcellular localization data from LOCATE. The cytoplasm (29.3% in the other dataset; 6.9% in the primary dataset) has been excluded, as it had limited representation in our annotations and proteins remaining at their site of biosynthesis do not represent an active transport event. Within these estimates, each TU contributes equally, and when multiple subcellular compartments were annotated, each annotation was proportionally distributed. The differences between the two subcellular localization datasets have been discussed previously ( 11 ).

Our primary localization annotations are based exclusively on experimental data and aim to represent the predominant subcellular localization. They do not represent well those proteins that have multiple cellular localizations in the same cell or across distinct cell types, or those induced into trafficking pathways by activation of cellular pathways. In contrast, the other subcellular localization dataset captures any subcellular localization without considering the relative distribution across multiple localizations or the source of the annotation.

Within the primary dataset, the largest compartment proteomes are the nuclear proteome, with 38% of the proteins, and the extracellular/plasma membrane proteome, with 31% of the proteins. The other intracellular organelle proteomes are of similar size: mitochondria, 6.2%; endoplasmic reticulum, 7.0%; Golgi apparatus, 7.1%; and endosome/lysosome, 5.8%. Within the other subcellular localization data, the mitochondria, endoplasmic reticulum and cytoskeleton proteomes have higher estimates. The list of proteins within each region is accessible from the LOCATE homepage.
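The weighting described above, in which each TU contributes equally and a TU annotated to several compartments is split proportionally among them, can be sketched as follows. This is one plausible reading under stated assumptions: the equal split among a TU's compartments, the function name and the toy annotations are all hypothetical, and the exclusion of the cytoplasm is not modeled.

```python
from collections import defaultdict


def location_proteome_shares(tu_annotations):
    """Percentage share of each compartment, with one unit of weight per TU.

    `tu_annotations` maps a TU identifier to the compartments annotated
    for it. A TU with k distinct compartments adds 1/k to each of them
    (assumed equal split).
    """
    totals = defaultdict(float)
    for compartments in tu_annotations.values():
        distinct = set(compartments)
        if not distinct:
            continue  # a TU without annotation contributes nothing
        weight = 1.0 / len(distinct)
        for compartment in distinct:
            totals[compartment] += weight
    grand_total = sum(totals.values())
    return {c: 100.0 * w / grand_total for c, w in totals.items()}


# Hypothetical toy annotations, not taken from LOCATE.
toy = {
    "TU_a": ["nucleus"],
    "TU_b": ["nucleus", "mitochondria"],
    "TU_c": ["Golgi apparatus"],
    "TU_d": ["nucleus", "mitochondria"],
}

shares = location_proteome_shares(toy)
print(shares["nucleus"], shares["mitochondria"], shares["Golgi apparatus"])
```

With this scheme, a TU annotated to many compartments cannot inflate the totals, because its single unit of weight is shared among them.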

Figure 2.

Organelle proteomics—defining the protein complement of individual organelles.

AVAILABILITY

LOCATE data can be retrieved as individual entries or downloaded as HTML, plain text, or XML files from http://locate.imb.uq.edu.au

ACKNOWLEDGEMENTS

The authors would like to acknowledge Fasheng Zhang for technical support and Emma Redhead for designing the LOCATE XML schema and XML document generator. The work was supported by funds from the Australian Research Council (ARC) and RDT is supported by a National Health and Medical Research Council of Australia R. Douglas Wright Career Development Award. Funding to pay the Open Access publication charges for this article was provided by The University of Queensland.

Conflict of interest statement. None declared.

REFERENCES

1. Fink, Aturaliya, Davis, Zhang, Hanson, Teasdale, Kai, Kawai, Carninci, et al. LOCATE: a mouse protein subcellular localization database. Nucleic Acids Res. 2006 (pg. D213-D217)

2. Carninci, Kasukawa, Katayama, Gough, Frith, Maeda, Oyama, Ravasi, Lenhard, et al. The transcriptional landscape of the mammalian genome. Science 2005, vol. 309 (pg. 1559-1563)

3. Davis, Zhang, Yuan, Teasdale. MemO: a consensus approach to the annotation of a protein's membrane organization. In Silico Biol. 2006 (pg. 387-399)

4. Davis, Hanson, Clark, Fink, Zhang, Kasukawa, Kai, Kawai, Carninci, et al. Differential use of signal peptides and membrane domains is a common occurrence in the protein output of transcriptional units. PLoS Genet. 2006 (pg. e46)

5. Aturaliya, Fink, Davis, Teasdale, Hanson, Miranda, Forrest, Grimmond, Suzuki, et al. Subcellular localization of mammalian type II membrane proteins. Traffic 2006 (pg. 613-625)

6. Hamilton, Pantelic, Hanson, Teasdale. Fast automated cell phenotype image classification. BMC Bioinformatics 2007 (pg. 110)

7. Bannasch, Mehrle, Glatting, Pepperkok, Poustka, Wiemann. LIFEdb: a database for functional genomics experiments integrating information from external sources, and serving as a sample tracking system. Nucleic Acids Res. 2004 (pg. D505-D508)

8. Eppig, Bult, Kadin, Richardson, Blake, Anagnostopoulos, Baldarelli, Baya, Beal, et al. The Mouse Genome Database (MGD): from genes to mice–a community resource for mouse biology. Nucleic Acids Res. 2005 (pg. D471-D475)

9. Bairoch, Apweiler, Barker, Boeckmann, Ferro, Gasteiger, Huang, Lopez, et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005 (pg. D154-D159)

10. Hubbard, Aken, Beal, Ballester, Caccamo, Chen, Clarke, Coates, Cunningham, et al. Ensembl 2007. Nucleic Acids Res. 2007 (pg. D610-D617)

11. Sprenger, Fink, Teasdale. Evaluation and comparison of mammalian subcellular localization prediction methods. BMC Bioinformatics 2006, Suppl. 5

12. Fink, Hamilton. DomainDraw: A macromolecular feature drawing program. In Silico Biol. 2007 (pg. 0014)

13. Murphy. Location proteomics: a systems approach to subcellular location. Biochem. Soc. Trans. 2005 (pg. 535-538)

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
