Mimicking cellular sorting improves prediction of subcellular localization
- PMID: 15808855
- DOI: 10.1016/j.jmb.2005.02.025
Abstract
Predicting the native subcellular compartment of a protein is an important step toward elucidating its function. Here we introduce LOCtree, a hierarchical system combining support vector machines (SVMs) and other prediction methods. LOCtree predicts the subcellular compartment of a protein by mimicking the mechanism of cellular sorting and exploiting a variety of sequence and predicted structural features in its input. Currently LOCtree does not predict localization for membrane proteins, since the compositional properties of membrane proteins significantly differ from those of non-membrane proteins. While any information about function can be used by the system, we present estimates of performance that are valid when only the amino acid sequence of a protein is known. When evaluated on a non-redundant test set, LOCtree achieved sustained levels of 74% accuracy for non-plant eukaryotes, 70% for plants, and 84% for prokaryotes. We rigorously benchmarked LOCtree in comparison to the best alternative methods for localization prediction. LOCtree outperformed all other methods in nearly all benchmarks. Localization assignments using LOCtree agreed quite well with data from recent large-scale experiments. Our preliminary analysis of a few entirely sequenced organisms, namely human (Homo sapiens), yeast (Saccharomyces cerevisiae), and weed (Arabidopsis thaliana) suggested that over 35% of all non-membrane proteins are nuclear, about 20% are retained in the cytosol, and that every fifth protein in the weed resides in the chloroplast.
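The hierarchical idea described in the abstract, a cascade of binary decisions that routes a protein toward one compartment, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tree layout, the node names, the thresholds, and the toy composition-based scorers are hypothetical stand-ins and are not LOCtree's actual architecture, features, or trained SVMs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


def frac(seq: str, residues: str) -> float:
    """Fraction of positions in `seq` occupied by any of `residues`."""
    return sum(seq.count(r) for r in residues) / max(len(seq), 1)


@dataclass
class Node:
    """One binary decision in the hierarchy (hypothetical structure)."""
    name: str
    decide: Callable[[str], float]   # placeholder scorer; score >= 0 picks `positive`
    positive: Union["Node", str]     # child node, or a compartment label at a leaf
    negative: Union["Node", str]


def predict(node: Union[Node, str], sequence: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Walk the tree from the root and return the final label plus the decision path."""
    path = []
    while isinstance(node, Node):
        score = node.decide(sequence)
        path.append((node.name, score))
        node = node.positive if score >= 0 else node.negative
    return node, path


# Toy scorers (assumptions, not real classifiers): a hydrophobic N-terminus
# stands in for a secretory signal, and K/R richness stands in for a nuclear signal.
def secretory_score(seq: str) -> float:
    return frac(seq[:20], "AILVFM") - 0.5


def nuclear_score(seq: str) -> float:
    return frac(seq, "KR") - 0.2


# Hypothetical two-level tree; the real system's branching is not given in the abstract.
tree = Node(
    "secretory vs. intracellular", secretory_score,
    positive="extracellular",
    negative=Node("nuclear vs. cytosolic", nuclear_score,
                  positive="nucleus", negative="cytosol"),
)

label, path = predict(tree, "KRKRKRKRKR")
print(label, path)
```

In a setup like this, each `decide` function could be replaced by the decision function of a trained binary classifier, and the recorded path shows which intermediate decision drove the final assignment.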