F1000Res. 2016 Dec 28;5:2926.
doi: 10.12688/f1000research.10411.2. eCollection 2016.

A Bioconductor workflow for processing and analysing spatial proteomics data


Lisa M Breckels et al. F1000Res.

Abstract

Spatial proteomics is the systematic study of protein sub-cellular localisation. In this workflow, we describe the analysis of a typical quantitative mass spectrometry-based spatial proteomics experiment using the MSnbase and pRoloc suite of Bioconductor packages. To walk the user through the computational pipeline, we use a recently published experiment predicting protein sub-cellular localisation in pluripotent embryonic mouse stem cells. We describe the software infrastructure at hand, importing and processing data, quality control, sub-cellular marker definition, visualisation and interactive exploration. We then demonstrate the application and interpretation of statistical learning methods, including novelty detection using semi-supervised learning, classification, clustering and transfer learning, and conclude the pipeline with data export. The workflow is aimed at beginners who are familiar with proteomics in general and spatial proteomics in particular.
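The supervised classification step named above can be illustrated with a minimal, self-contained Python sketch. It is not pRoloc's implementation: it swaps in a simple nearest-centroid classifier, and the softmax-style scores, the per-class median threshold and all function names (`centroids`, `classify`, `threshold`) are hypothetical choices made for illustration. Protein profiles are assumed to be lists of relative intensities across fractions, and markers a mapping from protein to sub-cellular class.

```python
import math
from statistics import median

def centroids(profiles, markers):
    """Average marker profile per class (markers: protein -> class)."""
    sums, counts = {}, {}
    for prot, cls in markers.items():
        prof = profiles[prot]
        acc = sums.setdefault(cls, [0.0] * len(prof))
        for i, x in enumerate(prof):
            acc[i] += x
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: [x / counts[cls] for x in acc] for cls, acc in sums.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(profiles, markers):
    """Assign each non-marker protein to its nearest class centroid.

    Returns protein -> (class, score). The score is a softmax over negative
    distances (a hypothetical choice), so one protein's scores sum to 1.
    """
    cents = centroids(profiles, markers)
    result = {}
    for prot, prof in profiles.items():
        if prot in markers:
            continue  # markers are training data, not predictions
        dists = {cls: euclidean(prof, c) for cls, c in cents.items()}
        weights = {cls: math.exp(-d) for cls, d in dists.items()}
        best = min(dists, key=dists.get)
        result[prot] = (best, weights[best] / sum(weights.values()))
    return result

def threshold(predictions):
    """Keep a prediction only if its score reaches its class's median score."""
    by_class = {}
    for cls, score in predictions.values():
        by_class.setdefault(cls, []).append(score)
    cutoffs = {cls: median(scores) for cls, scores in by_class.items()}
    return {prot: cls if score >= cutoffs[cls] else "unknown"
            for prot, (cls, score) in predictions.items()}
```

A protein lying between two class centroids receives a lower score and may fall below its class's cutoff, in which case it is reported as "unknown" rather than forced into a class; class-specific cutoffs allow for classes whose score distributions differ.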

Keywords: Bioconductor; R Package; machine learning; mass spectrometry; protein sub-cellular localisation; proteomics; spatial proteomics; transfer learning.


Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. Schematic overview of the pRoloc pipeline from data import, through to data processing, machine learning and data export.
Figure 2. Simplified representation of the MSnSet data structure (reproduced with permission from the MSnbase vignette).
Figure 3. A screenshot of the data in the spreadsheet.
Figure 4. Heatmap of missing values.
Note that the features are re-ordered to highlight clusters of proteins with similar numbers of missing values.
Figure 5. PCA plot of the mouse stem cell data hl.
Each dot represents a single protein, and clusters of proteins represent proteins residing in the same sub-cellular niche. The figure on the right bins proteins and represents the density of each bin to highlight the presence of protein clusters.
Figure 6. Protein profiles and distribution of channel intensities.
The red dots represent the mean relative intensity for each channel.
Figure 7. Annotated PCA plots of the hl dataset, as produced with plot2D.
Figure 8. Mitochondrion and peroxisome protein profiles.
Figure 9. Using the plot3D function to visualise the hl dataset along PCs 1, 2 and 7.
Figure 10. Highlighting protein features of interest.
Figure 11. PCA plots of replicates 1 and 2.
Figure 12. A screenshot of the clickable interface and zoomable PCA plot of the main app in the pRolocGUI package.
Figure 13. The compare application, main panel.
Figure 14. Results of the novelty detection algorithm.
Figure 15. Assessment of the classification model parameter optimisation.
Figure 16. Classification results.
Colours indicate class membership and point sizes are representative of the classification confidence.
Figure 17. Visualisation of class-specific classification score distributions.
Figure 18. Results of the localisation predictions after thresholding.
Figure 19. The classify application enables the interactive exploration of classification score thresholding.
Figure 20. Visualisation of the transfer learning parameter optimisation procedure.
Each row displays the frequency of observed weights (along the columns) for a specific sub-cellular class, with larger dots representing higher observation frequencies.
Figure 21. Hierarchical clustering of the average marker profiles, summarising the relations between organelle profiles.
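Figure 21 summarises how organelle profiles relate by hierarchically clustering the average marker profile of each class. The minimal Python sketch below illustrates that idea under stated assumptions: it uses Euclidean distance and average linkage (UPGMA), which may differ from the distance and linkage used to produce the figure, and the function names (`mean_profiles`, `average_linkage`) are hypothetical.

```python
import math
from itertools import combinations

def mean_profiles(profiles, markers):
    """Average the profiles of each class's marker proteins (markers: protein -> class)."""
    groups = {}
    for prot, cls in markers.items():
        groups.setdefault(cls, []).append(profiles[prot])
    # zip(*rows) walks the fractions column by column.
    return {cls: [sum(col) / len(rows) for col in zip(*rows)]
            for cls, rows in groups.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_distance(points, a, b):
    """Average linkage: mean distance between all members of clusters a and b."""
    dists = [euclidean(points[x], points[y]) for x in a for y in b]
    return sum(dists) / len(dists)

def average_linkage(points):
    """Agglomerative clustering with average linkage (UPGMA).

    points maps a label to a vector. Returns the merges in order as
    (cluster_a, cluster_b, height) tuples, where clusters are tuples of labels.
    """
    clusters = [(label,) for label in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(points, *pair))
        merges.append((a, b, average_distance(points, a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return merges
```

Calling `average_linkage(mean_profiles(profiles, markers))` yields the merge order and merge heights from which a dendrogram of organelle profiles could be drawn; organelles that co-fractionate merge early, at low heights.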


