Data Brief. 2015 Dec 17:6:286-94.
doi: 10.1016/j.dib.2015.11.063. eCollection 2016 Mar.

Spiked proteomic standard dataset for testing label-free quantitative software and statistical methods

Claire Ramus et al. Data Brief.

Abstract

This data article describes a controlled, spiked proteomic dataset for which the "ground truth" of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and a low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, for testing new algorithms for label-free quantitative analysis, and for evaluating downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with identifier PXD001819. Starting from some raw files of this dataset, we also provide here processed data obtained through various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used here for data processing takes advantage of the different spike levels introduced in the samples composing the dataset. The processed data are merged into a single file to facilitate the evaluation and illustration of software tool results for the detection of variant proteins with different absolute expression levels and fold change values.

Figures

Fig. 1
Illustration of the absolute abundance of spiked proteins compared to the yeast background in the last 6 samples of the dataset. Absolute abundances were estimated using the iBAQ metric calculated by MaxQuant in workflows 6 and 7 (see below for the details of the workflows).
Fig. 2
Experimental design of the data processing workflow.
Fig. 3
ROC curves plotted from the dataset to compare filtering criteria (A) or bioinformatics workflows (B). A/ Sensitivity-FDP curves were plotted for the data obtained from workflow 6 (quantification based on MaxQuant intensity values) by varying either the |Welch t-test difference| threshold (red), the |z-score| threshold (green) or the Welch t-test p-value threshold (blue). The Welch t-test difference, z-score or p-value was used as a single criterion to classify the proteins (solid curves), or a combination of these filters was applied to improve the classification (dotted curves). B/ Overlaid ROC curves for the different bioinformatics workflows: proteins were classified as variant by filtering on the p-value thresholds, combined with a fixed |log2(fold change)| threshold of 1 for spectral count workflows (1–4) and with a fixed |z-score| threshold of 1 for MS intensity-based workflows (5–8).
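
The classification behind these curves can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the usual definitions of sensitivity, TP/(TP+FN), and false discovery proportion (FDP), FP/(TP+FP), with spiked UPS1 proteins as true variants and yeast proteins as non-variants. The `Protein` record, the function names and the toy values are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    """Hypothetical per-protein record; field names are illustrative."""
    name: str
    is_spiked: bool   # ground truth: True for UPS1, False for yeast background
    p_value: float    # e.g. Welch t-test p-value
    z_score: float    # e.g. z-score of the log ratio


def is_called_variant(protein, p_max, z_min):
    """Combined filter: p-value below p_max AND |z-score| at least z_min."""
    return protein.p_value <= p_max and abs(protein.z_score) >= z_min


def sensitivity_fdp(proteins, p_max, z_min=1.0):
    """Return (sensitivity, FDP) for one pair of thresholds."""
    tp = fp = fn = 0
    for protein in proteins:
        called = is_called_variant(protein, p_max, z_min)
        if called and protein.is_spiked:
            tp += 1
        elif called:
            fp += 1
        elif protein.is_spiked:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdp = fp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, fdp


def curve(proteins, p_thresholds, z_min=1.0):
    """One (p_max, sensitivity, FDP) point per p-value threshold."""
    return [(p_max, *sensitivity_fdp(proteins, p_max, z_min))
            for p_max in p_thresholds]


# Made-up values for illustration only; not taken from PXD001819.
toy_proteins = [
    Protein("UPS_A", True, 0.001, 3.0),
    Protein("UPS_B", True, 0.02, -2.0),
    Protein("UPS_C", True, 0.2, 0.5),
    Protein("YEAST_A", False, 0.03, 1.5),
    Protein("YEAST_B", False, 0.5, 0.1),
]

if __name__ == "__main__":
    for p_max, sens, fdp in curve(toy_proteins, [0.01, 0.05, 1.0]):
        print(f"p <= {p_max}: sensitivity={sens:.2f}, FDP={fdp:.2f}")
```

Sweeping the p-value threshold while holding the |z-score| threshold at 1 corresponds to the combined filtering described for the MS intensity-based workflows in panel B. For the spectral count workflows, the same sketch would apply with |log2(fold change)| in place of the z-score.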

References

    1. Ramus C., Hovasse A., Marcellin M., Hesse A.M., Mouton-Barbosa E., Bouyssié D. Benchmarking quantitative label-free LC-MS data processing workflows using a complex spiked proteomic standard dataset. J. Proteom. 2016;132:51–62.
    2. Bouyssie D., Gonzalez de Peredo A., Mouton E., Albigot R., Roussel L., Ortega N. Mascot file parsing and quantification (MFPaQ), a new software to parse, validate, and quantify proteomics data generated by ICAT and SILAC mass spectrometric analyses: application to the proteomics study of membrane proteins from primary human endothelial cells. Mol. Cell. Proteom.: MCP. 2007;6:1621–1637.
    3. Mouton-Barbosa E., Roux-Dalvai F., Bouyssie D., Berger F., Schmidt E., Righetti P.G. In-depth exploration of cerebrospinal fluid by combining peptide ligand library treatment and label-free protein quantification. Mol. Cell. Proteom.: MCP. 2010;9:1006–1021.
    4. Cox J., Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008;26:1367–1372.
    5. Cox J., Matic I., Hilger M., Nagaraj N., Selbach M., Olsen J.V. A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat. Protoc. 2009;4:698–705.