This repository contains code for the manuscript "Proteomic characterization of low numbers of human and murine neutrophils freshly isolated from sites of sterile inflammation".
It includes the R code for the correlation plots.
An interactive visualization of the copy number abundances for mouse and human neutrophils was created using Python 3.9, pandas, and bokeh to make the abundance data easier to explore. Download the HTML file containing the visualization dashboard and open it in any modern web browser.
The visualization dashboard contains a searchable data table with original data from the publication and an interactive plot of protein abundance over protein rank for human and mouse neutrophils.
To filter the data table for entries of interest, type a search term into the search field (e.g., "CXCR2" to filter for C-X-C chemokine receptor type 2) and press the ENTER key. The search uses strict string matching, so it does not tolerate typos. To remove the filter, delete your search string and press ENTER again.
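The following sketch illustrates what strict string matching means in practice. It is not the dashboard's own search code: the function name `filter_table`, the choice to search every column, the case-sensitive comparison, and the placeholder rows are all assumptions made for illustration.

```python
import pandas as pd


def filter_table(table: pd.DataFrame, term: str) -> pd.DataFrame:
    """Keep rows in which any cell contains `term` as a literal substring.

    Hypothetical helper; the dashboard may search different columns.
    """
    if term == "":
        # An empty search string restores the full table.
        return table
    # Compare every cell as text. regex=False makes the match literal,
    # so there are no wildcards and no tolerance for typos.
    as_text = table.astype(str)
    mask = as_text.apply(lambda col: col.str.contains(term, regex=False)).any(axis=1)
    return table[mask]


# Placeholder rows for illustration only, not data from the publication.
demo = pd.DataFrame({"Gene": ["CXCR2", "ELANE", "MPO"], "Rank": [3, 1, 2]})

print(len(filter_table(demo, "CXCR2")))  # exact spelling: one matching row
print(len(filter_table(demo, "CXCR3")))  # one wrong character: no rows
print(len(filter_table(demo, "")))       # empty search: all rows
```

A single mistyped character is enough to return an empty table, which is why the search term has to be spelled exactly as it appears in the data.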
Below the search field are the total number of proteins in the dataset, the number of filtered proteins currently displayed in the table (depending on whether the data were filtered by a search, e.g., 2 for "CXCR2"), and the number of selected data points.
Below these three metrics is the data table. Select data points by clicking on table rows. You can multi-select specific entries by holding CTRL and clicking individual entries, or multi-select a range of entries by holding SHIFT and clicking the first and last entry in that range (much like multi-selecting in Excel). Deselect by clicking on a selected entry. Selected entries are highlighted in the plot on the right side.
You can also select data points directly in the interactive plot by clicking on them (multi-select by holding SHIFT and clicking). To the right of the plot is a tool column that provides different interaction modes. Active tools are highlighted by a blue bar. The PAN tool lets you pan the figure, BOX SELECT selects multiple data points by dragging a selection box, WHEEL ZOOM activates zooming with the mouse wheel, TAP toggles whether data points are selectable, RESET returns the figure to its initial state, SAVE downloads the figure as a .png file, and HOVER displays the information for a data point when you hover over it.
You can generate the interactive visualization dashboard for your own protein abundance data. Take a look at the original data table for an example data set. Your input data must meet the following requirements:
- provide the data in .xlsx spreadsheet format
- data for individual species, groups, or samples should be on separate sheets (e.g., human data on sheet one, mouse data on sheet two)
- each sheet should contain a table with features in columns and proteins in rows
- column names are located in the first row
- the columns "Rank" and "Copy number" are mandatory, other columns are optional (you could include as many features as you like)
- column names must be identical across all sheets
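Before running the notebook, it can help to check a spreadsheet against the requirements above. The sketch below is not part of the notebook: `check_sheets` is a hypothetical helper, and it assumes the workbook has already been loaded into a dictionary of DataFrames, for example with `pd.read_excel(path, sheet_name=None)`.

```python
import pandas as pd

# Mandatory column names as stated in the requirements list.
REQUIRED_COLUMNS = {"Rank", "Copy number"}


def check_sheets(sheets: dict[str, pd.DataFrame]) -> list[str]:
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper, not part of the notebook.
    """
    problems = []
    reference = None  # column names of the first sheet
    for name, df in sheets.items():
        columns = set(df.columns)
        missing = REQUIRED_COLUMNS - columns
        if missing:
            problems.append(f"{name}: missing mandatory column(s) {sorted(missing)}")
        if reference is None:
            reference = columns
        elif columns != reference:
            problems.append(f"{name}: column names differ from the first sheet")
    return problems


# Usage with a real file (path is a placeholder):
#   sheets = pd.read_excel("my_data.xlsx", sheet_name=None)  # dict of all sheets
#   for problem in check_sheets(sheets):
#       print(problem)
```

The check compares column names as sets, so column order is ignored. If the notebook turns out to depend on column order, compare lists instead.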
The code is provided as a Jupyter notebook file (protein_profile_vis.ipynb, Jupyter website).
You can run this notebook in Google Colab without installing anything locally. See this guide on how to open a Jupyter notebook from GitHub.
You can also run it on your local machine using conda. We recommend using Anaconda Navigator for setup. Before running the notebook, make sure you have installed and activated the provided conda environment (protein_profiles_env.yaml).
The first parts of the Jupyter notebook set up the required Python libraries and functions. Execute everything before section 4 "Read data and create the visualization". Below that section heading is a code cell where you specify the path to your input spreadsheet and the sheets to read ("read_data" function). The notebook also provides options to set a color palette for your data points, a plot title, etc. Please read the code documentation in the notebook for more information.
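For orientation, here is a minimal sketch of how a rank-abundance plot with a custom title and color palette could be built with bokeh. It is not the notebook's implementation and does not reproduce the signature of "read_data". The function name `build_rank_plot`, the default colors, and the logarithmic y-axis are assumptions, and the input is assumed to be a dictionary that maps sheet names to DataFrames with "Rank" and "Copy number" columns.

```python
from itertools import cycle

import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure


def build_rank_plot(sheets, title="Protein abundance over rank",
                    palette=("#1f77b4", "#d62728")):
    """Build one scatter series per sheet (hypothetical helper)."""
    p = figure(
        title=title,
        x_axis_label="Rank",
        y_axis_label="Copy number",
        y_axis_type="log",  # assumption: copy numbers are positive and span orders of magnitude
        tools="pan,box_select,wheel_zoom,tap,reset,save",
    )
    # cycle() reuses colors if there are more sheets than palette entries.
    for (name, df), color in zip(sheets.items(), cycle(palette)):
        source = ColumnDataSource(df)
        p.scatter("Rank", "Copy number", source=source,
                  color=color, size=6, legend_label=name)
    # Column names containing spaces need braces in hover tooltips.
    p.add_tools(HoverTool(tooltips=[("Rank", "@Rank"),
                                    ("Copy number", "@{Copy number}")]))
    return p


# Placeholder values for illustration only, not data from the publication.
example = {
    "human": pd.DataFrame({"Rank": [1, 2, 3], "Copy number": [1e6, 1e5, 1e4]}),
    "mouse": pd.DataFrame({"Rank": [1, 2, 3], "Copy number": [5e5, 5e4, 5e3]}),
}
plot = build_rank_plot(example, title="Example rank-abundance plot")
```

To write the result to a standalone HTML file, pass the returned figure to `bokeh.io.save` together with a file name. To display it inside a notebook, use `bokeh.io.show`.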