Interactive visualization of protein abundance

This repository contains code for the manuscript "Proteomic characterization of low numbers of human and murine neutrophils freshly isolated from sites of sterile inflammation".

1) Correlation Plot

R code for the correlation plots (correlation_plot_human_1000.R).

2) Interactive visualization

An interactive visualization of protein copy numbers in mouse and human neutrophils was created with Python 3.9, pandas, and bokeh to make the abundance data easier to explore. Download the HTML file containing the visualization dashboard (human_mouse_copy_number_plot_final.html) and open it in any modern web browser.

2.1) Visualization dashboard manual

The visualization dashboard contains a searchable data table with original data from the publication and an interactive plot of protein abundance over protein rank for human and mouse neutrophils.

Searching

To filter the data table for entries of interest, type a search term into the search field (e.g., "CXCR2" for C-X-C chemokine receptor type 2) and press ENTER. The search uses strict string matching, so it does not tolerate typos. To remove the filter, delete the search string and press ENTER again.

Total number of proteins, filtered proteins, selected proteins

Below the search field, the dashboard shows three counts: the total number of proteins in the dataset, the number of proteins currently displayed in the table (this depends on the active search, e.g., 2 for "CXCR2"), and the number of selected data points.
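To make the search behavior and the three counts concrete, here is a minimal pandas sketch. It is not the dashboard's actual implementation. It assumes that "strict string matching" means a literal, case-sensitive substring match against every column. The names `search_table` and `table_metrics` are hypothetical.

```python
import pandas as pd


def search_table(df: pd.DataFrame, term: str) -> pd.DataFrame:
    """Hypothetical filter: keep rows where any column contains `term` literally."""
    if term == "":
        # An empty search string "unfilters" the table.
        return df
    # Compare every cell as text. regex=False makes the match literal and
    # case-sensitive, so a typo such as "CXRC2" matches nothing.
    as_text = df.astype(str)
    mask = as_text.apply(lambda col: col.str.contains(term, regex=False)).any(axis=1)
    return df[mask]


def table_metrics(df: pd.DataFrame, filtered: pd.DataFrame, selected: list) -> dict:
    """The three counts shown under the search field (hypothetical helper)."""
    return {"total": len(df), "filtered": len(filtered), "selected": len(selected)}
```

Because every cell is compared as text, numeric columns such as "Rank" can also produce matches (searching "2" would match rank 2). This is one reason a real dashboard might restrict the search to identifier or name columns.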

Interactive data table

Below these three metrics is the data table. Select data points by clicking on table rows. You can multi-select specific entries by holding CTRL and clicking individual entries, or multi-select a range of entries by holding SHIFT and clicking the first and last entry in that range (much like multi-selecting in Excel). Deselect by clicking on a selected entry. Selected entries are highlighted in the plot on the right side.

Interactive plot

You can select data points in the interactive plot by clicking on them directly (hold SHIFT and click to multi-select). A toolbar to the right of the plot provides different interaction modes. Active tools are highlighted by a blue bar:

- PAN: pan the figure
- BOX SELECT: select multiple data points by dragging a selection box
- WHEEL ZOOM: zoom with the mouse wheel
- TAP: toggle whether data points can be selected by clicking
- RESET: reset the figure to its initial state
- SAVE: download the figure as a PNG file
- HOVER: show a data point's information when the cursor hovers over it
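In bokeh, a data table and a plot can stay in sync when they read from one shared `ColumnDataSource`. Rows selected in the table and points selected in the plot are then the same selection. The following bokeh sketch illustrates this, under stated assumptions. The tool string mirrors the tools listed above. The log-scaled y axis, widget sizes, tooltip fields, and the function names `build_dashboard` and `to_standalone_html` are assumptions, not the notebook's code.

```python
import pandas as pd
from bokeh.embed import file_html
from bokeh.layouts import row
from bokeh.models import ColumnDataSource, DataTable, HoverTool, TableColumn
from bokeh.plotting import figure
from bokeh.resources import CDN


def build_dashboard(df: pd.DataFrame, title: str = "Protein abundance"):
    """Hypothetical layout: a data table and a rank/abundance plot sharing one source."""
    # One shared source: selecting rows in the table highlights the same
    # points in the plot, and vice versa.
    source = ColumnDataSource(df)
    columns = [TableColumn(field=name, title=name) for name in df.columns]
    table = DataTable(source=source, columns=columns, width=500, height=400)

    plot = figure(
        title=title,
        x_axis_label="Rank",
        y_axis_label="Copy number",
        y_axis_type="log",  # assumption: copy numbers span several orders of magnitude
        tools="pan,box_select,wheel_zoom,tap,reset,save",
        width=500,
        height=400,
    )
    plot.scatter(x="Rank", y="Copy number", source=source, size=6)
    # Field names containing spaces need the @{...} syntax in tooltips.
    plot.add_tools(HoverTool(tooltips=[("Rank", "@Rank"), ("Copy number", "@{Copy number}")]))
    return row(table, plot)


def to_standalone_html(layout, title: str) -> str:
    # file_html embeds the plot data in the page and loads BokehJS from the CDN,
    # so the resulting file opens in a browser without a running Python process.
    return file_html(layout, CDN, title)
```

Writing the returned string to a `.html` file produces a single shareable page, similar in spirit to the dashboard file in this repository.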

2.2) Running the code to generate the visualization dashboard

Input data

You can also generate the interactive dashboard for your own protein abundance data. The original data table (copy_number_distribution_human_mouse_105.xlsx) serves as an example data set.

- Provide the data in .xlsx spreadsheet format.
- Put data for individual species, groups, or samples on separate sheets (e.g., human data on sheet one, mouse data on sheet two).
- Each sheet should contain a table with features in columns and proteins in rows:
  - Column names are located in the first row.
  - The columns "Rank" and "Copy number" are mandatory; all other columns are optional (you can include as many features as you like).
  - Column names must be identical across all sheets.
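You may want to check a workbook against these requirements before running the notebook. The following pandas sketch shows one way to do this. It is a hypothetical pre-check and is not part of the provided notebook. The names `load_sheets`, `validate_sheets`, and `MANDATORY_COLUMNS` are illustrative. It treats "identical" column names as the same set of names, regardless of order.

```python
import pandas as pd

# Columns the README lists as mandatory.
MANDATORY_COLUMNS = ("Rank", "Copy number")


def load_sheets(path: str) -> dict:
    """Read every sheet into a {sheet name: DataFrame} dict (requires openpyxl for .xlsx)."""
    # sheet_name=None loads all sheets; header=0 takes column names from the first row.
    return pd.read_excel(path, sheet_name=None, header=0)


def validate_sheets(sheets: dict) -> None:
    """Raise ValueError if the sheets violate the input-data rules above."""
    if not sheets:
        raise ValueError("workbook contains no sheets")
    ref_name, ref = next(iter(sheets.items()))
    missing = [c for c in MANDATORY_COLUMNS if c not in ref.columns]
    if missing:
        raise ValueError(f"sheet {ref_name!r} lacks mandatory columns: {missing}")
    for name, df in sheets.items():
        # Every sheet must carry the same column names as the first sheet.
        if set(df.columns) != set(ref.columns):
            raise ValueError(f"columns of sheet {name!r} differ from sheet {ref_name!r}")
```

A typical call would be `validate_sheets(load_sheets("my_data.xlsx"))`, where the file name is a placeholder.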

Executing code

The code is provided as a Jupyter notebook (protein_profile_vis.ipynb); see the Jupyter website for more information on notebooks.

You can run this notebook in Google Colab without installing anything locally. Check out this guide on how to open a Jupyter notebook from GitHub.

You can also run it on your local machine using conda. We recommend Anaconda Navigator for setup. Before running the notebook, make sure you have installed and activated the provided conda environment (protein_profiles_env.yaml).

The first parts of the notebook set up the required Python libraries and functions. Execute all cells before section 4, "Read data and create the visualization". That section contains a code cell in which you specify the path to your input spreadsheet and the sheets to read (read_data function). The notebook also provides options to set a color palette for your data points, a plot title, and more. See the code documentation in the notebook for details.
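The notebook's `read_data` function and its palette options are not reproduced here. The following pandas sketch illustrates one common way to combine the chosen sheets and assign one color per sheet. The helpers `read_selected_sheets` and `combine_sheets`, the added "Sheet" and "color" columns, and the hex colors in the comment are all hypothetical.

```python
import pandas as pd


def read_selected_sheets(path: str, sheet_names: list) -> dict:
    """Read only the named sheets; a list for sheet_name returns a {name: DataFrame} dict."""
    return pd.read_excel(path, sheet_name=sheet_names)


def combine_sheets(sheets: dict, sheet_names: list, palette: list) -> pd.DataFrame:
    """Stack the chosen sheets into one table, tagging each row with its sheet and a color."""
    if len(palette) < len(sheet_names):
        raise ValueError("palette needs at least one color per sheet")
    frames = []
    for name, color in zip(sheet_names, palette):
        frame = sheets[name].copy()
        frame["Sheet"] = name   # which species/group the row came from
        frame["color"] = color  # e.g. "#1f77b4"; one color per sheet
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

With a per-row "color" column, a bokeh glyph can take `color="color"` as a field reference, so each species is drawn in its own color from a single data source.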

Repository: voidsailor/protein_abundance_visualization