Detects sub-precision contacts between subcellular organelles in 2 and 3D STED (precision ~ 50-150nm) superresolution microscopy, for example endoplasmum reticulum and mitochondria (contacts ~ 20-100nm).
Where a pixel precise segmentation is not feasible due to the precision of the microscope, and colocalization does not describe the interface in a meaningful way, SubPrecisionContactDetection can reconstruct the plausible interface between the organelles.
An example rendering of the postprocessed contact zones (white) between endoplasmum reticulum (green) and mitochondria (red) is shown here (source):
- Fast: using multiple threads, and Julia's fast LLVM JIT code
- Reproducible: tests ensure backwards compatibility
- Configurable: Can process deconvolved or raw images, with optional extra denoising
- Rich: provides interpretable features for each detected contact
- Confidence map for each voxel (each voxel has a p-value)
For a hands on tutorial see the NanoScopyAI pages
- Installation 1.1 Singularity 1.2 Julia package 1.3 Windows
- Usage
- Deploying on clusters
- Cite
- FAQ
- Parameter selection 7.1 Z-filter 7.2 Window 7.3 Precision and Recall 7.4 Vesicle filter 7.5 Sampling
- Output 8.1. Contacts 8.2 Filtered channels 8.3 Confidence map 8.4 CSV files
- 2D
- Segmentation
This project is developed using Julia. For ease of use and to maximize reproducibility we also provide container images using Singularity.
This project was developed on Linux, and deployed on scientific computing clusters running Linux. The Singularity workflow ensures both local and cluster computations run exactly the same. The automated tests run the exact same environment.
This cannot be guaranteed across different OS'es (e.g. Windows, MacOs). While there are no technical reasons preventing the code from working on any OS, you may run into issues as it is not something we actively use ourselves.
You can use an optimized Singularity image, which has all dependencies pre-installed.
If you do not have Singularity, please see the documentation for detailed installation instructions.
The below steps are examples, but may not be complete for each platform, for the reference instructions, please visit installation instructions.
Fedora/RPM
sudo dnf install singularity
To run Singularity on Windows, set up WSL2 or refer to installation instructions.
See instructions.
Download the image as mcsdetect.sif. For example, using wget (Linux), you could do:
wget -O mcsdetect.sif http://vault.sfu.ca/index.php/s/QJ4Evcet4oVWXPL/download
On MacOS you can install wget using:
brew install wget
First, make sure execute permissions are set:
chmod u+x mcsdetect.sif
./mcsdetect.sif
Expected output:
chmod u+x mcsdetect.sif
./mcsdetect.sif -e 'using SubPrecisionContactDetection;'
Expected output:
chmod u+x mcsdetect.sif
./mcsdetect.sif /opt/SubPrecisionContactDetection.jl/scripts/ercontacts.jl ARGS
Where you'd replace ARGS with arguments to the script as documented in scripts/ercontacts.jl. Run it without arguments to get the help prompt.
Note due to a bug with conda MacOS installations will have some tests failing, the module itself is functional
You can either add to the global Julia installation:
julia -e 'using Pkg;Pkg.add(url="https://github.com/bencardoen/Colocalization.jl.git");Pkg.add(url="https://github.com/bencardoen/ERGO.jl.git");Pkg.add(url="https://github.com/bencardoen/SPECHT.jl.git");Pkg.add(url="https://github.com/bencardoen/SubPrecisionContactDetection.jl.git")'
julia -e 'using Pkg; Pkg.build("SubPrecisionContactDetection");Pkg.test("SubPrecisionContactDetection")'
Or create a new environment and install it there:
mkdir -p test
cd test
julia --project=. -e 'using Pkg;Pkg.add(url="https://github.com/bencardoen/Colocalization.jl.git);Pkg.add(url="https://github.com/bencardoen/ERGO.jl.git");Pkg.add(url="https://github.com/bencardoen/SPECHT.jl.git");Pkg.add(url="https://github.com/bencardoen/SubPrecisionContactDetection.jl.git")'
julia --project=. -e 'using Pkg; Pkg.build("SubPrecisionContactDetection");Pkg.test("SubPrecisionContactDetection")'
In both cases, you should see that all tests pass:
git clone https://github.com/bencardoen/SubPrecisionContactDetection.jl.git
cd SubPrecisionContactDetection.jl
julia --project=. installlocal.jl
This should result in output similar to this screenshot:
- Install VSCode
- Install Python
- Install Julia 1.9 In VSCode, create a new folder and open VSCode inside of it.
Then:
- New Terminal In the terminal, type:
git clone https://github.com/bencardoen/SubPrecisionContactDetection.jl.git
This will download the latest version of the source code. Note Python needs to be installed and defined, make sure of this step before proceeding. Now we will build it:
julia --project=. -e 'using Pkg; Pkg.build() Pkg.test()'
The command line interface does the heavy lifting for you:
Using the singularity image not only saves you from dependency tracking, it also is precompiled, making it x5 - x10 faster. This is especially true on clusters where the speedup can be even larger.
./mcsdetect.sif opt/SubPrecisionContactDetection/scripts/ercontacts.jl --inpath ./in -r "*[1,2].tif" -w 2 --deconvolved --sigmas 2.5-2.5-1.5 --outpath ./out --alpha 0.01 --beta 0.01 -c 1 -v 2000 --mode=decon
julia --project=. ./scripts/ercontacts.jl --inpath ./in -r "*[1,2].tif" -w 2 --deconvolved --sigmas 2.5-2.5-1.5 --outpath ./out --alpha 0.01 --beta 0.01 -c 1 -v 2000 --mode=decon 2>&1 | tee -a log_test.txt
Where:
- --{in|out}path : directories where tif files can be found
- -r : regex to tif files, e.g. *[1,2].tif indicates channel 1 and 2 will have filenames ending in 1,2.tif respectively.
- -w : windowsize, >1 or higher
- --sigmas : smoothing Gaussian, set < precision
- --alpha : max false positive rate (p-value), 0.05 is a common value.
- --beta : max false negative rate (stat. power) 0.05 implies 95% stat power.
- -c 1: postprocess channel 1
- -v 2000: drop all contacts touching objects in channel 1 with volume < 2000
- --mode=decon : input are non deconvolved tiff files
- 2>&1 | tee -a log_test.txt : save any output to log.txt (in addition to showing it in stdout)
The output should look like:
- skeleton_contacts.tif
- channel_[1,2].tif
- (non)_vesicle_contacts: mitochondria (channel 1) with volume < 2000 are considered vesicles, split contacts so you can visualize them separately
- channel_1_(non_)vesicle.tif : channel 1 objects (mitochondria) split into < 2000 and > 2000 objects
- raw|gradient|eroded.tif : stages of progressively computed contacts, all but 'eroded' are debug output
- *.csv : features for each contact
The features computed are:
- volume : nr of non zero voxels per contact
- weighted : weighted sum per contact (1 voxel holds the spearman value (>0))
- anisotropy/planar/sphericity : shape features
- distance to centroid (px) : distance of this contact's center to the centroid of all contacts --> higher is sparser
- z-position : Z slice of contact
- XY span : projected major axis in XY
- height : height of contact
- normalized : --> value / max (value/cell), so normalized distance to centroid -> 0-1
- eig1-3: eigenvalues of 3D PCA, used in shape statistics
In scripts/run_cube_sampling_on_dataset.jl you'll find a script that samples contacts with a sliding window, to avoid long tail statistics dominating the conclusion of any analysis. The paper goes into more depth why this is beneficial.
See hpcscripts/arraysbatch.sh for an example parameter sweep on a large set of cells. Assuming you created inlists.txt and outlists.txt, you'd submit to SLURM.
sbatch hpcscripts/arraysbatch.sh
Please edit and revise before you submit, e.g. your email and cluster account need to change at a minimum.
A detailed walkthrough can be found here
If you find this project useful, please cite
@article {Cardoen2022.06.23.497346,
author = {Cardoen, Ben and Gao, Guang and Vandevoorde, Kurt R. and Alan, Parsa and Liu, William and Vogl, A. Wayne and Hamarneh, Ghassan and Nabi, Ivan R.},
title = {Automatic sub-precision membrane contact site detection identifies convoluted tubular riboMERCs},
elocation-id = {2022.06.23.497346},
year = {2022},
doi = {10.1101/2022.06.23.497346},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/06/26/2022.06.23.497346},
eprint = {https://www.biorxiv.org/content/early/2022/06/26/2022.06.23.497346.full.pdf},
journal = {bioRxiv}
}
If you have any issues, please create an issue.
Make sure to include:
- include OS, Julia version
- description of steps to reproduce
- be concise yet complete
Yes, if you clone the repository, and are using Linux, you need to do 2 things
- edit singularity_recipes/recipe.def
- execute buildimage # Needs sudo
./buildimage.sh
This will rebuild the image, first checking out the latest version of the code.
Expected RAM usage for images of sizes 500x500x20 ~ 5GB RAM, 2000x2000x70: ~ 50GB RAM, and so on. By default, all of JULIA_NUM_THREADS cores will be used to run in parallel. > 8 is overkill, so set to 4-8 at most:
export JULIA_NUM_THREADS=4
On desktops this is unlikely to be an issue, but on a cluster node with > 64 cores you will probably get a slowdown if you exceed 8-12 cores.
First, make sure you install and clone in a clean environment:
mdkir mydir
cd mydir
julia
julia> ]
(@v1.x) pkg> activate .
(@v1.x) pkg> update
Do not use Julia < 1.7, there's no guarantee that deprecated APIs will still work, and performance and user friendliness of the e.g. the package manager alone make 1.7 the ideal baseline.
Current memory usage is higher than it strictly needs to be because we generate a lot of intermediate steps. In principle we could reduce usage by x2 or more, but it would come at the cost of debugging/interpretability.
MacOS + Conda has a bug where a certificate error triggers a cascade of errors. The errors can be ignored, including the failing tests, this is an optional part of the module. When the bug in conda is resolved, this issue should be resolved as well.
MCS-Detect has multiple parameters that will determine the precision and recall of the predicted contacts. While a full discussion is available in the paper, here we will give a brief explanation and guidance as to how to set them.
Because 3D STED has anisotropic resolution in Z (worse in Z than in X/Y), it is possible to see intensity bleedthrough
or shadowing
across Z.
For example, say you have a mitochondrial vesicle at Z-slice 5.
Bleedthrough can lead to intensity mimicking a faint object at Z-slice 8.
The Z-filter removes this by filtering the intensity distribution, per channel.
If you set Z=1, all intensity below
A z-value is that is too high will cause false negatives because you're removing intensity from the organelles, not the background. A too low value will included possible contacts between organelles and phantom intensity, e.g. false positives. A value of z=3 is used for the paper, derived from the size of the cell and the anisotropy. Recommended usage is to test Z-values on a single representative cell, and plot the organelle volume, in combination with visual inspection. Instructions on how to do this and accompanying scripts can be found here.
Correlation requires a comparison between two vectors of data, in 2- or 3D images this means a window size. If you set w=2 the window will be (2*2+1)^D for D dimensions. So 5x5 in 2D, 5x5x5 in 3D. w=1 would be 3x3, or 3x3x3 and so forth.
A too large value will consume more memory, and will miss finer patterns. A too small value will fail to capture large patterns. So what, then is 'too' small or large? At a minimum, the window should cover the width of the contact, but no more than 2.5x. The interested reader will detect similarities with how resolution and pixel-dimensions relate. I will give an example to give a more actionable insight: Let us assume pixel precision is 50nm in X, Y, and 75nm in Z. Say the expected contacts you wish to capture are 0-25nm. In this case w=1 would be sufficient, because a window of 3x3x3 would span 150nm lateral, and 225nm axial. W=2 would mean 250nm lateral and 375nm axial, which is likely too large, it would be dominated by differentials that are unrelated to the contact. Important The window size determines the statistical power of the correlation. A 3x3 window in 2D has limited statistical power. See below.
A correlation is a statistical estimator, and comes with a confidence value ('p-value'). Alpha control what acceptable levels of confidence are allowed, whereas beta controls statistical power. A recap from statistics:
- Significance (alpha): The probability that an observed difference is not due to random effects
- Power (beta): The probability that you can observe a given difference (of a given magnitude)
What does this mean in practice? We can compute what is minimal observable correlation you can detect, given alpha and beta. First, the 2D case (so 3x3, 5x5, ...)
Next, 3D:
Trouble reading these plots? Let's say you use a 3x3x3 window (w=1, in 3D). If you set alpha=beta=0.05 (95% confidence and power), then the smallest possible observable correlation is 0.665. (In the 2nd plot, X=27, Y=0.665).
Suppose you increase the window to w=2, 3D, then you have 0.341 (X=125, Y=0.341).
If you want to have the same minimum correlation in 3D with a window of 27, you would need to change your alpha and beta to 0.35
We can also plot this
The functions to compute this are available for you as well:
# w=2, 2D
minr = compute_min_r_for_sample_corr(25, 0.05, 0.05)
and
# r=0.2, 2D
window = compute_sample_size_for_min_corr(0.2, 0.05, 0.05)
- If you keep the window the same, and go to 2D, set alpha and beta from to have the same recall.
- If precision is too low, reduce alpha and beta (e.g. 0.05 to 0.1, or 0.25).
- If recall is too high (artifacts), increase alpha and beta (0.05 to 0.01 or 0.001)
The postprocessing scripts use size (logarithm) and mean intensity of vesicles to filter them out. This can only be empirically estimated.
Plot the intensity and object sizes of the mitochondria channel, and look for a separation between large and bright objects, versus small and faint. Off the shelf clustering methods can be of help. Alternatively, segment the image before processing. NOTE Contact detection does not differentiate between mitochondria and vesicles, the interaction may be functionally different, but the contacts are no less real.
Because contacts are large and infrequent, or small and frequent, the statistical analysis can be unstable. More precisely, the distribution is long tailed containing extreme values, and those extreme values are often the ones of interest (e.g. ribomerc). To offset this, the coverage computation and local density (nr of contacts/window) uses a window, defaulting to 5x5x5 (which corresponds to w=2).
The smaller you set this, the more you split objects apart. Ideally you set this window to be no smaller than the largest expected object. Sampling windows do not overlap, and mitochondria that are only partially visible in a window (few voxels), are discarded.
Some related command line parameters are switches in that they enable/disable behavior.
--dimension=3
: Set to 3 for 3D (default), or 2 for 2D. Please see confidence for an appreciation as to how this can change the precision/recall.--deconvolved
: Default off. Add if the images have been deconvolved. 2D without deconvolution will not work. Deconvolution can introduce artifacts, but overall is preferred for higher accuracy and lower SNR-i
: Input path, the folder or directory where to find the images--inregex
or-r
: The regular expression that matches 2 tif files. Defaults to "*[1,2].tif", where channel 1 is mitochondria, and channel 2 ER.-o
: Output directory.-z
: Z value, see z Default 3--alpha
and--beta
: see above. Default 0.05-w
: window size
The full documentation of the options can be found here
A brief overview of the generated output follows here:
Files are saved with the prefix of the original: If you have files, say, ABC1.tif, and ABC2.tif, then output will be saved ABC_xyz.tif, and so forth.
- $PREFIX_confidence_map.tif: Each voxel holds the confidence (significance, p-value [0-1]) of the corresponding contact.
- $PREFIX__channel_[1,2].tif: The filtered mitochondria and ER channel. Use this to inspect if the z-filter removed too much or too little
- $PREFIX__pre_split_raw.tif: Unfiltered contacts. At voxel x, y, z this will have a correlation value, and its significance can be found in confidence_map[x,y,z].
- $PREFIX_pre_split_gradient.tif: Contacts with the gradient filter applied.
- $PREFIX_split_eroded.tif: Erodes singular voxels that are below the precision of the system
- $PREFIX_3_eroded_volumes_nonsplit.csv : Features computed on the contacts
- $PREFIX_C1_objects.csv: A CSV file where each row describe the features of the segmented objects of that channel (1). So if the code ran on 01.tif and 02.tif, C1 will map to 01.tif, filtered, then processed.
- $PREFIX_C2_objects.csv: A CSV file where each row describe the features of the segmented objects of that channel (1). So if the code ran on 01.tif and 02.tif, C1 will map to 01.tif, filtered, then processed. See also the postprocessing for further output. The remainder are debugging outputs that can be traced to their source code in the script.
2D mode requires deconvolution to have been applied, so for example
julia --project=. scripts/ercontacts.jl -r "*[1,2].tif" -i myfolder --dimension=2 -z=3 -w=2 --deconvolved
You can run the background filtering and segmentation separately. In the SubPrecisionContactDetection.jl folder, open a Julia shell, for example in VS Code (New Terminal). Suppose we want to filter all tif files ending with "1.tif" or "2.tif" , for z=1 to 1.1 in 0.25 steps, and then compute the object properties.
julia --project=. scripts/segment.jl --inpath mydir -z 1.0 -Z 1.1 -s 0.25 -r "*[1,2].tif"
For each file, for each filtered value, it will generate:
- mask.tif
- masked.tif
For all the files, it will generate a CSV with columns, where each row is an object:
- size (nr of non zero voxels)
- weighted (intensity sum of objects)
- minimum, Q1, mean, mediam, Q3, maximum, std, kurtosis : describes the intensity distribution of the object
- xyspan : major axis of 2D projection, pixels
- zrange : extent in Z
- zmidpoint : midpoint in Z slice
- distance_to_centroid: distance of this object's centroid to centroid of all objects, describes clustering
- distance_to_centroid_normalized: rescale the above to 0-1, where 0 is centroid of all objects, 1 is maximum dispersion
- centroid_channel_{x,y,z} : in pixel coordinates, records the centroid of all objects, this is the reference for the distance computation
- centroid_object_{x,y,z} : in pixel coordinates, records the object centroid
- filename : the source tif file name
- z : the z value used
- eig1-3: PCA eigenvalues, the can be used for shape descriptors
- eig1-3normalized: eigenvalues rescaled to 1