NanoscopyAI/tutorial_mcs_detect

Walkthrough

The steps below allow you to compute contacts between subcellular organelles in 3D STED data. See the parent repository for documentation on the project; 2D mode, for example, is documented here.

Table of contents

  1. Batch processing on Compute Canada (1-1000s of cells)
  2. Processing a single cell
  3. Troubleshooting
  4. Postprocessing batch data
  5. Postprocessing sampling data (coverage, nr contacts)
  6. Running MCS Detect on LSI workstations

What this will do for you:

Given any number of directories with 3D TIF STED data of mitochondria and ER, it will:
  • Check that your dataset is valid
  • Schedule it for processing
  • Compute the contact sites
  • Compute the statistics of those contacts
  • Notify you when it's completed

What you will need

Dataset organization:

Your data has to be organized in the following way:

- experiment       
  - replicate      (1, 2, ...), directory
    - condition    (COS7, RMgp78), directory
       - Series001 (cellnr), directory
          ...0.tif  #Mitochannel
          ...1.tif  #ER Channel
       - Series002 etc

Do not:

  • store other files
  • use spaces in names
  • mix spellings of condition names (e.g. "C0s7" and "Cos7"). If you do, the statistical analysis will be corrupted.
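
Before uploading, you can sanity-check a local copy of the layout. This is an optional sketch (assuming a Linux shell with find available), not part of the pipeline:

# Every Series* (cell) directory should contain exactly two tif files
find experiment -type d -name "Series*" | while read -r d; do
    echo "$d: $(find "$d" -maxdepth 1 -name '*.tif' | wc -l) tif files"
done
# Look for names containing spaces; this should print nothing
find experiment -name "* *"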

Step 0

Copy your data to the cluster using Globus.

Step 1

Log in to the cluster, replacing $USER with your user ID:

ssh $USER@cedar.computecanada.ca

You'll see something like this

[$USER@cedar5 ~]$

Change to scratch directory

cd /scratch/$USER

Now it'll show

[you@cedar5 /scratch/YOU]$

Note that not all shells will show your current working directory. When in doubt, type pwd to check; it will print

/scratch/$USER

where $USER is your username.

Create a new, clean directory (replace experiment with a name of your choice, e.g. the date, to keep track of experiments):

mkdir -p experiment
cd experiment

You can create this with Globus as well. A new directory ensures there are no clashes with existing files.
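
For example, to use today's date in the experiment name (an optional convention, not required by the pipeline):

# Creates and enters e.g. 2023_03_05_experiment
mkdir -p "$(date +%Y_%m_%d)_experiment"
cd "$(date +%Y_%m_%d)_experiment"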

Step 2

Copy your data to a folder under /scratch/$USER, preferably using Globus.

Step 3

3.0 [Optional] If you have your own configuration scripts

If you have completed this tutorial before, or if you know what you're doing and want to change parameters, for example:

  • Ask for more memory
  • Change scheduling options
  • Change channel numbers
  • ...

The script below, which does everything for you, will check whether an existing script named submit.sh is present in the current directory. If it finds one, it won't use the default pristine version. However, make sure these fields in your custom script (which will be copied and modified on the fly) are EXACTLY like this:

#SBATCH --account=ACCOUNT
#SBATCH --mail-user=EMAIL
#SBATCH --array=1-CELLS

These are automatically updated with the right number of cells, email, and account. Everything else is up to you to modify as you see fit, e.g. if you want to increase memory:

#SBATCH --mem=180G
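
For reference, the placeholders are replaced by plain text substitution, so the on-the-fly modification is roughly equivalent to the following sketch (hypothetical; NCELLS stands for the number of cells the script detects, and GROUP/EMAIL are set in the next steps):

# Hypothetical illustration of how the placeholders get filled in
sed -e "s/ACCOUNT/$GROUP/" \
    -e "s/EMAIL/$EMAIL/" \
    -e "s/CELLS/$NCELLS/" submit.sh > submit_filled.sh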

3.1 Configure

Set the DATASET variable to the path of your dataset:

export DATASET="/scratch/$USER/FIXME"

And configure where you want the output saved:

export OUTPATH="/scratch/$USER/OUTPUT"

DO NOT PROCEED UNLESS THESE 2 DIRECTORIES EXIST
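
A quick way to verify them (and create the output directory if it does not exist yet):

mkdir -p "$OUTPATH"           # safe to run even if it already exists
ls -d "$DATASET" "$OUTPATH"   # both should list without errors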

3.2 Account info

Set your group ID and email. Replace def-abcdef with an account ID, which is either def-yourpiname or rrg-yourpiname. Check ccdb.computecanada.ca, or the output of groups.

export GROUP="def-abcdef"
export EMAIL="your@university.ca"
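
If you are unsure of the account ID, the output of groups lists everything you are a member of; a filter like the following (assuming the standard def-/rrg- naming) narrows it down:

# Show only Slurm allocation accounts among your groups
groups | tr ' ' '\n' | grep -E '^(def|rrg)-'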

Step 4 Validate your dataset

If you schedule a large dataset for processing, you don't want it to be interrupted by avoidable mistakes, so first we'll check that the data is correctly organized and processing will work as expected. Get compute resources:

salloc --mem=62GB --account=$GROUP --cpus-per-task=8 --time=3:00:00

Once granted this will look something like this:

salloc --mem=62GB --account=$GROUP --cpus-per-task=8 --time=3:00:00
salloc: Pending job allocation 61241941
salloc: job 61241941 queued and waiting for resources
salloc: job 61241941 has been allocated resources
salloc: Granted job allocation 61241941
salloc: Waiting for resource configuration
salloc: Nodes cdr552 are ready for job
[bcardoen@cdr552]$

Step 5 Execute

The remainder is done by executing a script, to keep things simple for you.

wget https://raw.githubusercontent.com/NanoscopyAI/tutorial_mcs_detect/main/check.sh -O script.sh

Make it executable

chmod u+x script.sh

Execute it

./script.sh

That's it. At the end you'll see something like

 Info: 2023-02-27 06:14:21 curator.jl:180: Complete with exit status proceed
+ echo Done
Done
Submitted batch job 63009530
[you@cdrxyz scratch]$ 

You will receive an email when your cells are scheduled to process and when they complete, e.g.

Slurm Array Summary Job_id=63009530_* (63009530) Name=submit.sh Began

For each execution, temporary output is saved in the directory /scratch/$USER/tmp_{DATE}, e.g. tmp_05_03_2023_HH04_36.

See below for more docs.

See https://github.com/bencardoen/SubPrecisionContactDetection.jl/ for documentation on the project, what the generated output means and so forth.

Troubleshooting

See the DataCurator.jl and SubPrecisionContactDetection.jl (https://github.com/NanoscopyAI/SubPrecisionContactDetection.jl) repositories for documentation.

Create an issue here with

  • Exact error (if any)
  • Input
  • Expected output

Checking queue delays

To check what the status is of a queued job, type

sq

This will print, for each job, the status (running, pending, ...) and the reason (if any) why it's queued.

You can also view

partition-stats

This will print a table showing how long the queue times are, per time slot. See the documentation for a more complete explanation.

Queue time will increase (slowly) with usage; you can check how strong this effect is with:

sshare -l -A ${GROUP}_cpu

Check the LEVELFS column: a value > 1 means high priority (almost no waiting), < 1 means more waiting.

Running a single cell (on cluster or at home)

  • Assumes you have a Linux-like command line available; for Windows, install WSL

Once you have WSL installed:

# Download Singularity
wget https://github.com/apptainer/singularity/releases/download/v3.8.7/singularity-container_3.8.7_amd64.deb
# Install it
sudo apt-get install ./singularity-container_3.8.7_amd64.deb
# Test that it works as expected
singularity --version

this will show

singularity version 3.8.7
# Download the MCSDetect Singularity image
singularity pull --arch amd64 library://bcvcsert/subprecisioncontactdetection/mcsdetect:latest
mv mcsdetect_latest.sif mcsdetect.sif
chmod u+x mcsdetect.sif
# Configure input and output directories
export IDIR="/where/your/data/is/stored/"
export ODIR="/where/your/data/should/be/stored/"

You should also grant singularity access to your data:

export SINGULARITY_BINDPATH="${PWD},${IDIR},${ODIR}"

Then run (LSRC points to the copy of the project inside the image):

export LSRC="/opt/SubPrecisionContactDetection.jl"
singularity exec mcsdetect.sif julia --project=$LSRC --sysimage=$LSRC/sys_img.so $LSRC/scripts/ercontacts.jl --inpath $IDIR -r "*[0,1].tif" -w 2 --deconvolved --sigmas 2.5-2.5-1.5 --outpath $ODIR --alpha 0.05 --beta 0.05 -c 1 -v 2000 --mode=decon

The results and the expected output are described in the SubPrecisionContactDetection.jl documentation linked above.

Troubleshooting

Memory exceeded

For large cells the default memory limit may not be enough. A higher limit allows (very) large cells to be processed, but can mean longer queue times.

You can find out which cells failed:

wget https://raw.githubusercontent.com/NanoscopyAI/tutorial_mcs_detect/main/findcellstorerun.sh
chmod u+x findcellstorerun.sh
# Copy the old lists as a backup
cp in.txt inold.txt
cp out.txt outold.txt
# Create new lists (inlist/outlist_rerun.txt) that contain only the failed cells
./findcellstorerun.sh $JOBID in.txt out.txt
# Overwrite so the scheduling script knows where to look
mv inlist_rerun.txt in.txt
mv outlist_rerun.txt out.txt

This script will ask the cluster which cells failed, extract them from the input and output lists, and create new ones with only those cells so you can reschedule them.
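
To get the new number of cells for the array line in the next step, count the lines of the rebuilt input list (assuming one line per cell):

wc -l < in.txt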

Next, you'll need to update the submit.sh script that was used to schedule the data earlier. While check.sh ran, a folder tmp_{date} was created in which all the above files are saved (including submit.sh).

nano submit.sh

This will open a text editor where you can edit and save the script; change the lines that set the memory and the array size:

#SBATCH --mem=116G # Change to e.g. 140G (>120G will mean large memory nodes, > 300G will be very large nodes, with very long wait times)
...
#SBATCH --array=1-SETTONEWNROFCELLS

Then reschedule

sbatch submit.sh

MCS Detect background filtering only (~ segmentation)

If you only want to compute the background filtering, use these instructions.

Run this in an interactive session (see above).

Your prompt should look like user@cdr123 (where 123 varies), not user@cedar1 (or cedar5); those are the login nodes.

For reference, the setup should look like

module load StdEnv/2020 apptainer/1.1.3
export SINGULARITY_CACHEDIR="/scratch/$USER"
export APPTAINER_CACHEDIR="/scratch/$USER"
export APPTAINER_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export SINGULARITY_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export JULIA_NUM_THREADS="$SLURM_CPUS_PER_TASK"

echo "Checking if remote lib is available ..."

export LISTED=`apptainer remote list | grep -c SylabsCloud`
# apptainer remote list | grep -q SylabsCloud

if [ $LISTED -eq 1 ]
then
    apptainer remote use SylabsCloud
else
    echo "Not available, adding .."
    apptainer remote add --no-login SylabsCloud cloud.sycloud.io
    apptainer remote use SylabsCloud
fi

Download the recipe

wget https://raw.githubusercontent.com/bencardoen/DataCurator.jl/main/example_recipes/sweep.toml -O recipe.toml

This recipe looks like this:

[global]
act_on_success=true
inputdirectory = "testdir"
[any]
all=true
conditions = ["is_dir"]
actions=[["filter_mcsdetect", 1, 0.5, 2, "*[0-2].tif"]]

The recipe looks for files ending in 0, 1, or 2.tif. If that does not match your data, change it, for example:

["*1.tif"] # Matches only abc_1.tif, 01.tif etc.
["*[1-2].tif"] # Only channels 1 and 2

Use nano or vi as text editors if needed.

The recipe will run a parameter sweep from z=1 to z=2 in increments of 0.5 (i.e. z = 1.0, 1.5, 2.0); you can modify these as needed. At the end it will, for each input tif file, generate a CSV file named stats_{original_file_name}.csv with statistics on the size and intensity of objects for each filter value. The filename and the z value used are columns in this CSV.

For example, say you want to test on channels 1 and 2 only, with z=0.5 to 3.5 in 0.1 increments; you would modify it like so:

[global]
act_on_success=true
inputdirectory = "testdir"
[any]
all=true
conditions = ["is_dir"]
actions=[["filter_mcsdetect", 0.5, 0.1, 3.5, "*[1-2].tif"]]

Output

For each tif file it finds, the output will be:

  • per z value, a mask (binary) and a masked (original * mask) tif file, named mask_zvalue_original_name.tif
  • for all z values, a CSV listing the objects and their intensity after filtering

Change the inputdirectory

The recipe has testdir as inputdirectory; change it to point to your directory of choice. Or, if you have defined it as a variable DATASET:

sed -i "s|testdir|${DATASET}|" recipe.toml

Download DataCurator

singularity pull --arch amd64 library://bcvcsert/datacurator/datacurator:latest
chmod u+x datacurator_latest.sif

Execute recipe

./datacurator_latest.sif -r recipe.toml

See the recipe for documentation.

Output is saved in the same location as input files.
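
To get an overview of what the sweep produced, a find over the input directory works (based on the output naming described above):

# List the generated masks and per-file statistics
find "$DATASET" -name "mask_*.tif"
find "$DATASET" -name "stats_*.csv"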

4. Postprocessing

Extract the results using zip and Globus, where $MYOUTPUT is the output directory you configured earlier:

cd $MYOUTPUT/..
zip -r myoutput.zip $(basename $MYOUTPUT)

4.1 Run the postprocessing scripts

Ensure you have the latest version

singularity pull --arch amd64 library://bcvcsert/subprecisioncontactdetection/mcsdetect:latest
mv mcsdetect_latest.sif mcsdetect.sif
chmod u+x mcsdetect.sif

As before, you need to acquire an interactive session. Define where MCSDetect stored its data:

export MCSDETECTOUTPUT="..." # change 
export CSVOUTPUT="..." # Where you want the CSV's saved (use directories in /scratch)
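
As with the directories in step 3.1, make sure the CSV output directory exists, e.g.:

mkdir -p "$CSVOUTPUT"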

Next, run:

echo "Configuring singularity"
module load StdEnv/2020 apptainer/1.1.3
export SINGULARITY_CACHEDIR="/scratch/$USER"
export APPTAINER_CACHEDIR="/scratch/$USER"
export APPTAINER_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export SINGULARITY_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export JULIA_NUM_THREADS="$SLURM_CPUS_PER_TASK"
singularity exec mcsdetect.sif python3 /opt/SubPrecisionContactDetection.jl/scripts/csvcuration.py --inputdirectory $MCSDETECTOUTPUT --outputdirectory $CSVOUTPUT

That's it.

If you used an alpha value different from 0.05, pass it as an argument, e.g. csvcuration.py --alpha 0.01 --inputdirectory .... Filtering on different intensities can be done in the same way; see the script for documentation.

Assuming you pointed this at the location of the MCSDetect output, which has the form

experiment
   condition
      series001
        0.05
          ...

It will extract the right CSVs and tell you how many cells it detected. It will then save curated CSVs, with both 1 row per contact and 1 row per cell, in your specified output directory. These files will be generated for you:

contacts_aggregated.csv             # Contacts aggregated per cell, so 1 row = 1 cell, use this for e.g. mean height, Q95 Volume
contacts_filtered_novesicles.csv    # All contacts, without vesicles
contacts_unfiltered.csv             # All contacts, no filtering
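
As a quick sanity check, contacts_aggregated.csv should have one data row per cell (plus a header), so the following count should match the number of cells the script reported:

# Number of data rows = number of cells
tail -n +2 "$CSVOUTPUT/contacts_aggregated.csv" | wc -l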

Postprocessing sampling & coverage data

To compute contact coverage, a separate script is available.

First, acquire an interactive node as you did in the steps above. Then, with the mcsdetect.sif image in place:

# Configure variables
# These two lines ensure singularity can read your data
module load StdEnv/2020 apptainer/1.1.3
export APPTAINER_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export SINGULARITY_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export LSRC="/opt/SubPrecisionContactDetection.jl"
export IDIR="/set/this/to/the/output/of/mcsdetect"
export ODIR="/set/this/to/where/you/want/output/saved"

# Run
singularity exec mcsdetect.sif julia --project=$LSRC --sysimage=$LSRC/sys_img.so $LSRC/scripts/run_cube_sampling_on_dataset.jl  --inpath $IDIR --outpath  $ODIR

This will take all the output of MCS-Detect and compute coverage statistics. The result is a file all.csv, plus the corresponding tif files if you need them. Next, you can run an aggregation script to summarize this (potentially huge) CSV file and compute simplified statistics.

# Configure variables
# These two lines ensure singularity can read your data
module load StdEnv/2020 apptainer/1.1.3
export APPTAINER_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export SINGULARITY_BINDPATH="/scratch/$USER,$SLURM_TMPDIR"
export LSRC="/opt/SubPrecisionContactDetection.jl"
export IDIR="/set/this/to/where/all.csv_is_saved"
export ODIR="/set/this/to/where/you/want/summary/output/saved"

# Run
singularity exec mcsdetect.sif python3 /opt/SubPrecisionContactDetection.jl/scripts/coverage.py  --inputdirectory $IDIR --outputdirectory $ODIR

This will print summary output and save a file coverage_aggregated.csv. The columns you'll be most interested in are Coverage % mito by contacts, mean per cell and ncontacts mean.

They report the coverage of contacts on mitochondria (minus MDVs), and the number of contacts per sliding window of 5x5x5 voxels.

FAQ

I get a warning about SINGULARITY BINDPATH

Singularity is being replaced by Apptainer; to support both systems, we define variables for both. On systems where Apptainer is adopted, this can lead to warnings.

I get a warning/error that Singularity is no longer supported

Run (on WestGrid systems)

module load StdEnv/2020 apptainer/1.1.3

Running MCS Detect on the UBC LSI workstations

MCSDetect has been preinstalled on the LSI workstations. To run:

  • Open VSCode
  • File --> New Window
  • Open Folder
  • Navigate to C:\Users\Nabi Workstation\repositories\SubPrecisionContactDetection.jl
  • Open a New Terminal: Terminal -> New Terminal

Now you can work with the code as per the documentation. We will run through a few examples:

Testing the background filter

Let's say you have some tif files in mydir and want to test the segmentation in steps of 0.1 from 1.0 to 3.0:

julia --project=. scripts\segment.jl --inpath mydir -z 1.0 -s 0.1 -Z 3.0

Running the contact detection

Let's say you have files in idir, ending in 0.tif and 1.tif:

julia --project=. scripts\ercontacts.jl  --inpath idir -r "*[0,1].tif" -w 2 --deconvolved --outpath odir --alpha 0.05
