Whole-Slide-Image sampler.

This repository aims to develop a tool for sampling efficiently from Whole-Slide-Images (WSIs). By sampling we mean producing batches of patches which can then be fed to e.g. machine learning algorithms. Ideally this is achieved without storing all the patches on disk, which would waste storage. It aims to work with all WSIs that can be read by openslide. Sample data for tests is available here (images from the open-source Camelyon 16 dataset).

Assumptions

You have WSIs in a format readable by openslide.
You may also have multi-resolution-image annotation files, such as those exportable by the (excellent) slide annotation tool ASAP (which supports all openslide-readable images). For example, see the folder 'annotation' in the sample data. The annotations can be multi-class. The multi-resolution level structure should be the same for the main WSI and for its annotation.

Usage

Have a look at the tests in the tests folder.

Sampling is achieved through a Single_Sampler object, which is implemented in the module single_sampler.py. We first build an object:

sampler = single_sampler.Single_Sampler(wsi_file, background_dir, annotation_dir, level0=40.)

where

wsi_file : a string path to a WSI file
background_dir : a (string) directory for background masks. If this directory does not exist or no mask for this WSI is found then a background is generated and saved to this directory (creating the directory if necessary).
annotation_dir : a (string) directory where annotations are stored (or None). We automatically look through this directory and assign the correct WSI annotation to the sampler if found, else assign no annotation.
level0 : the WSI (and annotation) magnification at 'level 0', e.g. 40. for a slide scanned at 40X.

We then prepare for sampling with something like:

sampler.prepare_sampling(desired_downsampling, patchsize)

where

desired_downsampling : the desired downsampling factor relative to level 0, e.g. for a WSI with level 0 at 40X a downsampling of 4 gives patches at 10X.
patchsize : patches sampled at size (patchsize x patchsize).
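
The arithmetic behind these two parameters can be sketched as follows. This is only an illustration, not the code in single_sampler.py: the helper names are hypothetical, and the level computation assumes each pyramid level halves the resolution of the previous one, which may not hold for every slide.

```python
import math


def magnification_after_downsampling(level0, downsampling):
    """Magnification of the sampled patches, e.g. 40X / 4 -> 10X."""
    if downsampling <= 0:
        raise ValueError("downsampling must be positive")
    return level0 / downsampling


def pyramid_level(downsampling):
    """Pyramid level for a downsampling factor.

    Assumption: every level halves the resolution (factors 1, 2, 4, ...).
    """
    level = math.log2(downsampling)
    if not level.is_integer():
        raise ValueError("downsampling is not a power of two")
    return int(level)


def level0_footprint(patchsize, downsampling):
    """Side length, in level-0 pixels, covered by one patch."""
    return patchsize * downsampling


print(magnification_after_downsampling(40.0, 4))  # patch magnification
print(pyramid_level(4))                           # level under the halving assumption
print(level0_footprint(256, 4))                   # level-0 pixels per patch side
```

Under these assumptions a downsampling of 4 on a 40X slide corresponds to level 2, which agrees with the level column of the example patchframe shown further down.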

At this point we are ready to sample patches with:

sampler.sample_patches(max_per_class, savedir, verbose=0)

where

max_per_class : the maximum number of patches to get per class
savedir : location to save output patchframe
verbose : (bool) report number of rejected patches?
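
One plausible way to picture the sampling step is rejection sampling against a tissue mask: draw random positions, keep those that land on tissue, and count the rest as rejected. The sketch below is an assumption about the general idea, not the logic of sample_patches; the function name, the max_tries cap and the seed argument are all hypothetical.

```python
import numpy as np


def sample_tissue_coordinates(mask, max_patches, max_tries=10_000, seed=0):
    """Draw random (row, col) positions and keep those that fall on tissue.

    mask: 2D boolean array, True = tissue, False = background.
    Returns (accepted_coordinates, rejected_count).
    """
    rng = np.random.default_rng(seed)
    accepted = []
    rejected = 0
    for _ in range(max_tries):
        if len(accepted) >= max_patches:
            break
        row = int(rng.integers(0, mask.shape[0]))
        col = int(rng.integers(0, mask.shape[1]))
        if mask[row, col]:
            accepted.append((row, col))
        else:
            rejected += 1  # the kind of count a verbose flag might report
    return accepted, rejected


# Toy mask: a 4x4 block of tissue inside an 8x8 background.
toy_mask = np.zeros((8, 8), dtype=bool)
toy_mask[2:6, 2:6] = True
coords, n_rejected = sample_tissue_coordinates(toy_mask, max_patches=5)
print(len(coords), "accepted,", n_rejected, "rejected")
```

Running the same loop once per class, with a per-class mask, would cap the number of patches per class in the spirit of max_per_class.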

The sampled patches are saved to savedir in the form of a patchframe. A patchframe is a data structure containing the coordinates of each patch together with metadata such as the patch class, parent WSI, size and level. It is implemented as a pandas DataFrame. Note that the patches themselves are not stored! Using this database of patches we can then pass patches to a machine learning algorithm.

e.g.

patchframe head:
       w       h class                                             parent level size
0  30900  119768     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
1  78691  170619     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
2  67651  158458     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
3  65468  156771     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
4  40402  115702     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
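
As a rough sketch of how such a table can be consumed, the snippet below builds a small patchframe by hand with the same columns and walks over its rows. The rows and file names are made up for illustration (they are not real sampler output), and iter_patch_requests is a hypothetical helper, not part of this repository.

```python
import pandas as pd

# Hand-built stand-in for a patchframe; values and paths are illustrative only.
patchframe = pd.DataFrame(
    {
        "w": [1000, 2000, 3000],
        "h": [1500, 2500, 3500],
        "class": [0, 0, 1],
        "parent": ["slide_a.tif", "slide_a.tif", "slide_b.tif"],
        "level": [2, 2, 2],
        "size": [256, 256, 256],
    }
)

# Number of patches per class.
counts = patchframe["class"].value_counts().to_dict()

# All patches belonging to one slide.
slide_a = patchframe[patchframe["parent"] == "slide_a.tif"]


def iter_patch_requests(frame):
    """Yield (parent, (w, h), level, size, label) for every row.

    Rows are read as dicts because "class" is a Python keyword and
    cannot be accessed as an attribute.
    """
    for record in frame.to_dict("records"):
        yield (
            record["parent"],
            (record["w"], record["h"]),
            record["level"],
            record["size"],
            record["class"],
        )


for request in iter_patch_requests(patchframe):
    print(request)
```

Each yielded tuple holds what a slide reader would need to fetch one patch on demand, which is how the patches can stay off disk until they are used.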

Background generation

The background mask is stored as a downsampled, boolean numpy array where True denotes tissue and False denotes background. It is generated from the WSI using Otsu thresholding on the saturation channel followed by morphological operations. The approach is inspired by this paper, which achieved top results in the Camelyon 16 contest. The generated background mask can be visualized using e.g.

sampler.save_background_visualization(savedir)

left: Normal slides, right: Cancer slides
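
For readers unfamiliar with the technique, here is a minimal numpy sketch of Otsu thresholding on the saturation channel. It is not the code used in this repository: the function names are hypothetical, the histogram uses 256 bins by assumption, and the morphological clean-up step mentioned above is left out.

```python
import numpy as np


def saturation_channel(rgb):
    """HSV saturation in [0, 1] for an (H, W, 3) uint8 RGB array."""
    rgb = rgb.astype(np.float64) / 255.0
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    sat = np.zeros_like(cmax)
    nonzero = cmax > 0  # pure black has saturation 0 by convention
    sat[nonzero] = (cmax[nonzero] - cmin[nonzero]) / cmax[nonzero]
    return sat


def otsu_threshold(values, bins=256):
    """Threshold that maximises the between-class variance of the histogram."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_low = np.cumsum(hist)            # pixels at or below each bin
    weight_high = hist.sum() - weight_low   # pixels above each bin
    cum_sum = np.cumsum(hist * centers)
    total = cum_sum[-1]
    valid = (weight_low > 0) & (weight_high > 0)
    mean_low = np.zeros(bins)
    mean_high = np.zeros(bins)
    mean_low[valid] = cum_sum[valid] / weight_low[valid]
    mean_high[valid] = (total - cum_sum[valid]) / weight_high[valid]
    between = weight_low * weight_high * (mean_low - mean_high) ** 2
    return edges[np.argmax(between) + 1]    # upper edge of the best split bin


def tissue_mask(rgb):
    """Boolean mask: True where saturation exceeds the Otsu threshold."""
    sat = saturation_channel(rgb)
    return sat > otsu_threshold(sat)


# Toy image: white background on the left, pink "tissue" on the right.
toy = np.full((4, 8, 3), 255, dtype=np.uint8)
toy[:, 4:] = (200, 50, 150)
toy_tissue = tissue_mask(toy)
print(toy_tissue)
```

Saturation works well here because slide background is close to white or grey (low saturation) while stained tissue is coloured (high saturation).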

Annotation viewing

Multi-resolution annotations are best viewed with ASAP, but they can also be visualized here with e.g.

sampler.save_annotation_visualization(savedir)

Annotations on the two cancerous slides shown above

Patch viewing

We might want to inspect the patches listed in a patchframe. To do this we can save them to disk with e.g.

utils.save_patchframe_patches(patchframe) (utils module)

left: two class 0 (Normal) patches, right: two class 1 (cancer) patches (256x256 @ 10X).
