bkong1990/WholeSlideImage_Sampler

 
 

Whole-Slide-Image sampler.

This repository aims to provide a tool for sampling efficiently from Whole-Slide-Images (WSIs). By sampling we mean producing batches of patches that can then be fed to, e.g., machine learning algorithms. Ideally this is achieved without storing all the patches on disk, which would waste storage. It aims to work with all WSIs that can be read by openslide. Sample data for tests is available here (images from the open-source Camelyon 16 dataset).

Assumptions

  • You have WSIs in a format readable by openslide.
  • You may also have multi-resolution-image annotation files, such as those exported by the (excellent) slide annotation tool ASAP (which supports all openslide-readable images). For an example, see the folder 'annotation' in the sample data. The annotations can be multi-class. The multi-resolution level structure should be the same for the main WSI and for its annotation.

Usage

Have a look at the tests in the tests folder.

Sampling is achieved through a Single_Sampler object, which is implemented in the module single_sampler.py. We first build an object:

sampler = single_sampler.Single_Sampler(wsi_file, background_dir, annotation_dir, level0=40.)

where

  • wsi_file : a string path to a WSI file
  • background_dir : a (string) directory for background masks. If this directory does not exist, or no mask for this WSI is found in it, a background mask is generated and saved to this directory (creating the directory if necessary).
  • annotation_dir : a (string) directory where annotations are stored (or None). This directory is searched automatically and the matching WSI annotation is assigned to the sampler if found; otherwise no annotation is assigned.
  • level0 : the WSI (and annotation) magnification at 'level 0', e.g. 40. for a slide scanned at 40X.

We then prepare for sampling with something like:

sampler.prepare_sampling(desired_downsampling, patchsize)

where

  • desired_downsampling : the desired downsampling factor relative to level 0, e.g. for a WSI with level 0 at 40X a downsampling of 4 gives patches at 10X.
  • patchsize : the side length of each patch in pixels; patches are sampled at size (patchsize x patchsize).
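The relationship between level0, desired_downsampling and patchsize comes down to a little arithmetic, sketched below. The helper names are hypothetical and are not part of single_sampler.py. The pyramid_level helper additionally assumes that each pyramid level halves the resolution of the previous one, which is common but not guaranteed for every WSI.

```python
import math


def effective_magnification(level0, downsampling):
    """Magnification of the sampled patches, e.g. 40X / 4 = 10X."""
    if downsampling <= 0:
        raise ValueError("downsampling must be positive")
    return level0 / downsampling


def level0_footprint(patchsize, downsampling):
    """Side length, in level-0 pixels, of the region one patch covers."""
    return int(round(patchsize * downsampling))


def pyramid_level(downsampling):
    """Pyramid level matching a downsampling factor.

    ASSUMPTION: level k is downsampled by 2**k relative to level 0.
    """
    level = math.log2(downsampling)
    if level < 0 or not level.is_integer():
        raise ValueError("downsampling must be a power of two >= 1")
    return int(level)


print(effective_magnification(40.0, 4))  # magnification of the patches
print(level0_footprint(256, 4))          # level-0 pixels covered per side
print(pyramid_level(4))                  # level under the halving assumption
```

For slides whose levels do not follow the power-of-two pattern, openslide exposes the per-level factors (level_downsamples) and a get_best_level_for_downsample method, which may be the more robust choice.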

At this point we are ready to sample patches with:

sampler.sample_patches(max_per_class, savedir, verbose=0)

where

  • max_per_class : the maximum number of patches to get per class
  • savedir : location to save output patchframe
  • verbose : whether to report the number of rejected patches (default 0, i.e. off)
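The internals of sample_patches are not described here. One plausible reading of max_per_class together with the "rejected patches" count is rejection sampling: draw random locations, discard those that fall on background or whose class is already full, and keep the rest. The sketch below illustrates that idea on small in-memory grids. The function name, the mask and label grids, and the max_tries limit are all assumptions made for illustration, not the repository's implementation.

```python
import random


def sample_coordinates(mask, labels, max_per_class, max_tries=10000, seed=0):
    """Rejection-sample (row, col) locations, at most max_per_class per class.

    mask   : 2D list of bools, True = tissue (hypothetical input)
    labels : 2D list of int class ids with the same shape (hypothetical input)
    Returns a dict {class_id: [(row, col), ...]} and the number of rejections.
    """
    rng = random.Random(seed)
    rows, cols = len(mask), len(mask[0])
    accepted = {}
    rejected = 0
    for _ in range(max_tries):
        r, c = rng.randrange(rows), rng.randrange(cols)
        if not mask[r][c]:
            rejected += 1  # location is background
            continue
        bucket = accepted.setdefault(labels[r][c], [])
        if len(bucket) >= max_per_class:
            rejected += 1  # this class already has enough patches
            continue
        bucket.append((r, c))
    return accepted, rejected


mask = [[False, True, True], [False, True, True]]
labels = [[0, 0, 1], [0, 0, 1]]
accepted, rejected = sample_coordinates(mask, labels, max_per_class=3)
print({cls: len(coords) for cls, coords in accepted.items()}, rejected)
```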

The sampled patches are saved to savedir in the form of a patchframe. A patchframe is a data structure containing the coordinates of each patch along with metadata: the patch class, the parent WSI, the size and the level. It is implemented as a pandas DataFrame (pd.DataFrame). Note that the patches themselves are not stored! Using this table of patches, we can then read patches on demand and pass them to a machine learning algorithm.

e.g.

patchframe head:
       w       h class                                             parent level size
0  30900  119768     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
1  78691  170619     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
2  67651  158458     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
3  65468  156771     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
4  40402  115702     0  /home/peter/Dropbox/publish-final/WSI_sampler_...     2  256
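Because a patchframe stores only coordinates and metadata, the pixels have to be read when a batch is built. A minimal sketch of that loop is shown below. It uses plain dict rows (the form pandas returns from patchframe.to_dict("records")) so that it runs without any dependencies. The iter_batches and fake_reader names are hypothetical, and the parent paths are placeholders.

```python
def iter_batches(rows, batch_size, read_patch):
    """Yield (images, labels) batches from patchframe-like rows.

    rows       : iterable of dicts with keys w, h, class, parent, level, size
    read_patch : callable that turns one row into an image (supplied by caller)
    """
    images, labels = [], []
    for row in rows:
        images.append(read_patch(row))
        labels.append(row["class"])
        if len(images) == batch_size:
            yield images, labels
            images, labels = [], []
    if images:  # final, possibly smaller, batch
        yield images, labels


def fake_reader(row):
    """Stand-in reader returning a blank size x size 'image' (illustration only)."""
    return [[0] * row["size"] for _ in range(row["size"])]


rows = [
    {"w": 30900, "h": 119768, "class": 0, "parent": "slide_a", "level": 2, "size": 256},
    {"w": 78691, "h": 170619, "class": 0, "parent": "slide_a", "level": 2, "size": 256},
    {"w": 67651, "h": 158458, "class": 0, "parent": "slide_a", "level": 2, "size": 256},
]
for images, labels in iter_batches(rows, batch_size=2, read_patch=fake_reader):
    print(len(images), labels)
```

With openslide, a real reader could open row["parent"] and call read_region((x, y), level, (size, size)), which expects the location in level-0 coordinates. Whether w and h map directly onto that x and y is an assumption that should be checked against the repository's utils module.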

Background generation

The background mask is stored as a downsampled, boolean numpy array where True denotes tissue and False denotes background. It is generated from the WSI using Otsu thresholding on the saturation channel, followed by morphological operations. This approach is inspired by this paper, which achieved top results in the Camelyon 16 contest. The generated background mask can be visualized using e.g.

sampler.save_background_visualization(savedir)

left: Normal slides, right: Cancer slides
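The thresholding step described in this section can be illustrated with a small, dependency-free sketch: compute the saturation of each pixel, find the Otsu threshold that maximizes the between-class variance, and mark pixels above it as tissue. This is a simplified illustration on a flat list of RGB pixels, not the repository's code. It omits the morphological clean-up, and the function names are hypothetical.

```python
import colorsys


def saturation_values(pixels):
    """Map (r, g, b) pixels in 0..255 to integer saturation values in 0..255."""
    return [
        round(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] * 255)
        for r, g, b in pixels
    ]


def otsu_threshold(values, bins=256):
    """Return t maximizing between-class variance of {v <= t} and {v > t}."""
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(bins):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# White glass background has near-zero saturation; stained tissue is saturated.
pixels = [(255, 255, 255)] * 3 + [(200, 100, 150)] * 2 + [(180, 60, 120)] * 2
sat = saturation_values(pixels)
threshold = otsu_threshold(sat)
tissue_mask = [s > threshold for s in sat]  # True = tissue, False = background
print(threshold, tissue_mask)
```

On real slides the same computation would normally be done on a downsampled numpy array with a library routine such as scikit-image's threshold_otsu, which is faster than this pure-Python loop.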

Annotation viewing

Multi-resolution annotations are best viewed with ASAP, but they can also be visualized here with e.g.

sampler.save_annotation_visualization(savedir)

Annotations on the two cancerous slides shown above

Patch viewing

We might want to inspect the patches listed in a patchframe. To do this, we can save them to disk with e.g. the following function from the utils module:

utils.save_patchframe_patches(patchframe)

left: two class 0 (Normal) patches, right: two class 1 (cancer) patches (256x256 @ 10X).
