GeorgeBatch/GeorgeBatch

Hi there 👋

I am a PhD student in Health Data Science at Oxford, supervised by Professor Jens Rittscher and funded by Professor Fergus Gleeson. I focus on applications of computer vision 👀💻 to improving the diagnosis and treatment of patients with lung cancer as part of the DART lung health project (see my role in the project).

October 2024: my second workshop paper (pre-print 📝, code 💻) won the Best Paper Award at the DEMI-2024 workshop of the MICCAI conference 🥇! In our work "Evaluating histopathology foundation models for few-shot tissue clustering: an application to LC25000 augmented dataset cleaning", we (1) create a pipeline for grouping augmented images using foundation models, (2) release a decontaminated version of the LC25000 histopathology dataset, and (3) propose a minimal-setup benchmark for evaluating pathology foundation models. The cleaned dataset, annotation framework, and evaluation pipeline are available in the LC25000-clean repository.

May 2024: my first main conference paper 📝 (pre-print 📝, code 💻) was published at the ISBI-2024 conference! 🚀 In our work "Accurate Subtyping of Lung Cancers by Modelling Class Dependencies", we (1) construct a weakly-supervised multi-label lung cancer histology dataset from three public datasets (TCGA, TCIA-CPTAC, DHMC) and one in-house dataset (DART), and (2) propose a class-dependency injection method that allows learning robust bag representations suitable for multi-label problems in weakly-supervised settings. The dataset creation, model building, and training code is available in the dependency-mil repository.

September 2022: my first workshop paper 📝 (pre-print 📝, code 💻) was published at the MICCAI 2022 CaPTion workshop! 🚀 In our work "Active Data Enrichment by Learning What to Annotate in Digital Pathology", we (1) proposed a new comprehensive annotation protocol for lung cancer pathology, (2) proposed a new metric for comparing how well retrieval methods can prioritize examples from underrepresented classes, and (3) demonstrated that annotating the top-ranked examples and adding them to the training set improves algorithm performance more than annotating and adding random examples. Links: published paper, open-access paper, code.

December 2020: my first mini-conference working notes paper 📝 (code 💻) was published at the MediaEval 2020 Multimedia Benchmark workshop 🚀. In our work "Real-Time Polyp Segmentation Using U-Net with IoU Loss", we explored how training a simple U-Net with a combination of differentiable IoU and BCE losses affects segmentation performance, measured by mean IoU and Dice score. Links: published open-access paper, code.
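For readers unfamiliar with the idea of combining a differentiable IoU loss with BCE, here is a minimal sketch of the forward computation in plain Python. It is an illustration under stated assumptions, not the code from the paper: the function names, the `iou_weight` parameter, and the weighted-sum form of the combination are all hypothetical, and a real training setup would express the same arithmetic with tensor operations in a deep learning framework so that gradients can flow.

```python
import math


def soft_iou_loss(probs, targets, eps=1e-7):
    """1 - soft IoU, using predicted probabilities instead of thresholded masks."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    return 1.0 - (intersection + eps) / (union + eps)


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def combined_loss(probs, targets, iou_weight=0.5):
    """Weighted sum of the two losses (the weighting scheme is an assumption)."""
    return (iou_weight * soft_iou_loss(probs, targets)
            + (1.0 - iou_weight) * bce_loss(probs, targets))


# Flattened toy "mask" of four pixels.
targets = [1.0, 0.0, 1.0, 0.0]
confident = [1.0, 0.0, 1.0, 0.0]   # perfect prediction
uncertain = [0.5, 0.5, 0.5, 0.5]   # uninformative prediction
print(combined_loss(confident, targets), combined_loss(uncertain, targets))
```

The soft IoU term rewards overlap at the level of the whole mask, while BCE penalizes each pixel independently, which is one common motivation for mixing the two.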


Public histology data sources. If you also want to start working with histopathology images but do not have your own data, or are still waiting for it, consider starting with the "Dartmouth Lung Cancer Histology Dataset" (DHMC), "The Cancer Genome Atlas" (TCGA), and "The Cancer Imaging Archive" (TCIA-CPTAC). Downloading large volumes of data is not a trivial task, so I documented my process in TCGA-lung-histology-download and TCIA-CPTAC-lung-histology-download.

Public natural image sources. If you lack medical data, you can also simulate parts of your future workflow on natural images. For example, classifying medical images for the presence or absence of particular patterns can be similar to classifying natural images for the presence or absence of particular objects. I used images from the COCO dataset. You can see my work here: GeorgeBatch/cocoapi.
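As a rough illustration of the presence/absence framing, the sketch below turns a COCO-style annotation dictionary into per-image binary labels for a chosen set of object categories. It relies only on the standard COCO JSON layout (`images`, `annotations`, `categories`); the `presence_labels` helper and the toy data are hypothetical and are not taken from GeorgeBatch/cocoapi.

```python
def presence_labels(coco, category_names):
    """Map each image id to {category name: 0 or 1} for the requested categories."""
    name_to_id = {c["name"]: c["id"] for c in coco["categories"]}
    wanted = {name_to_id[name]: name for name in category_names}
    # Start with "absent" everywhere, then flip to 1 when an annotation is found.
    labels = {img["id"]: {name: 0 for name in category_names}
              for img in coco["images"]}
    for ann in coco["annotations"]:
        name = wanted.get(ann["category_id"])
        if name is not None:
            labels[ann["image_id"]][name] = 1
    return labels


# Toy annotation file in COCO layout (made-up ids, for illustration only).
toy_coco = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "categories": [{"id": 10, "name": "cat"}, {"id": 20, "name": "dog"}],
    "annotations": [
        {"image_id": 1, "category_id": 10},
        {"image_id": 2, "category_id": 10},
        {"image_id": 2, "category_id": 20},
    ],
}

labels = presence_labels(toy_coco, ["cat", "dog"])
print(labels)
```

An image can contain several categories at once, so the result is a multi-label target, which mirrors the situation where one slide shows more than one pattern.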


Education


Here are some of the best free online resources to boost your ML/DL knowledge 🚀 I am currently working through them, skipping the repetitive parts ⏰

