RetinaNet with sampler
It took us quite a lot of time to develop a reasonable solution to this competition.
We decided to work with RetinaNet and focal loss, described in the paper Focal Loss for Dense Object Detection. If you are new to RetinaNet, I recommend skimming the blog post The intuition behind RetinaNet.
All our experiments are available here: Experiments 🚀
We work with the GoogleAI object detection open-images-v4 dataset. It is large and difficult, and I think it will stay with us for a while, so it is probably a good idea to get familiar with it. Check the dataset_exploration notebook.
We quickly decided that this competition consists of two subproblems, each to be approached separately:
- The first subproblem covers the classes related to people and clothing, because their bboxes overlap a lot and there are multiple bboxes per image. Here, we have approximately 80 classes.
- The second subproblem covers the remaining classes. Here, we take all these classes and divide them into 7 bins. Each bin holds classes with similar frequency in the dataset. We need these bins to prepare a proper epoch, as described below.
- When we run training for the remaining classes, we make sure that each class has a similar number of occurrences within an epoch. We implemented a sampler to do this work. Thanks to this, the problem is more balanced. In practice, we oversample rare classes and subsample frequent classes. The 7 bins mentioned above are used here.
- Next, we calculate the aspect ratio of each image and build batches only from images with similar aspect ratios. We need this for the next step, the resize. After the resize, all images in a batch are squeezed similarly, so the training signal is better balanced.
- At this point we are ready to feed the batch to the network. The images have similar aspect ratios and the classes within the epoch are balanced, so the training signal is stronger.
- The resulting experiment looks like this one.
- It is good to start experimenting with a few classes (like 10) to get a better feel for the problem. We also run training on 10 classes.
- We noticed that augmentations are necessary for rare classes :)
- What to do with highly overlapping bboxes (people and clothing subproblem)?
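The class balancing and aspect-ratio batching described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the sampler from this repository: the function names `balanced_epoch` and `batches_by_aspect_ratio`, their parameters, and the input dictionaries are all hypothetical, and the sketch samples per class directly instead of going through the 7 frequency bins.

```python
import random


def balanced_epoch(image_ids_by_class, samples_per_class, seed=0):
    """Draw the same number of images for every class (hypothetical helper).

    Frequent classes are subsampled (drawn without replacement),
    rare classes are oversampled (drawn with replacement).
    """
    rng = random.Random(seed)
    epoch = []
    for class_name, image_ids in sorted(image_ids_by_class.items()):
        if len(image_ids) >= samples_per_class:
            # Frequent class: keep only a random subset.
            picked = rng.sample(image_ids, samples_per_class)
        else:
            # Rare class: repeat images until the quota is met.
            picked = [rng.choice(image_ids) for _ in range(samples_per_class)]
        epoch.extend((class_name, image_id) for image_id in picked)
    rng.shuffle(epoch)
    return epoch


def batches_by_aspect_ratio(image_sizes, batch_size):
    """Group image ids so each batch holds similar aspect ratios.

    image_sizes maps image id -> (width, height).
    """
    ordered = sorted(
        image_sizes, key=lambda i: image_sizes[i][0] / image_sizes[i][1]
    )
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

For example, `balanced_epoch({"cat": list(range(100)), "rare": [1000, 1001]}, samples_per_class=10)` yields 10 entries for each class, so the two rare images are repeated while most cat images are skipped in that epoch.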
What you can see on the master branch is the training procedure on 10 classes; you can extend it to work on the entire dataset. We worked with this code to quickly iterate over various ideas.
Nothing fancy here. We are working with the PyTorch abstractions:
- Dataset
- Dataloader
- We preprocess the images with standard PyTorch values for pretrained models:
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]
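To make the effect of these values concrete, here is a small sketch of per-channel normalization applied to a single RGB pixel. The helper `normalize_pixel` is hypothetical and written in plain Python for illustration; in a PyTorch pipeline the same operation is usually expressed with `torchvision.transforms.Normalize(mean=MEAN, std=STD)`, though whether this repository does it that way is an assumption.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1].

    Hypothetical helper: subtracts the channel mean and divides by the
    channel standard deviation.
    """
    return [(value - mean) / std for value, mean, std in zip(rgb, MEAN, STD)]
```

A pixel equal to `MEAN` maps to zero in every channel, and a pixel one standard deviation above the mean maps to roughly one.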
- We took the implementation from https://github.com/kuangliu/pytorch-retinanet
- After reading the Focal Loss for Dense Object Detection paper, we realized that some pieces were missing, so:
- we added the initialization explained there
- we added functionality that lets you switch easily between different ResNet versions (34, 50, 101, 152)
- We made the non-max suppression parallel to make sure that we can submit our solution before Christmas
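For context on the initialization mentioned above: the focal loss paper sets the bias of the final classification layer to b = -log((1 - π) / π) with π = 0.01, so that every anchor starts with a predicted foreground probability of about π. The sketch below only computes that value; the function names are hypothetical, and we assume (the README does not spell it out) that this is the initialization that was added.

```python
import math


def classification_bias(prior=0.01):
    """Bias for the last classification layer, following the focal loss paper.

    With this bias and near-zero weights, sigmoid(bias) is close to `prior`,
    which keeps the many background anchors from dominating the loss
    at the start of training.
    """
    return -math.log((1.0 - prior) / prior)


def sigmoid(x):
    """Logistic function, used here only to check the bias value."""
    return 1.0 / (1.0 + math.exp(-x))
```

In a PyTorch model, this value would typically be written into the bias of the last convolution of the classification head, for example with `torch.nn.init.constant_`.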
- It took a while, but we managed to wrap the competition metric calculation from tensorflow/models/research/object_detection in a nice and easy-to-use function. If you are interested in this part, just go to src/utils.py
- We use the valid_ids provided by the organizers, but we added a little twist: we can choose which object classes we want to train/evaluate on. This is good for debugging and lets you check whether the network is learning anything.
Check our GitHub organization https://github.com/neptune-ml for more cool stuff 😎
Kamil & Kuba, core contributors