Code and notebooks for the Recursion Cellular Image Classification competition. It's my first Kaggle medal (88/866).
The task is to classify siRNA images, each with 6 channels stored as 6 separate PNG files.
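To make the "6 channels in 6 PNG files" layout concrete, here is a minimal sketch of how the six file paths for one image could be collected before stacking them into a single 6-channel array. The directory layout and the `{well}_s{site}_w{channel}.png` naming are assumptions about the dataset, and `channel_paths` is a hypothetical helper, not code from the notebooks.

```python
from pathlib import Path

N_CHANNELS = 6  # one grayscale PNG per channel


def channel_paths(root, experiment, plate, well, site):
    """Return the 6 per-channel PNG paths for one well and site.

    Assumed layout (not confirmed by this repo):
    <root>/<experiment>/Plate<plate>/<well>_s<site>_w<channel>.png
    """
    plate_dir = Path(root) / experiment / f"Plate{plate}"
    return [
        plate_dir / f"{well}_s{site}_w{channel}.png"
        for channel in range(1, N_CHANNELS + 1)
    ]


# Each site is a separate image, so looping over both sites doubles the data.
for site in (1, 2):
    for path in channel_paths("train", "HEPG2-01", 1, "B03", site):
        print(path.as_posix())
```

Each path would then be read as a grayscale image and the six planes stacked along the channel axis to form the model input.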
- We started our training from this starter kernel, which comes with EfficientNet, ResNet, DenseNet, and a data preprocessing pipeline ready to use.
These facts pretty much sum up how reckless we were this time. It's a miracle we even got the medal...
- We discovered that there are 2 sites only 3 days after we joined. Before that, we had been using only half the data.
- We discovered the plate leak almost too late. Even though we applied it as an enforcing 277-hot mask in the end, I only learned that each siRNA class appears just once per plate after the competition was over.
- We only became aware of the "control" images after the competition was over.
- We ensembled the models. Here is the evolution of our ensemble solutions: v1, my first ensemble; v2, where I recorded the public LB score; v3; and v5, the final version of the ensemble, which uses the plate leak as an enforcing mask and also visualizes how the plate leak works. We gradually phased out the other models and used various versions of "EfficientNet-b5".
- The ensemble improved our public LB score by at least 0.05.
- We found the plate leak 2 days before the competition closed. I kicked myself for the sloppiness of not checking the "Discussion" forum often enough.
- This notebook explores the plate leak and assigns and saves the 4 groups of siRNA.
- In the final ensemble notebook, we apply the leak information to the model's prediction output. The leak improved our public LB score by at least 0.1.
- On the eve of the deadline (UTC+8:00), I experimented with learning from conv activations, but there was too little time and nothing came of it.
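The idea of applying the plate leak as an enforcing mask on the prediction output can be sketched roughly as follows. This is a toy illustration with 4 classes and 2 groups, not the code from the final ensemble notebook: `infer_plate_group` and `apply_plate_mask` are hypothetical helpers, and picking a plate's group by majority vote over top-1 predictions is an assumption about how the group could be inferred.

```python
from collections import Counter


def infer_plate_group(plate_probs, groups):
    """Guess which siRNA group a plate belongs to.

    plate_probs: per-well class probabilities (n_wells x n_classes lists).
    groups: list of sets of class ids, one set per group.
    Assumption: the group is chosen by majority vote over top-1 predictions.
    """
    votes = Counter()
    for probs in plate_probs:
        top = max(range(len(probs)), key=probs.__getitem__)
        for group_id, members in enumerate(groups):
            if top in members:
                votes[group_id] += 1
    return votes.most_common(1)[0][0]


def apply_plate_mask(plate_probs, allowed):
    """Zero out classes outside the plate's group, then renormalize."""
    masked = []
    for probs in plate_probs:
        kept = [p if c in allowed else 0.0 for c, p in enumerate(probs)]
        total = sum(kept)
        masked.append([p / total for p in kept] if total > 0 else kept)
    return masked


# Toy plate: 3 wells, 4 classes, 2 groups of 2 classes each.
groups = [{0, 1}, {2, 3}]
plate_probs = [
    [0.6, 0.1, 0.2, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.1, 0.3, 0.5, 0.1],  # top-1 falls outside the plate's group
]
group_id = infer_plate_group(plate_probs, groups)
masked = apply_plate_mask(plate_probs, groups[group_id])
labels = [max(range(len(p)), key=p.__getitem__) for p in masked]
print(group_id, labels)
```

In the toy plate, the third well's top prediction lies outside the majority group, so the mask moves it to the best class inside that group. The mask does not use the fact that each class appears only once per plate.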