Hi Oscar Team,
Thanks for the interesting paper and open-sourcing your model.
On your download page, you mention that images are fed into Oscar through the outputs of a "Faster R-CNN with ResNet-101, using object and attribute annotations from Visual Genome". Have you made this model available too? It would be great if you could give a link to this pre-trained model, as it is necessary to run Oscar on my own images (I'm interested in image captioning and VQA).
I have tried to look for it myself, and the closest thing I could find was the R101-FPN model from the Detectron2 model zoo (a PyTorch model). However, that model was trained on COCO's object categories, and I understand that Visual Genome has significantly more object and attribute labels. So presumably it would fail to produce the image features that Oscar expects?
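For context, here is a rough sketch of the per-region input format I believe Oscar consumes: each region's Faster R-CNN feature vector concatenated with normalized box coordinates. The exact dimensions (2048-d features plus 6 position values, giving 2054-d) are my assumption based on the common bottom-up-attention convention, and `build_oscar_features` is a hypothetical helper name, not part of the Oscar codebase:

```python
import numpy as np

def build_oscar_features(region_feats, boxes, img_w, img_h):
    """Sketch: concatenate region features with normalized box geometry.

    region_feats: (N, 2048) array of per-region Faster R-CNN features
    boxes: (N, 4) array of pixel coordinates (x1, y1, x2, y2)
    Returns an (N, 2054) array: features + (x1, y1, x2, y2, w, h) scaled
    to [0, 1] by the image size. Dimensions are an assumption on my part.
    """
    x1, y1, x2, y2 = boxes.T
    pos = np.stack(
        [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h,
         (x2 - x1) / img_w, (y2 - y1) / img_h],
        axis=1,
    )
    return np.concatenate([region_feats, pos], axis=1)

feats = build_oscar_features(
    np.zeros((10, 2048)),
    np.array([[0.0, 0.0, 50.0, 100.0]] * 10),
    img_w=100, img_h=200,
)
print(feats.shape)  # (10, 2054)
```

If that is roughly right, then the detector's label vocabulary matters mainly for the object tags, while the feature dimensionality would match either way, which is why I want to confirm which checkpoint you used.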
I'd be grateful if you could let me know if my thinking is correct and if there is a link to the appropriate PyTorch model for generating inputs that Oscar can use.
Thanks in advance!