
Any benchmark for the latest released model? #61

Open
ControlNet opened this issue Jun 7, 2022 · 8 comments

Comments

@ControlNet

Hi,

I love your project. Could you please provide some benchmarks (accuracy, F1, etc.) for the latest pretrained model in the release?

That would be a great help, because I also want to train some models (EfficientNet, ViT, etc.) myself. By comparing benchmark scores, I could understand the model's performance better.

@koke2c95

koke2c95 commented Jun 7, 2022

We already trained NFNet, RegNet, and ConvNeXt on Danbooru2020.

The results were poor, and the resulting tagger was not very useful; it is unclear what such a trained model can actually do.

You should check out this paper for a sense of what's next: "Transfer Learning for Pose Estimation of Illustrated Characters", Chen & Zwicker 2021.

But the downstream-task experiments there also show poor results, not even as good as commonly pretrained models.

If you want image self-supervised pretraining, that is also bad: see Train vision models with vissl + illustrated images.

What we need is a vision-language model: text-image pretraining (something like CLIP) on an anime subset of LAION (with the tons of anime-unrelated data removed).

Preparing and training on that kind of data, rather than reusing a Danbooru20xx-trained model, is what a real pretraining dataset looks like.

Then we could build an open-vocabulary detector and a good captioner, and get into anime storytelling.
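For instance, a first pass at filtering a LAION-style caption dump down to an anime subset might look like the sketch below. The keyword list and the (url, caption) row format are just assumptions for illustration, not the real LAION schema:

```python
# Rough sketch: keep only rows whose caption mentions an anime-related keyword.
# The keyword list is illustrative; a serious filter would need far more care
# (e.g. an image-level classifier, not just caption matching).
ANIME_KEYWORDS = ("anime", "manga", "illustration", "pixiv", "danbooru")

def filter_anime_subset(rows):
    """Filter (url, caption) pairs, keeping captions with anime-related keywords."""
    kept = []
    for url, caption in rows:
        text = (caption or "").lower()  # captions may be missing
        if any(keyword in text for keyword in ANIME_KEYWORDS):
            kept.append((url, caption))
    return kept
```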

@KichangKim
Owner

I don't have any benchmark tests or scores for the latest DeepDanbooru model.

@ghost

ghost commented Jun 8, 2022

@ControlNet
Try this: https://github.com/lucidrains/x-clip
You can use a pretrained text encoder and only train the image encoder. The only catch is that you will first need to turn Danbooru tags into sentences that the pretrained text encoder can interpret.
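The tag-to-sentence step could be as simple as the following sketch. The template and the helper name are made up for illustration, not part of x-clip:

```python
# Hypothetical tag-to-caption conversion for feeding Danbooru tags to a
# text encoder pretrained on natural language.
def tags_to_sentence(tags):
    """Turn a list of Danbooru-style tags into a plain-English caption."""
    # Danbooru tags use underscores ("blue_hair"); a pretrained text
    # encoder expects natural spacing ("blue hair").
    words = [tag.replace("_", " ") for tag in tags]
    return "an illustration of " + ", ".join(words)
```

A richer template (or several templates, CLIP-style) might help, but even this naive form gets the tags into the encoder's input distribution.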

I have the same impression as @koke2c95 that tagging models do not work well on Danbooru data: the tags are too noisy, and the model cannot leverage the relationships between tags (concepts), which adds further noise.

@koke2c95

koke2c95 commented Jun 8, 2022

(Removed.)

Also, sorry about the accidental issue mention on Multi-Modal-Comparators; it seems the reference can't be removed :(

@ControlNet
Author

@koke2c95 Thank you for your reply.

I've decided to work on simple auto-tagging (multi-label classification), aiming for a model that is more lightweight and more accurate. So image generation or translation is not in my plans yet.

You said the labels are very low quality. I fully understand that: these tags are community-driven, so the noise cannot be avoided. I'm wondering whether some self-supervised and weakly supervised learning techniques could be employed to improve it.

Also, is there any tag-based anime image dataset with accurate labels?

BTW, since the height-width ratio of these images varies very significantly, I suspect naive resizing may not extract the features well. Using a sliding window might be a better choice.
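A sliding-window pass could be sketched like this. The window size and stride are illustrative choices, the model call itself is omitted, and per-tag probabilities from the windows would typically be combined with an element-wise max:

```python
# Sketch: cover a tall or wide image with overlapping square crops instead
# of squashing it with a naive resize. Each crop box is (left, top, right,
# bottom); a multi-label model would be run on each crop and the per-tag
# probabilities merged with an element-wise max across windows.
def window_starts(length, window, stride):
    """Start offsets of windows of size `window` covering a span of `length`."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # final window flush with the edge
    return starts

def sliding_windows(width, height, window=512, stride=384):
    """Yield square crop boxes (left, top, right, bottom) covering the image."""
    for top in window_starts(height, window, stride):
        for left in window_starts(width, window, stride):
            yield (left, top, left + window, top + window)
```

For a 512x1024 portrait image with a 512 window and 384 stride, this yields three vertically overlapping crops rather than one distorted resize.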

I don't have any benchmark test/score of DeepDanbooru for latest model.

@KichangKim Thank you for your reply.

@Daniel8811 Thank you for your suggestions. I know CLIP is an amazing piece of work, and it's robust to unseen data, but I'm not familiar with it. If the pretrained text encoder is used, I highly doubt the anime-style labels (yuri, genshin_impact, blue_hair) can be predicted well.

@ghost

ghost commented Jun 8, 2022

@ControlNet

If the pretrained text encoder is used, I highly doubt the anime-style labels (yuri, genshin_impact, blue_hair) can be predicted well.

That could be a problem. I agree that it's still unclear whether CLIP would really work better on Danbooru data at the moment.

@koke2c95 So I guess you are doing text2image at the moment. Do you have a thread or write-up for your exploration?

@ghost

ghost commented Jun 9, 2022

I've decided to work on simple auto-tagging (multi-label classification), aiming for a model that is more lightweight and more accurate.

@ControlNet
Maybe you could manually clean up the Danbooru tags so that they contain fewer abstract concepts (like yuri) and more tags that are visually obvious (like blue hair). This could significantly reduce the noise and therefore make the model more accurate.
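A semi-automatic first pass might keep only tags matching visually grounded patterns. The pattern list below is purely illustrative; a real cleanup would still need manual review of the tag vocabulary:

```python
import re

# Illustrative patterns for "visually obvious" Danbooru tags: hair/eye
# colours, character counts, and common clothing items. Abstract tags
# (e.g. "yuri") match none of these and are dropped.
OBVIOUS_PATTERNS = [
    r"_hair$",
    r"_eyes$",
    r"^\d+(girl|boy)s?$",
    r"_(shirt|skirt|dress|hat)$",
]

def keep_obvious_tags(tags):
    """Filter a tag list down to visually obvious tags."""
    return [t for t in tags if any(re.search(p, t) for p in OBVIOUS_PATTERNS)]
```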

@ControlNet
Author

@Daniel8811

they contain fewer abstract concepts (like yuri) and more tags that are visually obvious (like blue hair)

Yes, it's possible, although manually identifying these "obvious" tags among thousands of tags is tricky.
