Any benchmark for the latest released model? #61
Comments
We already trained NFNet, RegNet, and ConvNeXt on Danbooru2020; the results were poor and the tagger was not useful, so we don't know what this released model can actually do. You should check out the paper "Transfer Learning for Pose Estimation of Illustrated Characters" (Chen & Zwicker, 2021): its downstream-task experiments show these models are not even as good as commonly pretrained ones. Image-only self-supervised pretraining is also weak (see "Train vision models with vissl + illustrated images"). What we need is a VLM: text-image pretraining (CLIP-style) on a LAION anime subset, with the large amount of anime-unrelated data removed. Preparing and training on that data, rather than on Danbooru20xx, gives a proper pretraining dataset; then we can build an open-vocabulary detector and a good captioner, and move toward anime storytelling.
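To make the LAION-subset idea concrete, here is a minimal sketch of filtering a metadata shard by caption keywords. The parquet filename and the "url"/"caption" column names are assumptions about the metadata layout, not a prescribed pipeline.

```python
# Minimal sketch: keep only rows whose caption looks anime-related.
# The shard filename and column names ("url", "caption") are assumptions.
import pandas as pd

ANIME_KEYWORDS = ("anime", "manga", "illustration", "pixiv", "danbooru")

df = pd.read_parquet("laion_shard_00000.parquet")  # hypothetical metadata shard
mask = df["caption"].str.lower().str.contains("|".join(ANIME_KEYWORDS), na=False)
anime_subset = df.loc[mask, ["url", "caption"]]
anime_subset.to_parquet("laion_anime_subset_00000.parquet")
print(f"kept {len(anime_subset)} / {len(df)} rows")
```

Keyword filtering alone is crude; some additional classifier pass would likely be needed on top to remove the remaining unrelated data.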
I don't have any benchmark test/score of DeepDanbooru for the latest model.
@ControlNet I have the same impression as @koke2c95 that tagging models do not work well on Danbooru data: the tags are too noisy, and the models cannot leverage the relationships between the tags (concepts), which adds further noise.
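One way to at least see the relationships between tags is a simple co-occurrence count over per-image tag lists; the sketch below is only an illustration, with a made-up `image_tags` input rather than real Danbooru metadata.

```python
# Rough sketch: build tag co-occurrence counts from per-image tag lists,
# which could serve as a prior over related tags or to smooth noisy labels.
# `image_tags` is a hypothetical placeholder input.
from collections import Counter
from itertools import combinations

image_tags = [
    ["1girl", "blue_hair", "school_uniform"],
    ["1girl", "blue_hair", "smile"],
    ["2girls", "yuri", "smile"],
]

pair_counts = Counter()
for tags in image_tags:
    for a, b in combinations(sorted(set(tags)), 2):
        pair_counts[(a, b)] += 1

# Tags that frequently co-occur are likely semantically related.
for (a, b), n in pair_counts.most_common(5):
    print(f"{a} + {b}: {n}")
```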
Sorry, this issue got mentioned on Multi-Modal-Comparators by mistake, and it seems the reference can't be removed :(
@koke2c95 Thank you for your reply. I've decided to do simple auto-tagging (multi-label classification) with a more lightweight and more accurate model, so image generation or translation is not in my plan yet. You said the labels are very low quality; I fully understand that, as these tags are community-driven, so the noise cannot be avoided. I'm wondering whether it's possible to employ some self-supervised and weakly supervised learning techniques to improve it. Also, is there any tag-based anime image dataset with accurate labels? BTW, since the height-width ratio of these images varies significantly, I doubt naive resizing will extract the features well; using a sliding window might be a better choice.
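For the sliding-window idea, a rough PyTorch sketch of window-based inference could look like the following; the function, the 224-pixel window, and the `model` interface are illustrative assumptions, not DeepDanbooru's actual pipeline.

```python
# Sketch of sliding-window inference for multi-label tagging: crop overlapping
# square windows from a tall/wide image, score each window, and max-pool the
# per-tag probabilities. `model` is any network mapping a 3x224x224 crop to
# per-tag logits; all names here are illustrative, not DeepDanbooru's API.
import torch

def sliding_window_predict(model, image, win=224, stride=112):
    # image: normalized float tensor of shape (3, H, W); assumes H, W >= win
    _, h, w = image.shape
    crops = []
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            crops.append(image[:, top:top + win, left:left + win])
    batch = torch.stack(crops)                  # (N, 3, win, win)
    with torch.no_grad():
        probs = torch.sigmoid(model(batch))     # (N, num_tags)
    return probs.max(dim=0).values              # per-tag max over all windows
```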
Thank you for your reply. @Daniel8811 Thank you for your suggestions. I know CLIP is an amazing work, and it's robust on unseen data, but I'm not familiar with it. If the pretrained text encoder is used, I highly doubt that anime-style labels (yuri, genshin_impact, blue_hair) can be predicted well.
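A quick way to check that doubt empirically would be zero-shot scoring with OpenAI's clip package; the prompt template below is just one plausible wording, and the image path is a placeholder.

```python
# Zero-shot check of whether pretrained CLIP can score anime-style tags.
# Uses OpenAI's clip package (pip install git+https://github.com/openai/CLIP.git);
# the prompt template and "sample.jpg" are placeholder choices.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

tags = ["yuri", "genshin_impact", "blue_hair"]
prompts = [f"an anime illustration with the tag {t.replace('_', ' ')}" for t in tags]

image = preprocess(Image.open("sample.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    scores = logits_per_image.softmax(dim=-1).squeeze(0)

for tag, score in zip(tags, scores.tolist()):
    print(f"{tag}: {score:.3f}")
```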
That could be a problem. I agree that, at the time of writing, it's still unclear whether CLIP would really work better on Danbooru data. @koke2c95 So I guess you are doing text2image at the moment. Do you have a thread or write-up for your exploration?
@ControlNet |
@Daniel8811
Yes, it's possible, although manually finding these "obvious" tags for thousands of tags is tricky.
Hi,
I love your project. Could you please provide some benchmarks (accuracy, F1, etc.) for the latest pretrained model in the release?
That would be a great help, because I also want to train some models (EfficientNet, ViT, etc.) myself; by comparing benchmark scores, I could understand the model's performance better.
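For reference, a benchmark like that could be computed with scikit-learn along these lines; the 0.5 threshold and the (num_images x num_tags) array shapes are assumptions, and the arrays below are random placeholders rather than real model outputs.

```python
# Sketch of computing benchmark scores for a multi-label tagger with
# scikit-learn, given binary ground truth and predicted tag probabilities.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.random.randint(0, 2, size=(100, 50))   # placeholder ground truth
y_prob = np.random.rand(100, 50)                   # placeholder model outputs
y_pred = (y_prob >= 0.5).astype(int)               # assumed 0.5 threshold

print("micro F1:  ", f1_score(y_true, y_pred, average="micro"))
print("macro F1:  ", f1_score(y_true, y_pred, average="macro"))
print("precision: ", precision_score(y_true, y_pred, average="micro"))
print("recall:    ", recall_score(y_true, y_pred, average="micro"))
```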