Update CLIP to a functional model #2393
Conversation
Is our test suite running with TF 2.16? I keep getting this error constantly when testing with TensorFlow 2.16.1:
This needs to be a conditional import; let me send a fix.
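A conditional import of the kind mentioned here typically wraps the optional dependency in a try/except and defers the error until the feature is actually used. A minimal sketch, assuming the dependency in question is `keras_nlp` (the helper name below is illustrative, not the actual fix in this PR):

```python
# Sketch of a conditional import: treat keras_nlp as an optional
# dependency and fail loudly only when a feature that needs it runs.
try:
    import keras_nlp  # optional; may not be installed
except ImportError:
    keras_nlp = None


def require_keras_nlp():
    # Hypothetical helper: raise a clear, actionable error on first use
    # instead of crashing at import time.
    if keras_nlp is None:
        raise ImportError(
            "This feature requires the `keras-nlp` package. "
            "Install it with `pip install keras-nlp`."
        )
    return keras_nlp
```

This keeps the package importable on installs without the optional dependency while still giving a clear error message the moment the dependent feature is touched.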
Force-pushed from 8fdabd5 to c6e84b9
Commits:
- update model input format
- update golden values
- update CLIP to functional model
- update tests
- code reformat
- use dict instead of list
- Update keras_cv/models/feature_extractor/clip/clip_model.py (Co-authored-by: Tirth Patel <tirthasheshpatel@gmail.com>)
- remove build and compute output shape
- update model input format
- update golden values
- Refactor CLIP. Refactor includes:
  - CLIPProcessor is now a Keras layer and uses some utilities from KerasNLP to support all types of Python types and array inputs
  - CLIPImageEncoder, CLIPTextEncoder, and CLIPEncoder now implement a `.compute_output_shape` method (required for CLIP to work with the functional API)
  - CLIPHead added to remove raw variables from the CLIP task models; having variables in a `keras.Model` class is tricky since the functional API doesn't allow state
  - The CLIP checkpointing script has been updated to work with the new API; new weights will be uploaded to Kaggle
  - TODO: attribute KerasNLP wherever relevant
  - TODO: upload new weights to Kaggle
  - TODO: refactor the CLIPProcessor class and the CLIP class to also pull tokenizer vocab and merges from Kaggle
- remove build and compute output shape
- Some fixes for the refactor
- Fix the tests, update presets
- update to layers instead of models
Force-pushed from c6e84b9 to 3c54540
In the latest commit, I have removed some numerics tests since we don't match the HF model. I also updated the presets and squashed everything down to one commit; sorry, I lost authorship when doing this. I will approve but leave merging up to @divyashreepathihalli since I have a lot of contributions here. Please let me know if you have any changes in mind; otherwise, feel free to merge. Thanks!
I verified the refactor has the same numerics as the model we had before. Note, though, that those numerics are broken too; we are very far from the equivalent HF model.
```python
# These values are NOT computed using HF as the reference model.
# Currently, the numerics of the CLIP model don't match the
# HF model exactly (for the same inputs). For the time being,
# these tests just confirm that unrelated changes don't affect
# the numerics. Once the fix for the numerics is in, we can remove
# this comment and the xfail below.
self.assertAllClose(
    outputs["image_logits"], [[10.246354, 10.246353, 10.246354]]
)
self.assertAllClose(
    outputs["text_logits"],
    ops.transpose([[10.246354, 10.246353, 10.246354]]),
)

# True reference values computed using HF:
# image_logits: [[17.8013, 17.8013, 17.8013]]
# text_logits: image_logits.T

# xfail after assertion
pytest.xfail("KerasCV CLIP doesn't match the HF model.")
```
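`pytest.xfail` used this way is the imperative form: the assertions above still execute (so a regression in the current numerics fails the test outright), and the call then raises an internal exception that reports the test as an expected failure, unlike the `@pytest.mark.xfail` decorator, which reclassifies the whole test. A small standalone sketch of the pattern:

```python
import pytest


def toy_numerics_test():
    # Real assertions run first; a regression here fails the test outright.
    assert round(10.246354, 3) == 10.246
    # Then mark the remainder as a known, expected failure.
    pytest.xfail("known mismatch with the reference model")
    # Anything after this line never executes.
```

Under a pytest runner the test shows up as `xfailed` rather than `failed`, which keeps the known-mismatch visible without turning the suite red.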
@divyashreepathihalli I have included the reference values here, computed using the same inputs to the HF model. Currently, I'm xfail-ing the test to indicate we need to fix this.
Unrelated Keras 2 failure. @divyashreepathihalli Do we need the initializer? I am inclined to remove it since we don't yet support training.
Yes! We can remove those.