Why does BIKE have fewer parameters than the original CLIP ViT-L/14?

The reason why the parameters of BIKE are smaller than the original CLIP ViT-L/14 is that in the BIKE model, we only utilize the vision encoder from CLIP and do not include the parameters of CLIP's text encoder.

_Originally posted by @whwu95 in https://github.com/whwu95/BIKE/issues/3#issuecomment-1612891328_
       
In fact, the visual encoder alone has 303M parameters for ViT-L/16, and that count already excludes the text encoder.
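For anyone who wants to sanity-check the 303M figure without downloading weights, here is a rough back-of-the-envelope count. It is only a sketch: the helper `vit_param_count` is hypothetical, and the hyperparameters (width 1024, 24 layers, patch size 14, 224x224 input, 768-dim output projection, MLP ratio 4) are my assumption of the public CLIP ViT-L/14 vision tower layout, not values taken from the BIKE code.

```python
def vit_param_count(width, layers, patch, image_size, out_dim, mlp_ratio=4):
    """Estimate the parameter count of a CLIP-style vision transformer.

    Assumed layout: bias-free patch-embedding conv, class token,
    learned positional embedding, pre/post LayerNorm, and a bias-free
    output projection.
    """
    tokens = (image_size // patch) ** 2 + 1       # patches + class token
    patch_embed = 3 * patch * patch * width       # conv over RGB, no bias
    cls_and_pos = width + tokens * width          # class token + positions
    layer_norm = 2 * width                        # scale + shift
    attn = 4 * width * width + 4 * width          # q, k, v, out (with biases)
    hidden = mlp_ratio * width
    mlp = 2 * width * hidden + hidden + width     # two linear layers
    block = attn + mlp + 2 * layer_norm           # one transformer block
    projection = width * out_dim                  # final linear projection
    return patch_embed + cls_and_pos + 2 * layer_norm + layers * block + projection


# Assumed ViT-L/14 settings; change patch=16 to compare with ViT-L/16.
total = vit_param_count(width=1024, layers=24, patch=14, image_size=224, out_dim=768)
print(f"estimated vision encoder parameters: {total / 1e6:.1f}M")
```

Under these assumptions the estimate lands at about 304M with the final projection and about 303M without it, so a count well below that would not be explained by dropping the text encoder alone.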


