Commit

update md
骁灵 committed Aug 31, 2023
1 parent 0798e34 commit 3165103
Showing 3 changed files with 98 additions and 0 deletions.
50 changes: 50 additions & 0 deletions distillation.md
@@ -0,0 +1,50 @@
[**中文说明**](distillation.md) | [**English**](distillation_En.md)

# Improving Chinese-CLIP Image Retrieval Ability Using Knowledge Distillation

Chinese-CLIP can be fine-tuned with knowledge distillation to further improve its image retrieval (image2image) ability. The Teacher models used all come from [ModelScope](https://github.com/modelscope/modelscope).

## Environment Preparation

+ Nvidia GPUs with the **Turing**, **Ampere**, **Ada**, or **Hopper** architecture (such as H100, A100, RTX 3090, T4, and RTX 2080). Please refer to [the table in this document](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) for the GPU models of each Nvidia architecture.
+ CUDA 11.4 or above.
+ PyTorch 1.12 or above.
+ Other dependencies required by [requirements.txt](requirements.txt).
+ **ModelScope**: install ModelScope by running `pip install modelscope`.

## Use It in Chinese-CLIP!

Applying knowledge distillation to the image side during Chinese-CLIP finetuning is straightforward. Simply add the `--distllation` option to the finetuning shell script,
then fill in the name of the Teacher model to use in the `--teacher-model-name` option. The following four Teacher models are currently supported.
<table border="1" width="120%">
	<tr align="center">
		<td><b>Teacher model</b></td><td><b>Model description</b></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_team-vit-large-patch14_multi-modal-similarity</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_team-vit-large-patch14_multi-modal-similarity/summary">TEAM image-text retrieval model - Chinese - large</a></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_rleg-vit-large-patch14</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_rleg-vit-large-patch14/summary">RLEG generative multimodal representation model - English - large</a></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_clip-vit-huge-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-huge-patch14_zh/summary">CLIP model - Chinese - general domain - huge</a></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_clip-vit-large-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-large-patch14_zh/summary">CLIP model - Chinese - general domain - large</a></td>
	</tr>
</table>
<br>

Finally, fill in the weight of the distillation loss in the `--kd_loss_weight` option; the default value is 0.5.


The configuration options are defined as follows:
+ `distllation`: whether to enable knowledge distillation for finetuning the image side of the model.
+ `teacher-model-name`: specifies the Teacher model to use. The four Teacher models listed above are currently supported, e.g. `damo/multi-modal_team-vit-large-patch14_multi-modal-similarity`.
+ `kd_loss_weight` (optional): weight of the distillation loss; the default value is 0.5.

We provide a sample script `run_scripts/muge_finetune_vit-b-16_rbt-base_distllation.sh`; a minimal sketch of the distillation-related options is shown below.
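
The following is a minimal sketch of how these options might be appended to an existing Chinese-CLIP finetune launch command. Only the three distillation options come from this document; the launcher, entry point, and the `${OTHER_FINETUNE_ARGS}` placeholder are assumptions standing in for the rest of your finetune script.

```bash
# Minimal sketch, not the repository's actual script: only the three
# distillation options below are taken from this document. The launcher,
# entry point, and ${OTHER_FINETUNE_ARGS} are placeholders/assumptions
# representing the rest of an existing finetune command.
python3 -m torch.distributed.launch --nproc_per_node=8 cn_clip/training/main.py \
    ${OTHER_FINETUNE_ARGS} \
    --distllation \
    --teacher-model-name damo/multi-modal_team-vit-large-patch14_multi-modal-similarity \
    --kd_loss_weight 0.5
```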

## Todo
A Jupyter Notebook for the related solution will be released on the Alibaba Cloud official website.
48 changes: 48 additions & 0 deletions distillation_En.md
@@ -0,0 +1,48 @@
[**中文说明**](distillation.md) | [**English**](distillation_En.md)

# Improving Chinese-CLIP Image Retrieval Ability Using Knowledge Distillation

Chinese-CLIP can be fine-tuned with knowledge distillation to further improve its image retrieval (image2image) ability. The Teacher models used all come from [ModelScope](https://github.com/modelscope/modelscope).

## Environment Preparation

+ Nvidia GPUs **with Turing, Ampere, Ada or Hopper architecture** (such as H100, A100, RTX 3090, T4, and RTX 2080). Please refer to [this document](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) for the corresponding GPUs of each Nvidia architecture.
+ CUDA 11.4 and above.
+ PyTorch 1.12 and above.
+ **ModelScope**: Install ModelScope by executing `pip install modelscope`.
+ Other dependencies as required in [requirements.txt](requirements.txt).

## Use it in Chinese-CLIP!
Applying knowledge distillation to the image side during Chinese-CLIP finetuning is straightforward. Simply add the `--distllation` option to the finetuning shell script,
then fill in the name of the Teacher model to use in the `--teacher-model-name` option. The following four Teacher models are currently supported.
<table border="1" width="120%">
	<tr align="center">
		<td><b>Teacher model</b></td><td><b>Model description</b></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_team-vit-large-patch14_multi-modal-similarity</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_team-vit-large-patch14_multi-modal-similarity/summary">TEAM image-text retrieval model - Chinese - large</a></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_rleg-vit-large-patch14</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_rleg-vit-large-patch14/summary">RLEG generative multimodal representation model - English - large</a></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_clip-vit-huge-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-huge-patch14_zh/summary">CLIP model - Chinese - general domain - huge</a></td>
	</tr>
	<tr align="center">
		<td>damo/multi-modal_clip-vit-large-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-large-patch14_zh/summary">CLIP model - Chinese - general domain - large</a></td>
	</tr>
</table>
<br>

Finally, fill in the weight of the distillation loss in the `--kd_loss_weight` option; the default value is 0.5.

The configuration items are defined as follows:
+ `distllation`: Whether to enable knowledge distillation to fine-tune the image side of the model.
+ `teacher-model-name`: Specifies the Teacher model to use. The four Teacher models listed above are currently supported, e.g. `damo/multi-modal_team-vit-large-patch14_multi-modal-similarity`.
+ `kd_loss_weight` (optional): Distillation loss weight, default value is 0.5.

We provide a sample script `run_scripts/muge_finetune_vit-b-16_rbt-base_distllation.sh`; a minimal sketch of the distillation-related options is shown below.
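
As a reference, here is a minimal sketch of the distillation-related additions. Only the three options below come from this document; the launcher, entry point, and the `${OTHER_FINETUNE_ARGS}` placeholder are assumptions standing in for the rest of an existing finetune script.

```bash
# Minimal sketch, not the repository's actual script: append the three
# distillation options to an existing Chinese-CLIP finetune command.
# The launcher, entry point, and ${OTHER_FINETUNE_ARGS} are
# placeholders/assumptions, not values taken from this document.
python3 -m torch.distributed.launch --nproc_per_node=8 cn_clip/training/main.py \
    ${OTHER_FINETUNE_ARGS} \
    --distllation \
    --teacher-model-name damo/multi-modal_team-vit-large-patch14_multi-modal-similarity \
    --kd_loss_weight 0.5
```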

## Todo
A Jupyter Notebook for the related solution will be released on the Alibaba Cloud official website.
File renamed without changes.
