forked from OFA-Sys/Chinese-CLIP
骁灵 committed Aug 31, 2023 · 1 parent 0798e34 · commit 3165103
Showing 3 changed files with 98 additions and 0 deletions.
distillation.md
@@ -0,0 +1,50 @@
[**Chinese**](distillation.md) | [**English**](distillation_En.md)

# Improving Chinese-CLIP Image Retrieval Ability Using Knowledge Distillation

Chinese-CLIP can be fine-tuned together with knowledge distillation to further improve its image retrieval (image2image) ability. All of the teacher models used come from [ModelScope](https://github.com/modelscope/modelscope).

## Environment Preparation

+ Nvidia GPUs with the **Turing**, **Ampere**, **Ada**, or **Hopper** architecture (such as H100, A100, RTX 3090, T4, and RTX 2080); see [this table](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) for the GPU models of each Nvidia architecture.
+ CUDA 11.4 or above.
+ PyTorch 1.12 or above.
+ The other dependencies listed in [requirements.txt](requirements.txt).
+ **ModelScope**: install ModelScope by running `pip install modelscope` (see the install example below).
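The Python-level dependencies can be installed, for example, as follows; this is only an illustrative sequence, and CUDA plus the GPU driver are assumed to be set up separately:

```bash
# Install the dependencies listed in requirements.txt, then ModelScope,
# which provides the teacher models used for distillation.
pip install -r requirements.txt
pip install modelscope
```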

## Use it in Chinese-CLIP!

Applying knowledge distillation to the image side during Chinese-CLIP finetuning is not complicated. Simply add the `--distllation` option to the finetune shell script, then fill in the name of the teacher model to use in the `--teacher-model-name` option. The four teacher models listed below are currently supported.
<table border="1" width="120%"> | ||
<tr align="center"> | ||
<td><b>Teacher model</b></td><td><b>模型介绍</b></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_team-vit-large-patch14_multi-modal-similarity</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_team-vit-large-patch14_multi-modal-similarity/summary">TEAM图文检索模型-中文-large</a></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_rleg-vit-large-patch14</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_rleg-vit-large-patch14/summary">RLEG生成式多模态表征模型-英文-large | ||
</a></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_clip-vit-huge-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-huge-patch14_zh/summary">CLIP模型-中文-通用领域-huge</a></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_clip-vit-large-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-large-patch14_zh/summary">CLIP模型-中文-通用领域-large</a></td> | ||
</tr> | ||
</table> | ||
<br>

Finally, set the weight of the distillation loss with the `--kd_loss_weight` option; the default value is 0.5.

The options are defined as follows:
+ `distllation`: whether to enable knowledge distillation to fine-tune the image side of the model.
+ `teacher-model-name`: the teacher model to use. The four teacher models above are currently supported, e.g. `damo/multi-modal_team-vit-large-patch14_multi-modal-similarity`.
+ `kd_loss_weight` (optional): the weight of the distillation loss; the default value is 0.5.

We provide the sample script `run_scripts/muge_finetune_vit-b-16_rbt-base_distllation.sh`.
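As a rough illustration, the sketch below shows how these three options might be appended to a finetune command. Only the three distillation options come from this document; the distributed launcher, the entry point `cn_clip/training/main.py`, and the placeholder variables are assumptions standing in for whatever your finetune script already contains.

```bash
# Sketch only: appending the distillation options to an existing Chinese-CLIP
# finetune command. The placeholders below stand for whatever your finetune
# script already configures.
GPUS_PER_NODE=8
EXISTING_FINETUNE_ARGS="--train-data ... --val-data ..."   # placeholder for your usual finetune arguments

# --distllation          enables knowledge distillation on the image side (flag spelled as in the repo)
# --teacher-model-name   one of the four supported ModelScope teacher models
# --kd_loss_weight       optional weight of the distillation loss (default 0.5)
python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} cn_clip/training/main.py \
    ${EXISTING_FINETUNE_ARGS} \
    --distllation \
    --teacher-model-name damo/multi-modal_team-vit-large-patch14_multi-modal-similarity \
    --kd_loss_weight 0.5
```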

## Todo

A Jupyter Notebook with the related solution will be released on the Alibaba Cloud official website.
distillation_En.md
@@ -0,0 +1,48 @@
[**Chinese**](distillation.md) | [**English**](distillation_En.md)

# Improving Chinese-CLIP Image Retrieval Ability Using Knowledge Distillation

Chinese-CLIP can be fine-tuned together with knowledge distillation to further improve its image retrieval (image2image) ability. The teacher models used are all from [ModelScope](https://github.com/modelscope/modelscope).

## Environment Preparation

+ Nvidia GPUs with the **Turing, Ampere, Ada, or Hopper** architecture (such as H100, A100, RTX 3090, T4, and RTX 2080). Please refer to [this document](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) for the GPU models corresponding to each Nvidia architecture.
+ CUDA 11.4 and above.
+ PyTorch 1.12 and above.
+ **ModelScope**: install ModelScope by executing `pip install modelscope` (see the install example below).
+ Other dependencies as required in [requirements.txt](requirements.txt).
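The Python-level dependencies can be installed, for example, as follows; this is only an illustrative sequence, and CUDA plus the GPU driver are assumed to be set up separately:

```bash
# Install ModelScope, which provides the teacher models used for distillation,
# and the remaining dependencies listed in requirements.txt.
pip install modelscope
pip install -r requirements.txt
```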

## Use it in Chinese-CLIP!

Applying knowledge distillation to the image side during Chinese-CLIP finetuning is not complicated. Just add the `--distllation` option to the finetune shell script, then fill in the name of the teacher model to be used in the `--teacher-model-name` option. The currently supported teacher models include the following four.
<table border="1" width="120%"> | ||
<tr align="center"> | ||
<td><b>Teacher model</b></td><td><b>模型介绍</b></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_team-vit-large-patch14_multi-modal-similarity</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_team-vit-large-patch14_multi-modal-similarity/summary">TEAM图文检索模型-中文-large</a></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_rleg-vit-large-patch14</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_rleg-vit-large-patch14/summary">RLEG生成式多模态表征模型-英文-large | ||
</a></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_clip-vit-huge-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-huge-patch14_zh/summary">CLIP模型-中文-通用领域-huge</a></td> | ||
</tr> | ||
<tr align="center"> | ||
<td>damo/multi-modal_clip-vit-large-patch14_zh</td><td><a href="https://www.modelscope.cn/models/damo/multi-modal_clip-vit-large-patch14_zh/summary">CLIP模型-中文-通用领域-large</a></td> | ||
</tr> | ||
</table> | ||
<br> | ||

Finally, fill in the weight of the distillation loss in the `--kd_loss_weight` option; the default value is 0.5.

The configuration items are defined as follows:
+ `distllation`: whether to enable knowledge distillation to fine-tune the image side of the model.
+ `teacher-model-name`: the teacher model to use. The four teacher models above are currently supported, e.g. `damo/multi-modal_team-vit-large-patch14_multi-modal-similarity`.
+ `kd_loss_weight` (optional): the weight of the distillation loss; the default value is 0.5.

We provide a sample script `run_scripts/muge_finetune_vit-b-16_rbt-base_distllation.sh`.
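As a rough illustration, the sketch below shows how these three options might be appended to a finetune command. Only the three distillation options come from this document; the distributed launcher, the entry point `cn_clip/training/main.py`, and the placeholder variables are assumptions standing in for whatever your finetune script already contains.

```bash
# Sketch only: the distillation options from this document appended to an
# existing Chinese-CLIP finetune command; everything else is a placeholder
# for the arguments your finetune script already uses.
GPUS_PER_NODE=8
EXISTING_FINETUNE_ARGS="--train-data ... --val-data ..."   # your usual finetune arguments

python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} cn_clip/training/main.py \
    ${EXISTING_FINETUNE_ARGS} \
    --distllation \
    --teacher-model-name damo/multi-modal_clip-vit-huge-patch14_zh \
    --kd_loss_weight 0.5
```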

## Todo

A Jupyter Notebook with the related solution will be released on the Alibaba Cloud official website.
File renamed without changes.