update readme to support bilingual.

Brady-X · Mar 21, 2023 · 6e20414
1 parent cda4bd6
commit 6e20414

Showing 59 changed files with 2,145 additions and 3,270 deletions.
diff --git a/2_nlp_sdks/embedding/sentence_encoder_100_sdk/README.md b/2_nlp_sdks/embedding/sentence_encoder_100_sdk/README.md
@@ -1,75 +1,77 @@
-### 官网：
-[官网链接](http://www.aias.top/)
 
-### 下载模型，放置于models目录
+### Download the model and put it in the models directory
 - 链接: https://github.com/mymagicpower/AIAS/releases/download/apps/paraphrase-xlm-r-multilingual-v1.zip
 
-### 句向量SDK【支持100种语言】
-句向量是指将语句映射至固定维度的实数向量。
-将不定长的句子用定长的向量表示，为NLP下游任务提供服务。
+### Sentence Vector SDK [Supports 100 languages]
 
-- 支持下面100种语言：          
+A sentence vector is a fixed-dimensional real-valued vector that a sentence is mapped to.
+Representing variable-length sentences as fixed-length vectors lets downstream NLP tasks consume them directly.
+
+- Supports the following 100 languages:  
 ![img](https://aias-home.oss-cn-beijing.aliyuncs.com/AIAS/nlp_sdks/languages_100.jpeg)
 
-- 句向量         
+- Sentence Vector  
 ![img](https://aias-home.oss-cn-beijing.aliyuncs.com/AIAS/nlp_sdks/Universal-Sentence-Encoder.png)
 
--  
 
-句向量应用：
-- 语义搜索，通过句向量相似性，检索语料库中与query最匹配的文本
-- 文本聚类，文本转为定长向量，通过聚类模型可无监督聚集相似文本
-- 文本分类，表示成句向量，直接用简单分类器即训练文本分类器
+Sentence Vector Applications:
+
+- Semantic search: Retrieve text from a corpus that best matches the query by sentence vector similarity.
+- Text clustering: Convert text to fixed-length vectors and use clustering models to cluster similar texts without supervision.
+- Text classification: Represent text as sentence vectors and train a simple classifier directly on them.
+
+### SDK Functionality:
+
+- Sentence vector extraction
+- Similarity (cosine) calculation
+- max_seq_length: 128 (counted in subword tokens; for English sentences this is roughly 60 words on average)
 
-### SDK功能：
--  句向量提取
--  相似度（余弦）计算
--  max_seq_length: 128（subword切词，如果是英文句子，上限平均大约60个单词）
+### Running example - SentenceEncoderExample
+
+After running successfully, you should see the following information on the command line:
 
-#### 运行例子 - SentenceEncoderExample
-运行成功后，命令行应该看到下面的信息:
 ```text
 ...
-# 测试语句：
-# 英文一组
+# Test sentences:
+# A set of English sentences
 [INFO ] - input Sentence1: This model generates embeddings for input sentence
 [INFO ] - input Sentence2: This model generates embeddings
 
-# 中文一组
+# A set of Chinese sentences
 [INFO ] - input Sentence3: 今天天气不错
 [INFO ] - input Sentence4: 今天风和日丽
 
-# 向量维度：
+# Vector dimensions:
 [INFO ] - Vector dimensions: 768
 
-# 英文 - 生成向量：
+# English - Generate vectors:
 [INFO ] - Sentence1 embeddings: [0.10717804, 0.0023716218, ..., -0.087652676, 0.5144994]
 [INFO ] - Sentence2 embeddings: [0.06960095, 0.09246655, ..., -0.06324193, 0.2669841]
 
-#计算英文相似度：
-[INFO ] - 英文 Similarity: 0.84808713
+# Calculate English similarity:
+[INFO ] - Similarity: 0.84808713
 
-# 中文 - 生成向量：
+# Chinese - Generate vectors:
 [INFO ] - Sentence1 embeddings: [0.19896796, 0.46568888,..., 0.09489663, 0.19511698]
 [INFO ] - Sentence2 embeddings: [0.1639189, 0.43350196, ..., -0.025053274, -0.121924624]
 
-#计算中文相似度：
-#由于使用了sentencepiece切词器，中文切词更准确，比15种语言的模型（只切成字，没有考虑词）精度更好。
-[INFO ] - 中文 Similarity: 0.67201
+# Calculate Chinese Similarity:
+# Because the sentencepiece tokenizer is used, Chinese word segmentation is more accurate, and precision is better than with the 15-language model (which splits text into single characters and ignores words).
+[INFO ] - Similarity: 0.67201
 
 ```
 
-### 开源算法
-#### 1. sdk使用的开源算法
+### Open source algorithm
+#### 1. Open source algorithms used by the SDK
 - [sentence-transformers](https://github.com/UKPLab/sentence-transformers)
-- [预训练模型](https://www.sbert.net/docs/pretrained_models.html)
-- [安装](https://www.sbert.net/docs/installation.html)
+- [Pre-trained models](https://www.sbert.net/docs/pretrained_models.html)
+- [Installation](https://www.sbert.net/docs/installation.html)
 
 
-#### 2. 模型如何导出 ?
+#### 2. How to export the model?
 - [how_to_convert_your_model_to_torchscript](http://docs.djl.ai/docs/pytorch/how_to_convert_your_model_to_torchscript.html)
 
-- 导出CPU模型（pytorch 模型特殊，CPU&GPU模型不通用。所以CPU，GPU需要分别导出）
+- Export a separate model for each device (PyTorch models are special: CPU and GPU models are not interchangeable, so the CPU model and the GPU model must each be exported on their own)
 - device = torch.device("cpu")
 - device = torch.device("cuda")
 - export_model_100.py
@@ -94,26 +96,3 @@ input_features = {'input_ids': input_ids, 'attention_mask': input_mask}
 traced_model = torch.jit.trace(model, example_inputs=input_features,strict=False)
 traced_model.save("models/paraphrase-xlm-r-multilingual-v1/paraphrase-xlm-r-multilingual-v1.pt")
 ```
-
-
-
-### 其它帮助信息
-http://aias.top/guides.html
-
-
-### Git地址：   
-[Github链接](https://github.com/mymagicpower/AIAS)    
-[Gitee链接](https://gitee.com/mymagicpower/AIAS)   
-
-
-
-#### 帮助文档：
-- http://aias.top/guides.html
-- 1.性能优化常见问题:
-- http://aias.top/AIAS/guides/performance.html
-- 2.引擎配置（包括CPU，GPU在线自动加载，及本地配置）:
-- http://aias.top/AIAS/guides/engine_config.html
-- 3.模型加载方式（在线自动加载，及本地配置）:
-- http://aias.top/AIAS/guides/load_model.html
-- 4.Windows环境常见问题:
-- http://aias.top/AIAS/guides/windows.html
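
The SDK functionality list in the README above names cosine similarity as the measure behind the `Similarity` values in the sample output. As a rough illustration, independent of the SDK's own implementation (which this diff does not show), cosine similarity can be sketched in plain Python. The function name `cosine_similarity` and the 4-dimensional toy vectors are assumptions made for this example; the embeddings reported in the sample output have 768 dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, hypothetical values chosen only for illustration.
sentence1 = [0.1, 0.3, -0.2, 0.5]
sentence2 = [0.2, 0.25, -0.1, 0.4]
print(f"Similarity: {cosine_similarity(sentence1, sentence2):.4f}")
```

The result lies in [-1, 1]; values near 1 mean the two vectors point in nearly the same direction, which is how the sample output's similarity scores are meant to be read.
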
diff --git a/2_nlp_sdks/embedding/sentence_encoder_15_sdk/README.md b/2_nlp_sdks/embedding/sentence_encoder_15_sdk/README.md
@@ -1,72 +1,72 @@
-### 官网：
-[官网链接](http://www.aias.top/)
 
-### 下载模型，放置于models目录
-- 链接: https://github.com/mymagicpower/AIAS/releases/download/apps/distiluse-base-multilingual-cased-v1.zip
+### Download the model and place it in the models directory
+- Link: https://github.com/mymagicpower/AIAS/releases/download/apps/distiluse-base-multilingual-cased-v1.zip
 
-### 句向量SDK【支持15种语言】
-句向量是指将语句映射至固定维度的实数向量。
-将不定长的句子用定长的向量表示，为NLP下游任务提供服务。
-支持 15 种语言： 
+### Sentence Vector SDK [Supports 15 languages]
+A sentence vector is a fixed-dimensional real-valued vector that a sentence is mapped to.
+Representing variable-length sentences as fixed-length vectors lets downstream NLP tasks consume them directly.
+Supports 15 languages:
 Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish.
- 
-- 句向量    
+
+- Sentence vector   
 ![img](https://aias-home.oss-cn-beijing.aliyuncs.com/AIAS/nlp_sdks/Universal-Sentence-Encoder.png)
 
 
-句向量应用：
-- 语义搜索，通过句向量相似性，检索语料库中与query最匹配的文本
-- 文本聚类，文本转为定长向量，通过聚类模型可无监督聚集相似文本
-- 文本分类，表示成句向量，直接用简单分类器即训练文本分类器
+Sentence vector applications:
+
+- Semantic search: Retrieve the text in the corpus that best matches the query, based on sentence vector similarity.
+- Text clustering: Convert text to fixed-length vectors and group similar texts without supervision using a clustering model.
+- Text classification: Represent text as sentence vectors and train a simple classifier directly on them.
+
+### SDK functions:
 
-### SDK功能：
--  句向量提取
--  相似度（余弦）计算
+- Sentence vector extraction
+- Similarity (cosine) calculation
 
+### Running example - SentenceEncoderExample
 
-#### 运行例子 - SentenceEncoderExample
-运行成功后，命令行应该看到下面的信息:
+After running successfully, you should see the following information on the command line:
 ```text
 ...
-# 测试语句：
-# 英文一组
+# Test sentences:
+# A set of English sentences
 [INFO ] - input Sentence1: This model generates embeddings for input sentence
 [INFO ] - input Sentence2: This model generates embeddings
 
-# 中文一组
+# A set of Chinese sentences
 [INFO ] - input Sentence3: 今天天气不错
 [INFO ] - input Sentence4: 今天风和日丽
 
-# 向量维度：
+# Vector dimensions:
 [INFO ] - Vector dimensions: 512
 
-# 英文 - 生成向量：
+# English - Generated vectors:
 [INFO ] - Sentence1 embeddings: [-0.07397884, 0.023079528, ..., -0.028247012, -0.08646198]
 [INFO ] - Sentence2 embeddings: [-0.084004365, -0.021871908, ..., -0.039803937, -0.090846084]
 
-#计算英文相似度：
-[INFO ] - 英文 Similarity: 0.77445346
+# Calculating English similarity:
+[INFO ] - Similarity: 0.77445346
 
-# 中文 - 生成向量：
+# Chinese - Generated vectors:
 [INFO ] - Sentence1 embeddings: [0.012180057, -0.035749275, ..., 0.0208446, -0.048238125]
 [INFO ] - Sentence2 embeddings: [0.016560446, -0.03528302, ..., 0.023508975, -0.046362665]
 
-#计算中文相似度：
+# Calculating Chinese similarity:
 [INFO ] - 中文 Similarity: 0.9972926
 
 ```
 
-### 开源算法
-#### 1. sdk使用的开源算法
+### Open source algorithm
+#### 1. Open source algorithms used by the SDK
 - [sentence-transformers](https://github.com/UKPLab/sentence-transformers)
-- [预训练模型](https://www.sbert.net/docs/pretrained_models.html)
-- [安装](https://www.sbert.net/docs/installation.html)
+- [Pre-trained models](https://www.sbert.net/docs/pretrained_models.html)
+- [Installation](https://www.sbert.net/docs/installation.html)
 
 
-#### 2. 模型如何导出 ?
+#### 2. How to export the model?
 - [how_to_convert_your_model_to_torchscript](http://docs.djl.ai/docs/pytorch/how_to_convert_your_model_to_torchscript.html)
 
-- 导出CPU模型（pytorch 模型特殊，CPU&GPU模型不通用。所以CPU，GPU需要分别导出）
+- Export a separate model for each device (PyTorch models are special: CPU and GPU models are not interchangeable, so the CPU model and the GPU model must each be exported on their own)
 - device = torch.device("cpu")
 - device = torch.device("cuda")
 - export_model_15.py
@@ -92,25 +92,3 @@ traced_model = torch.jit.trace(model, example_inputs=input_features,strict=False
 traced_model.save("models/distiluse-base-multilingual-cased-v1/distiluse-base-multilingual-cased-v1.pt")
 ```
 
-
-
-### 其它帮助信息
-http://aias.top/guides.html
-
-
-### Git地址：   
-[Github链接](https://github.com/mymagicpower/AIAS)    
-[Gitee链接](https://gitee.com/mymagicpower/AIAS)   
-
-
-#### 帮助文档：
-- http://aias.top/guides.html
-- 1.性能优化常见问题:
-- http://aias.top/AIAS/guides/performance.html
-- 2.引擎配置（包括CPU，GPU在线自动加载，及本地配置）:
-- http://aias.top/AIAS/guides/engine_config.html
-- 3.模型加载方式（在线自动加载，及本地配置）:
-- http://aias.top/AIAS/guides/load_model.html
-- 4.Windows环境常见问题:
-- http://aias.top/AIAS/guides/windows.html
-
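
The README above describes semantic search as retrieving the corpus text that best matches a query by sentence vector similarity. A minimal sketch of that ranking step follows, assuming the embeddings have already been computed elsewhere. The helper names `cosine` and `search`, the sample sentences, and the 3-dimensional vectors are all hypothetical; the model in this README produces 512-dimensional vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(
    query_vec: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k corpus texts ranked by similarity to the query vector."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed embeddings, keyed by the text they represent.
corpus = {
    "The weather is nice today": [0.9, 0.1, 0.0],
    "It is sunny and warm": [0.8, 0.3, 0.1],
    "The stock market fell": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a weather-related query

results = search(query, corpus)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

A linear scan like this is adequate for small corpora; larger collections usually call for a vector index, which is outside the scope of this README.
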
diff --git a/2_nlp_sdks/embedding/sentence_encoder_en_sdk/README.md b/2_nlp_sdks/embedding/sentence_encoder_en_sdk/README.md
@@ -1,60 +1,60 @@
-### 官网：
-[官网链接](http://www.aias.top/)
 
-### 下载模型，放置于models目录
-- 链接: https://github.com/mymagicpower/AIAS/releases/download/apps/paraphrase-MiniLM-L6-v2.zip
+### Download the model and place it in the models directory
+- Link: https://github.com/mymagicpower/AIAS/releases/download/apps/paraphrase-MiniLM-L6-v2.zip
 
-### 轻量句向量SDK【英文】
-句向量是指将语句映射至固定维度的实数向量。
-将不定长的句子用定长的向量表示，为NLP下游任务提供服务。
+### Lightweight sentence vector SDK [English]
 
-- 句向量
+A sentence vector is a fixed-dimensional real-valued vector that a sentence is mapped to.
+Representing variable-length sentences as fixed-length vectors lets downstream NLP tasks consume them directly.
+
+- Sentence vector
 ![img](https://aias-home.oss-cn-beijing.aliyuncs.com/AIAS/nlp_sdks/Universal-Sentence-Encoder.png)
 
 
-句向量应用：
-- 语义搜索，通过句向量相似性，检索语料库中与query最匹配的文本
-- 文本聚类，文本转为定长向量，通过聚类模型可无监督聚集相似文本
-- 文本分类，表示成句向量，直接用简单分类器即训练文本分类器
+Applications of sentence vectors:
+- Semantic search: Retrieve the text in the corpus that best matches the query, based on sentence vector similarity.
+- Text clustering: Convert text to fixed-length vectors and group similar texts without supervision using a clustering model.
+- Text classification: Represent text as sentence vectors and train a simple classifier directly on them.
+
+### SDK functions:
 
-### SDK功能：
--  句向量提取
--  相似度计算
+- Sentence vector extraction
+- Similarity calculation
 
-#### 运行例子 - SentenceEncoderExample
-运行成功后，命令行应该看到下面的信息:
+#### Running example - SentenceEncoderExample
+After running successfully, you should see the following information on the command line:
 ```text
 ...
-# 测试语句：
+# Test sentences:
 [INFO ] - input Sentence1: This model generates embeddings for input sentence
 [INFO ] - input Sentence2: This model generates embeddings
 
-# 向量维度：
+# Vector dimensions:
 [INFO ] - Vector dimensions: 384
 
-# 生成向量：
+# Generate vectors:
 [INFO ] - Sentence1 embeddings: [-0.14147712, -0.025930656, -0.18829542,..., -0.11860573, -0.13064586]
 [INFO ] - Sentence2 embeddings: [-0.43392915, -0.23374224, -0.12924, ..., 0.0916177, 0.080070406]
 
-#计算相似度：
+# Calculate Similarity:
 [INFO ] - Similarity: 0.7306041
 
 ```
 
 
-### 开源算法
-#### 1. sdk使用的开源算法
+### Open source algorithm
+#### 1. Open source algorithms used by the SDK
 - [sentence-transformers](https://github.com/UKPLab/sentence-transformers)
-- [预训练模型](https://www.sbert.net/docs/pretrained_models.html)
-- [安装](https://www.sbert.net/docs/installation.html)
+- [Pre-trained models](https://www.sbert.net/docs/pretrained_models.html)
+- [Installation](https://www.sbert.net/docs/installation.html)
 
 
-#### 2. 模型如何导出 ?
+#### 2. How to export the model?
 - [how_to_convert_your_model_to_torchscript](http://docs.djl.ai/docs/pytorch/how_to_convert_your_model_to_torchscript.html)
 
-- 导出CPU模型（pytorch 模型特殊，CPU&GPU模型不通用。所以CPU，GPU需要分别导出）
-- device='cpu'
-- device='gpu'
+- Export a separate model for each device (PyTorch models are special: CPU and GPU models are not interchangeable, so the CPU model and the GPU model must each be exported on their own)
+- device = torch.device("cpu")
+- device = torch.device("cuda")
 - export_model.py
 ```text
 from sentence_transformers import SentenceTransformer
@@ -76,22 +76,4 @@ input_features = {'input_ids': input_ids, 'token_type_ids': input_type_ids, 'att
 # traced_model = torch.jit.trace(model, example_inputs=input_features)
 traced_model = torch.jit.trace(model, example_inputs=input_features,strict=False)
 traced_model.save("traced_st_model.pt")
-```
-
-
-
-### Git地址：   
-[Github链接](https://github.com/mymagicpower/AIAS)    
-[Gitee链接](https://gitee.com/mymagicpower/AIAS)   
-
-
-#### 帮助文档：
-- http://aias.top/guides.html
-- 1.性能优化常见问题:
-- http://aias.top/AIAS/guides/performance.html
-- 2.引擎配置（包括CPU，GPU在线自动加载，及本地配置）:
-- http://aias.top/AIAS/guides/engine_config.html
-- 3.模型加载方式（在线自动加载，及本地配置）:
-- http://aias.top/AIAS/guides/load_model.html
-- 4.Windows环境常见问题:
-- http://aias.top/AIAS/guides/windows.html
+```
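
The README above mentions text classification as an application: represent text as sentence vectors and train a simple classifier on them. One possible "simple classifier" is a nearest-centroid rule, sketched below in plain Python. This is an illustration of the idea rather than anything the SDK ships; `fit_centroids`, `predict`, the labels, and the 2-dimensional toy vectors are all hypothetical (this model produces 384-dimensional vectors).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(samples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the vectors of each label to get one centroid per class."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in samples:
        grouped[label].append(vec)
    centroids: Dict[str, List[float]] = {}
    for label, vecs in grouped.items():
        dim = len(vecs[0])
        centroids[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return centroids


def predict(vec: Vector, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical labelled embeddings; in practice these would come from the encoder.
training = [
    ([0.9, 0.1], "weather"),
    ([0.8, 0.2], "weather"),
    ([0.1, 0.9], "finance"),
    ([0.2, 0.8], "finance"),
]
centroids = fit_centroids(training)
print(predict([0.7, 0.3], centroids))
```

Any standard classifier can be used in place of the centroid rule; the point is only that fixed-length vectors make the text usable as ordinary feature input.
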