
Commit

Merge remote-tracking branch 'upstream/develop' into develop
longRookie committed Mar 24, 2023
2 parents b74f9d7 + 793effa commit e91bff7
Showing 159 changed files with 11,052 additions and 1,031 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-hooks/copyright-check.hook
@@ -19,7 +19,7 @@ import subprocess
import platform

COPYRIGHT = '''
Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -128,4 +128,4 @@ def main(argv=None):


if __name__ == '__main__':
exit(main())
exit(main())
32 changes: 21 additions & 11 deletions README.md
@@ -178,7 +178,10 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

### Recent Update
- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo](./demos/TTSArmLinux).
- 🔥 2023.03.14: Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1), [PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5); results are still being optimized.
- 👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3).
- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo (with C++ Chinese Text Frontend)](./demos/TTSArmLinux).
- 🔥 2023.03.03: Add Voice Conversion [StarGANv2-VC synthesize pipeline](./examples/vctk/vc3).
- 🎉 2023.02.16: Add [Cantonese TTS](./examples/canton/tts3).
- 🔥 2023.01.10: Add [code-switch asr CLI and Demos](./demos/speech_recognition).
- 👑 2023.01.06: Add [code-switch asr tal_cs recipe](./examples/tal_cs/asr1/).
@@ -575,14 +578,14 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</thead>
<tbody>
<tr>
<td> Text Frontend </td>
<td colspan="2"> &emsp; </td>
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
<td> Text Frontend </td>
<td colspan="2"> &emsp; </td>
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
</tr>
<tr>
<td rowspan="5">Acoustic Model</td>
<td rowspan="6">Acoustic Model</td>
<td>Tacotron2</td>
<td>LJSpeech / CSMSC</td>
<td>
@@ -617,6 +620,13 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
<a href = "./examples/vctk/ernie_sat">ERNIE-SAT-vctk</a> / <a href = "./examples/aishell3/ernie_sat">ERNIE-SAT-aishell3</a> / <a href = "./examples/aishell3_vctk/ernie_sat">ERNIE-SAT-zh_en</a>
</td>
</tr>
<tr>
<td>DiffSinger</td>
<td>Opencpop</td>
<td>
<a href = "./examples/opencpop/svs1">DiffSinger-opencpop</a>
</td>
</tr>
<tr>
<td rowspan="6">Vocoder</td>
<td >WaveFlow</td>
@@ -627,9 +637,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tr>
<tr>
<td >Parallel WaveGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a> / <a href = "./examples/opencpop/voc1">PWGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -648,9 +658,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
</tr>
<tr>
<td>HiFiGAN</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a> / <a href = "./examples/opencpop/voc5">HiFiGAN-opencpop</a>
</td>
</tr>
<tr>
45 changes: 28 additions & 17 deletions README_cn.md
@@ -183,7 +183,10 @@
- 🧩 *Cascaded models application*: as an extension of traditional speech tasks, we combine them with tasks from natural language processing, computer vision, and other fields to build industrial-grade applications closer to real-world needs.

### Recent Update
- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo](./demos/TTSArmLinux)
- 🔥 2023.03.14: Add SVS (Singing Voice Synthesis) examples based on the Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1), [PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5); results are still being optimized.
- 👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3)
- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo (with C++ Chinese Text Frontend)](./demos/TTSArmLinux)
- 🔥 2023.03.03: Add Voice Conversion [StarGANv2-VC synthesize pipeline](./examples/vctk/vc3)
- 🎉 2023.02.16: Add [Cantonese TTS](./examples/canton/tts3)
- 🔥 2023.01.10: Add [code-switch ASR CLI and Demos](./demos/speech_recognition)
- 👑 2023.01.06: Add [code-switch ASR tal_cs recipe](./examples/tal_cs/asr1/)
@@ -574,43 +577,50 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
<td>
<a href = "./examples/other/tn">tn</a> / <a href = "./examples/other/g2p">g2p</a>
</td>
</tr>
<tr>
<td rowspan="5">Acoustic Model</td>
</tr>
<tr>
<td rowspan="6">Acoustic Model</td>
<td>Tacotron2</td>
<td>LJSpeech / CSMSC</td>
<td>
<a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a> / <a href = "./examples/csmsc/tts0">tacotron2-csmsc</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>Transformer TTS</td>
<td>LJSpeech</td>
<td>
<a href = "./examples/ljspeech/tts1">transformer-ljspeech</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>SpeedySpeech</td>
<td>CSMSC</td>
<td >
<a href = "./examples/csmsc/tts2">speedyspeech-csmsc</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td>FastSpeech2</td>
<td>LJSpeech / VCTK / CSMSC / AISHELL-3 / ZH_EN / finetune</td>
<td>
<a href = "./examples/ljspeech/tts3">fastspeech2-ljspeech</a> / <a href = "./examples/vctk/tts3">fastspeech2-vctk</a> / <a href = "./examples/csmsc/tts3">fastspeech2-csmsc</a> / <a href = "./examples/aishell3/tts3">fastspeech2-aishell3</a> / <a href = "./examples/zh_en_tts/tts3">fastspeech2-zh_en</a> / <a href = "./examples/other/tts_finetune/tts3">fastspeech2-finetune</a>
</td>
</tr>
<tr>
</tr>
<tr>
<td><a href = "https://arxiv.org/abs/2211.03545">ERNIE-SAT</a></td>
<td>VCTK / AISHELL-3 / ZH_EN</td>
<td>
<a href = "./examples/vctk/ernie_sat">ERNIE-SAT-vctk</a> / <a href = "./examples/aishell3/ernie_sat">ERNIE-SAT-aishell3</a> / <a href = "./examples/aishell3_vctk/ernie_sat">ERNIE-SAT-zh_en</a>
</td>
</tr>
</tr>
<tr>
<td>DiffSinger</td>
<td>Opencpop</td>
<td>
<a href = "./examples/opencpop/svs1">DiffSinger-opencpop</a>
</td>
</tr>
<tr>
<td rowspan="6">Vocoder</td>
<td >WaveFlow</td>
@@ -621,9 +631,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tr>
<tr>
<td >Parallel WaveGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a>
<a href = "./examples/ljspeech/voc1">PWGAN-ljspeech</a> / <a href = "./examples/vctk/voc1">PWGAN-vctk</a> / <a href = "./examples/csmsc/voc1">PWGAN-csmsc</a> / <a href = "./examples/aishell3/voc1">PWGAN-aishell3</a> / <a href = "./examples/opencpop/voc1">PWGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -642,9 +652,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tr>
<tr>
<td >HiFiGAN</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3</td>
<td >LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop</td>
<td>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a>
<a href = "./examples/ljspeech/voc5">HiFiGAN-ljspeech</a> / <a href = "./examples/vctk/voc5">HiFiGAN-vctk</a> / <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a> / <a href = "./examples/aishell3/voc5">HiFiGAN-aishell3</a> / <a href = "./examples/opencpop/voc5">HiFiGAN-opencpop</a>
</td>
</tr>
<tr>
@@ -701,6 +711,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tbody>
</table>


<a name="声音分类模型"></a>
**Audio Classification**

4 changes: 4 additions & 0 deletions demos/TTSArmLinux/.gitignore
@@ -1,4 +1,8 @@
# Directories
build/
output/
libs/
models/

# Symbolic links
dict
20 changes: 12 additions & 8 deletions demos/TTSArmLinux/README.md
@@ -10,9 +10,9 @@

### Install dependencies

```
```bash
# Ubuntu
sudo apt install build-essential cmake wget tar unzip
sudo apt install build-essential cmake pkg-config wget tar unzip

# CentOS
sudo yum groupinstall "Development Tools"
@@ -25,15 +25,13 @@ sudo yum install cmake wget tar unzip

You can download them with the following command:

```
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech/demos/TTSArmLinux
```bash
./download.sh
```

### Build the demo

```
```bash
./build.sh
```

@@ -43,12 +41,18 @@ cd PaddleSpeech/demos/TTSArmLinux

### Run

```
You can change the `--phone2id_path` parameter in `./front.conf` to point to the `phone_id_map.txt` of your own acoustic model.

```bash
./run.sh
./run.sh --sentence "语音合成测试"
./run.sh --sentence "输出到指定的音频文件" --output_wav ./output/test.wav
./run.sh --help
```

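This converts the ten sentences defined in the `sentencesToChoose` array in [src/main.cpp](src/main.cpp) into `wav` files and saves them in the `output` folder.
Currently only Chinese synthesis is supported; any English in the input will crash the program.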

If `--output_wav` is not specified, the output is written to `./output/tts.wav` by default.

## Manually build the Paddle Lite library

1 change: 1 addition & 0 deletions demos/TTSArmLinux/build-depends.sh
17 changes: 13 additions & 4 deletions demos/TTSArmLinux/build.sh
@@ -1,20 +1,29 @@
#!/bin/bash
set -e
set -x

cd "$(dirname "$(realpath "$0")")"

BASE_DIR="$PWD"

# load configure
. ./config.sh

# build
echo "ARM_ABI is ${ARM_ABI}"
echo "PADDLE_LITE_DIR is ${PADDLE_LITE_DIR}"

rm -rf build
mkdir -p build
cd build
echo "Build depends..."
./build-depends.sh "$@"

mkdir -p "$BASE_DIR/build"
cd "$BASE_DIR/build"
cmake -DPADDLE_LITE_DIR="${PADDLE_LITE_DIR}" -DARM_ABI="${ARM_ABI}" ../src
make

if [ "$*" = "" ]; then
make -j$(nproc)
else
make "$@"
fi

echo "make successful!"
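
Review note: the `make` step above runs a parallel build when `build.sh` receives no arguments and otherwise forwards the arguments to `make`. A minimal sketch of that selection logic, pulled into a hypothetical helper `make_args` (not part of the commit) that checks the argument count rather than comparing `"$*"` with the empty string:

```bash
#!/bin/bash
# Hypothetical helper, not part of the commit: print the arguments
# that build.sh would hand to make.
make_args() {
    if [ "$#" -eq 0 ]; then
        # No arguments given: build in parallel on all available cores.
        printf '%s\n' "-j$(nproc)"
    else
        # Otherwise pass the caller's arguments through unchanged.
        printf '%s\n' "$*"
    fi
}

make_args          # e.g. -j8 on an 8-core machine
make_args clean    # prints: clean
```

Checking `$#` distinguishes "no arguments" from a single empty argument, which the `"$*" = ""` comparison cannot do.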
9 changes: 9 additions & 0 deletions demos/TTSArmLinux/clean.sh
@@ -1,8 +1,11 @@
#!/bin/bash
set -e
set -x

cd "$(dirname "$(realpath "$0")")"

BASE_DIR="$PWD"

# load configure
. ./config.sh

@@ -12,3 +15,9 @@ set -x
rm -rf "$OUTPUT_DIR"
rm -rf "$LIBS_DIR"
rm -rf "$MODELS_DIR"
rm -rf "$BASE_DIR/build"

"$BASE_DIR/src/TTSCppFrontend/clean.sh"

# Symbolic link
rm "$BASE_DIR/dict"
5 changes: 3 additions & 2 deletions demos/TTSArmLinux/config.sh
@@ -10,5 +10,6 @@ OUTPUT_DIR="${PWD}/output"
PADDLE_LITE_DIR="${LIBS_DIR}/inference_lite_lib.armlinux.${ARM_ABI}.gcc.with_extra.with_cv/cxx"
#PADDLE_LITE_DIR="/path/to/Paddle-Lite/build.lite.linux.${ARM_ABI}.gcc/inference_lite_lib.armlinux.${ARM_ABI}/cxx"

AM_MODEL_PATH="${MODELS_DIR}/cpu/fastspeech2_csmsc_arm.nb"
VOC_MODEL_PATH="${MODELS_DIR}/cpu/mb_melgan_csmsc_arm.nb"
ACOUSTIC_MODEL_PATH="${MODELS_DIR}/cpu/fastspeech2_csmsc_arm.nb"
VOCODER_PATH="${MODELS_DIR}/cpu/mb_melgan_csmsc_arm.nb"
FRONT_CONF="${PWD}/front.conf"
14 changes: 14 additions & 0 deletions demos/TTSArmLinux/download.sh
@@ -3,6 +3,8 @@ set -e

cd "$(dirname "$(realpath "$0")")"

BASE_DIR="$PWD"

# load configure
. ./config.sh

@@ -38,6 +40,10 @@ download() {
echo '======================='
}

########################################

echo "Download models..."

download 'inference_lite_lib.armlinux.armv8.gcc.with_extra.with_cv.tar.gz' \
'https://paddlespeech.bj.bcebos.com/demos/TTSArmLinux/inference_lite_lib.armlinux.armv8.gcc.with_extra.with_cv.tar.gz' \
'39e0c6604f97c70f5d13c573d7e709b9' \
@@ -54,3 +60,11 @@ download 'fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz' \
"$MODELS_DIR"

echo "Done."

########################################

echo "Download dictionary files..."

ln -s src/TTSCppFrontend/front_demo/dict "$BASE_DIR/"

"$BASE_DIR/src/TTSCppFrontend/download.sh"
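
Review note: `ln -s` exits with an error when the link already exists, so with `set -e` active a second run of `download.sh` would presumably stop at the symlink step. A minimal sketch of a re-runnable variant, assuming GNU or BSD `ln` (the `-n` option is not in POSIX); the helper name `link_dict` is hypothetical and not part of the commit:

```bash
#!/bin/bash
# Hypothetical helper, not part of the commit: create or refresh the
# dict symlink so the script can be run more than once.
link_dict() {
    local base_dir="$1"
    # -s: symbolic link; -f: replace an existing link;
    # -n: treat an existing link to a directory as the link itself
    #     instead of descending into the directory it points to.
    ln -sfn src/TTSCppFrontend/front_demo/dict "$base_dir/dict"
}

demo_dir="$(mktemp -d)"
link_dict "$demo_dir"
link_dict "$demo_dir"        # second call succeeds instead of failing
readlink "$demo_dir/dict"    # prints: src/TTSCppFrontend/front_demo/dict
rm -rf "$demo_dir"
```

The same concern applies in reverse to `clean.sh`, where a plain `rm` of the link fails if the link was never created; `rm -f` would tolerate that case.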
21 changes: 21 additions & 0 deletions demos/TTSArmLinux/front.conf
@@ -0,0 +1,21 @@
# jieba conf
--jieba_dict_path=./dict/jieba/jieba.dict.utf8
--jieba_hmm_path=./dict/jieba/hmm_model.utf8
--jieba_user_dict_path=./dict/jieba/user.dict.utf8
--jieba_idf_path=./dict/jieba/idf.utf8
--jieba_stop_word_path=./dict/jieba/stop_words.utf8

# dict conf fastspeech2_0.4
--seperate_tone=false
--word2phone_path=./dict/fastspeech2_nosil_baker_ckpt_0.4/word2phone_fs2.dict
--phone2id_path=./dict/fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
--tone2id_path=./dict/fastspeech2_nosil_baker_ckpt_0.4/word2phone_fs2.dict

# dict conf speedyspeech_0.5
#--seperate_tone=true
#--word2phone_path=./dict/speedyspeech_nosil_baker_ckpt_0.5/word2phone.dict
#--phone2id_path=./dict/speedyspeech_nosil_baker_ckpt_0.5/phone_id_map.txt
#--tone2id_path=./dict/speedyspeech_nosil_baker_ckpt_0.5/tone_id_map.txt

# dict of tranditional_to_simplified
--trand2simpd_path=./dict/tranditional_to_simplified/trand2simp.txt
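
Review note: `front.conf` reads like a gflags-style flag file, with one `--name=value` entry per line and `#` starting a comment. That the demo binary parses it this way is an assumption, since the parsing code is not shown in this part of the diff. Under that assumption, a minimal sketch for checking which value a flag currently has (the helper name `get_flag` is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper, not part of the commit: print the value of
# --NAME=VALUE from a flag file. Commented-out lines start with '#',
# so they never match the '^--' anchor; if a flag is repeated, the
# last occurrence is printed.
get_flag() {
    local name="$1" conf="$2"
    sed -n "s/^--${name}=//p" "$conf" | tail -n 1
}

conf="$(mktemp)"
cat > "$conf" <<'EOF'
# dict conf fastspeech2_0.4
--seperate_tone=false
--phone2id_path=./dict/fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
#--seperate_tone=true
EOF

get_flag seperate_tone "$conf"   # prints: false
get_flag phone2id_path "$conf"   # prints the phone_id_map.txt path
rm -f "$conf"
```

A check like this can help confirm that only one of the two dictionary groups (fastspeech2 or speedyspeech) is active after editing the file.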
13 changes: 7 additions & 6 deletions demos/TTSArmLinux/run.sh
@@ -7,12 +7,13 @@ cd "$(dirname "$(realpath "$0")")"
. ./config.sh

# create dir
rm -rf "$OUTPUT_DIR"
mkdir -p "$OUTPUT_DIR"

# run
for i in {1..10}; do
(set -x; ./build/paddlespeech_tts_demo "$AM_MODEL_PATH" "$VOC_MODEL_PATH" $i "$OUTPUT_DIR/$i.wav")
done

ls -lh "$OUTPUT_DIR"/*.wav
set -x
./build/paddlespeech_tts_demo \
--front_conf "$FRONT_CONF" \
--acoustic_model "$ACOUSTIC_MODEL_PATH" \
--vocoder "$VOCODER_PATH" \
"$@"
# end