speaker mean
MaxMax2016 committed May 24, 2023
1 parent 1d68fbb commit 9246fef
Showing 2 changed files with 94 additions and 9 deletions.
73 changes: 64 additions & 9 deletions README.md
@@ -10,7 +10,7 @@

</div>

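- 💗 This project is aimed at deep learning beginners; basic familiarity with Pyhon and PyTorch is a prerequisite for using it
- 💗 This project is aimed at deep learning beginners; basic familiarity with Python and PyTorch is a prerequisite for using it
- 💗 This project aims to help deep learning beginners move beyond dry, purely theoretical study and master the fundamentals of deep learning through hands-on practice;
- 💗 This project does not support real-time voice conversion (it may in the future, but whisper would have to be replaced)
- 💗 This project will not develop one-click packages for other purposes. ("will not" here means "have not learned how")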
@@ -67,13 +67,13 @@
```shell
dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───000001.wav
│   ├───...
│   └───Lxx-0xx8.wav
│   └───000xxx.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───000001.wav
    ├───...
    └───xxx7-xxx007.wav
    └───000xxx.wav
```

## Install dependencies
@@ -112,18 +112,73 @@ dataset_raw
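- 4, Use the 16k audio to extract content encodings
> python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper
- 5, Use the 16k audio to extract timbre encodings
- 5, Use the 16k audio to extract timbre encodings; strictly speaking, "speaker" should be renamed "timbre"
> python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker
- 6, Use the 32k audio to extract linear spectrograms
- 6, Extract the mean timbre encoding of each speaker for use at inference; when generating the training index, it can also replace the timbre of each individual audio file, giving every speaker one unified timbre for training
> python prepare/preprocess_speaker_ave.py data_svc/speaker/ data_svc/singer
- 7, Use the 32k audio to extract linear spectrograms
> python prepare/preprocess_spec.py -w data_svc/waves-32k/ -s data_svc/specs
- 7, Use the 32k audio to generate the training index
- 8, Use the 32k audio to generate the training index
> python prepare/preprocess_train.py
- 8, Debug the training files
- 9, Debug the training files
> python prepare/preprocess_zzz.py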
```shell
data_svc/
├── waves-16k
│   ├── speaker0
│   │   ├── 000001.wav
│   │   └── 000xxx.wav
│   └── speaker1
│       ├── 000001.wav
│       └── 000xxx.wav
├── waves-32k
│   ├── speaker0
│   │   ├── 000001.wav
│   │   └── 000xxx.wav
│   └── speaker1
│       ├── 000001.wav
│       └── 000xxx.wav
├── pitch
│   ├── speaker0
│   │   ├── 000001.pit.npy
│   │   └── 000xxx.pit.npy
│   └── speaker1
│       ├── 000001.pit.npy
│       └── 000xxx.pit.npy
├── whisper
│   ├── speaker0
│   │   ├── 000001.ppg.npy
│   │   └── 000xxx.ppg.npy
│   └── speaker1
│       ├── 000001.ppg.npy
│       └── 000xxx.ppg.npy
├── speaker
│   ├── speaker0
│   │   ├── 000001.spk.npy
│   │   └── 000xxx.spk.npy
│   └── speaker1
│       ├── 000001.spk.npy
│       └── 000xxx.spk.npy
└── singer
    ├── speaker0.spk.npy
    └── speaker1.spk.npy
```
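
The layout above implies that every 16k wav has a matching file in each feature folder, plus one averaged timbre file per speaker under `singer`. A minimal sketch of a consistency check under that assumption is given here; `find_missing_features` and `FEATURE_SUFFIXES` are hypothetical names that are not part of the repository, and the suffixes are read off the tree rather than taken from the preprocessing scripts.

```python
from pathlib import Path

# Folder name -> file suffix, read off the directory tree above (an assumption,
# not something taken from the preprocessing scripts themselves).
FEATURE_SUFFIXES = {
    "waves-32k": ".wav",
    "pitch": ".pit.npy",
    "whisper": ".ppg.npy",
    "speaker": ".spk.npy",
}


def find_missing_features(root):
    """Return expected feature files that are absent under `root`.

    Hypothetical helper: for each wav in waves-16k/<speaker>/ it expects one
    file per feature folder, and one singer/<speaker>.spk.npy per speaker.
    """
    root = Path(root)
    missing = []
    for speaker_dir in sorted((root / "waves-16k").iterdir()):
        if not speaker_dir.is_dir():
            continue  # skip stray files next to the speaker folders
        for wav in sorted(speaker_dir.glob("*.wav")):
            for folder, suffix in FEATURE_SUFFIXES.items():
                expected = root / folder / speaker_dir.name / (wav.stem + suffix)
                if not expected.is_file():
                    missing.append(expected)
        # The averaged timbre encoding is stored once per speaker.
        mean_file = root / "singer" / (speaker_dir.name + ".spk.npy")
        if not mean_file.is_file():
            missing.append(mean_file)
    return missing
```

Calling `find_missing_features("data_svc")` after the preprocessing steps should return an empty list if the layout matches the tree; any returned path points at a step that may need to be re-run.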

## Training
- 0, To fine-tune from the pretrained model, download the pretrained model 5.0.epoch1200.full.pth
30 changes: 30 additions & 0 deletions prepare/preprocess_speaker_ave.py
@@ -0,0 +1,30 @@
import os
import argparse
import numpy as np


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.description = 'please enter embed parameter ...'
    parser.add_argument("dataset_speaker", type=str)
    parser.add_argument("dataset_singer", type=str)
    data_speaker = parser.parse_args().dataset_speaker
    data_singer = parser.parse_args().dataset_singer

    os.makedirs(data_singer)

    for speaker in os.listdir(data_speaker):
        print(speaker)
        subfile_num = 0
        speaker_ave = 0
        for file in os.listdir(os.path.join(data_speaker, speaker)):
            if file.endswith(".npy"):
                source_embed = np.load(
                    os.path.join(data_speaker, speaker, file))
                source_embed = source_embed.astype(np.float32)
                speaker_ave = speaker_ave + source_embed
                subfile_num = subfile_num + 1
        speaker_ave = speaker_ave / subfile_num

        np.save(os.path.join(data_singer, f"{speaker}.spk.npy"),
                speaker_ave, allow_pickle=False)
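
As committed, the script raises on a second run because `os.makedirs` fails when the output folder already exists, and it divides by zero for a speaker folder that contains no `.npy` files. A sketch of a more defensive variant follows, assuming all embeddings of one speaker share the same shape; `average_speaker_embeddings` is a hypothetical name, not the committed code.

```python
import os

import numpy as np


def average_speaker_embeddings(data_speaker, data_singer):
    """Write one mean embedding per speaker folder and return the speakers written.

    Hypothetical refactor of the script above, not the committed code.
    """
    # exist_ok=True lets the step be re-run without failing on an existing folder.
    os.makedirs(data_singer, exist_ok=True)
    written = []
    for speaker in sorted(os.listdir(data_speaker)):
        speaker_dir = os.path.join(data_speaker, speaker)
        if not os.path.isdir(speaker_dir):
            continue  # ignore stray files next to the speaker folders
        embeds = [
            np.load(os.path.join(speaker_dir, name)).astype(np.float32)
            for name in sorted(os.listdir(speaker_dir))
            if name.endswith(".npy")
        ]
        if not embeds:
            continue  # nothing to average, so avoid dividing by zero
        # Element-wise mean over all utterance embeddings of this speaker
        # (assumes they all have the same shape).
        speaker_ave = np.mean(embeds, axis=0)
        np.save(os.path.join(data_singer, f"{speaker}.spk.npy"),
                speaker_ave, allow_pickle=False)
        written.append(speaker)
    return written
```

With the paths from the README step, this would be called as `average_speaker_embeddings("data_svc/speaker/", "data_svc/singer")`. Returning the list of speakers written makes it easy to spot a folder that was skipped because it was empty.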
