Hi, thank you for open-sourcing this wonderful work!
I followed your instructions: 1) install `lightconv_cuda`, 2) download the checkpoint, 3) download the speaker embedding `.npy`.
However, the generated result is not good. Below is my running command and its output:
```
# sh run.sh
2022-11-30 13:45:22.626404: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Device of XSpkEmoTrans: cuda
Removing weight norm...
Raw Text Sequence: Hello world
Phoneme Sequence: {HH AH0 L OW1 W ER1 L D}
```
ENV:

```
python  3.6.8
fairseq 0.10.2
torch   1.7.0+cu110
CUDA    11.0
```
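As a sanity check before synthesis, it can help to verify that the downloaded speaker embedding `.npy` loads as a plain float vector. A minimal sketch (the filename `spk_emb.npy` and the 256-dim dummy vector below are placeholders, not the actual released file):

```python
import numpy as np

# Fabricate a dummy 256-dim speaker embedding in place of the real
# downloaded file, just to illustrate the check.
rng = np.random.default_rng(0)
np.save("spk_emb.npy", rng.standard_normal(256).astype(np.float32))

# Load and inspect: a speaker embedding is typically a 1-D float vector.
emb = np.load("spk_emb.npy")
assert emb.ndim == 1 and emb.dtype == np.float32

# L2-normalizing is a common preprocessing step for speaker embeddings.
emb = emb / np.linalg.norm(emb)
print(emb.shape)  # (256,)
```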
Hi @pangtouyuqqq, thanks for your attention. This is because the dataset contains only two different texts (you will get more natural output if you try one of them). If you need to generate unseen text, it may help to train on another dataset with more generic text-speech pairs. It would also be helpful to replace the light convolution with a transformer when you do that.
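The suggested swap of light convolution for a transformer could, in the simplest case, mean substituting a standard encoder layer of the same model dimension. A rough sketch using PyTorch's built-in layer (the dimensions `d_model=256`, `nhead=4` are illustrative assumptions, not the model's actual config):

```python
import torch
import torch.nn as nn

# Hypothetical drop-in: a standard Transformer encoder layer with the
# same model dimension as the LightConv block it would replace.
d_model, nhead = 256, 4
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)

# Input is (seq_len, batch, d_model); the layer preserves the shape,
# so it can slot into the same position in the encoder stack.
x = torch.randn(50, 2, d_model)
y = layer(x)
print(tuple(y.shape))  # (50, 2, 256)
```

Because the layer is shape-preserving, the surrounding encoder code would not need to change; the training schedule and regularization, however, might.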
Attachment: Hello world_Actor_22_sad.wav.zip