Skip to content

Commit

Permalink
Merge pull request #17 from qiliao123/patch-46
Browse files Browse the repository at this point in the history
Update record-custom-voice-samples.md
  • Loading branch information
sally-baolian authored Dec 28, 2022
2 parents 692cac1 + 16bf5a0 commit 02957b9
Showing 1 changed file with 4 additions and 4 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -24,13 +24,13 @@ Many small but important details go into creating a professional voice recording

## Tips for preparing data for a high-quality voice

A great custom neural voice depends on several factors, like the quality and size of your training data.
A highly-natural custom neural voice depends on several factors, like the quality and size of your training data.

The quality of your training data is a primary factor. For example, in the same training set, consistent volume, speaking rate, speaking pitch, and speaking style are essential to create a great custom neural voice. You should also avoid background noise in the recording and make sure the script and recording match. To ensure the quality of your data, you need to follow [script selection criteria](#script-selection-criteria) and [recording requirements](#recording-your-script).
The quality of your training data is a primary factor. For example, in the same training set, consistent volume, speaking rate, speaking pitch, and speaking style are essential to create a high-quality custom neural voice. You should also avoid background noise in the recording and make sure the script and recording match. To ensure the quality of your data, you need to follow [script selection criteria](#script-selection-criteria) and [recording requirements](#recording-your-script).

For the size of the training data, in most cases you can usually prepare at least 500 utterances to build a reasonable custom neural voice that sounds natural. The more training data you prepare, the higher quality of voice you can get.
Regarding the size of the training data, in most cases you can build a reasonable custom neural voice with 500 utterances. According to our tests, adding more training data in most languages does not necessarily improve naturalness of the voice itself (tested using the MOS score), however, with more training data that covers more word instances, you have higher possibility to reduce the DSAT (dis-satisfied part of the speech, for example, the glitches) ratio for the voice.

In some cases, you may want a voice persona with unique characteristics. For example, a cartoon persona needs a voice with a special speaking style, or a voice that is very dynamic in intonation. For this voice, we recommend that you prepare at least 1000 (preferably 2000) utterances, and record them at a professional recording studio. To learn more about how to improve the quality of your voice model, see [characteristics and limitations for using Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/characteristics-and-limitations-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context).
In some cases, you may want a voice persona with unique characteristics. For example, a cartoon persona needs a voice with a special speaking style, or a voice that is very dynamic in intonation. For such cases, we recommend that you prepare at least 1000 (preferably 2000) utterances, and record them at a professional recording studio. To learn more about how to improve the quality of your voice model, see [characteristics and limitations for using Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/characteristics-and-limitations-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context).

## Voice recording roles

Expand Down

0 comments on commit 02957b9

Please sign in to comment.