What's the difference between those two mel spectrograms that MelDataset returns (mel and mel_loss)?
And how do these mels differ from the ones Tacotron 2 produces?
Thanks
You can think of mel_loss as the ground truth mel-spectrogram, and mel is the input mel-spectrogram.
So if you are doing pre-training, both will be the same; if you are doing fine-tuning, mel is the input generated by Tacotron, and mel_loss is the ground truth.
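To make that concrete, here is a minimal sketch of the selection logic as described in this reply. It is not the repository's actual MelDataset code: the function name `select_mels` and its parameters are hypothetical, and it ignores any difference in mel-extraction settings between the two outputs.

```python
def select_mels(audio_mel, tacotron_mel=None, fine_tuning=False):
    """Return (mel, mel_loss) following the description above.

    audio_mel    -- mel-spectrogram computed from the recorded wav (hypothetical input)
    tacotron_mel -- mel-spectrogram predicted by Tacotron (hypothetical input)
    """
    # The loss target is always derived from the real recording.
    mel_loss = audio_mel

    if fine_tuning:
        # Fine-tuning: the generator input is the Tacotron-predicted mel.
        if tacotron_mel is None:
            raise ValueError("fine-tuning needs a Tacotron-generated mel")
        mel = tacotron_mel
    else:
        # Pre-training: input and target come from the same audio.
        mel = audio_mel

    return mel, mel_loss
```

With `fine_tuning=False` both returned values are the same object; with `fine_tuning=True` only `mel` changes.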
@msalhab96 Hello. I noticed that training uses the LJ-Speech dataset, which contains many wav files. During training, we extract the mel-spectrogram from these wavs with a maximum frequency of 8 kHz (see fmax in the config file) and feed it to the model. The model outputs wavs, and we then compute the mel-spectrogram again to calculate the loss, this time with a maximum frequency of 11.025 kHz (see fmax_for_loss=null in the config file; in librosa, the default maximum frequency is half the sampling rate). Do you know why there is this gap?
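For reference, the two numbers in that question can be reproduced with a few lines. This is only a sketch of the arithmetic: `effective_fmax` is a hypothetical helper, and the 22,050 Hz sampling rate is the LJ-Speech rate assumed here, not a value quoted in the thread.

```python
def effective_fmax(fmax, sampling_rate):
    """Upper edge of the mel filterbank in Hz.

    A value of None (null in a JSON config) falls back to the Nyquist
    frequency, i.e. half the sampling rate, as the question describes
    for librosa's default.
    """
    if fmax is None:
        return sampling_rate / 2
    return fmax


SAMPLING_RATE = 22050  # assumed LJ-Speech sampling rate in Hz

input_fmax = effective_fmax(8000, SAMPLING_RATE)  # fmax used for the input mel
loss_fmax = effective_fmax(None, SAMPLING_RATE)   # fmax_for_loss = null

print(input_fmax, loss_fmax)  # 8000 11025.0
```

So the input mel covers frequencies up to 8 kHz, while the mel used for the loss extends to 11.025 kHz.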