MelDataset mel VS mel_loss #163

nikifori · 2024-04-18T09:57:36Z

What's the difference between the two mel spectrograms that MelDataset returns (mel and mel_loss)?
And how does the former (mel) differ from the mels that are supposed to come from Tacotron 2?

Thanks

msalhab96 · 2024-05-07T10:45:54Z

You can think of mel_loss as the ground-truth mel-spectrogram and mel as the input mel-spectrogram.

So if you are doing pre-training, they will both be the same. If you are doing fine-tuning, then mel is the input generated by Tacotron, and mel_loss is the ground truth.

check the code here
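
As a rough illustration of the distinction described above (this is not the repository's code), the choice between the two spectrograms could be sketched as below. The function name `select_mels` and its arguments are hypothetical, and the sketch ignores any difference in mel-extraction settings such as fmax.

```python
def select_mels(wav_mel, tacotron_mel=None, fine_tuning=False):
    """Return (mel, mel_loss) for one training example.

    Hypothetical helper, for illustration only:
      wav_mel      -- mel-spectrogram computed from the ground-truth audio
      tacotron_mel -- mel-spectrogram predicted by Tacotron (fine-tuning only)
    """
    # The loss target is always derived from the ground-truth audio.
    mel_loss = wav_mel

    if fine_tuning:
        # Fine-tuning: the model input is the Tacotron-generated mel.
        if tacotron_mel is None:
            raise ValueError("fine-tuning needs a Tacotron-generated mel")
        mel = tacotron_mel
    else:
        # Pre-training: the input is the ground-truth mel as well.
        mel = wav_mel

    return mel, mel_loss
```

With `fine_tuning=False` both returned values are the same object; with `fine_tuning=True` only `mel_loss` comes from the recorded audio.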

Ziyi6 · 2024-05-23T02:22:59Z

@msalhab96 Hello. I noticed that training uses the LJ-Speech dataset, which contains many wav files. During training, we extract the mel-spectrogram from these wavs with a maximum frequency of 8 kHz (see fmax in the config file) and feed it to the model. The model outputs wavs, and we then compute the mel-spectrogram again for the loss, but this time with a maximum frequency of 11.025 kHz (see fmax_for_loss=null in the config file; in librosa, the default maximum frequency is half the sampling rate). Do you know why there is a gap?
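
The arithmetic behind the two cut-off frequencies can be checked with a small sketch. It assumes the LJ-Speech sampling rate of 22050 Hz and the rule mentioned above that a null fmax falls back to half the sampling rate; the helper name `effective_fmax` is made up for this example.

```python
def effective_fmax(fmax, sampling_rate):
    """Resolve the upper mel frequency in Hz.

    Hypothetical helper: a value of None (null in the config)
    falls back to the Nyquist frequency, sampling_rate / 2.
    """
    return sampling_rate / 2 if fmax is None else fmax


sampling_rate = 22050  # LJ-Speech sampling rate in Hz (assumed here)

input_fmax = effective_fmax(8000, sampling_rate)  # fmax in the config
loss_fmax = effective_fmax(None, sampling_rate)   # fmax_for_loss = null

print(input_fmax)  # 8000
print(loss_fmax)   # 11025.0
```

So the input mel covers 0 to 8 kHz, while the mel used for the loss covers the full band up to 11.025 kHz.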
