MelDataset mel VS mel_loss #163

Open
nikifori opened this issue Apr 18, 2024 · 2 comments

nikifori commented Apr 18, 2024

What's the difference between those two mel spectrograms that MelDataset returns (mel and mel_loss)?
And how do the aforementioned mels differ from the ones produced by Tacotron 2?

Thanks

@msalhab96

You can think of mel_loss as the ground-truth mel spectrogram, and mel as the input mel spectrogram.

So if you are doing pretraining, they will both be the same; if you are doing fine-tuning, then mel is the input generated by Tacotron and mel_loss is the ground truth.

Check the code here.
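For intuition, here is a minimal sketch of what that means during pretraining, where both spectrograms are computed from the same ground-truth wav. It is illustrative only and assumes HiFi-GAN-style settings (22.05 kHz audio, fmax = 8000 for the input mel, fmax_for_loss = null for the loss mel); the helper name log_mel and the file name are hypothetical, not the repo's actual code.

```python
import librosa
import numpy as np

def log_mel(wav, sr=22050, n_fft=1024, hop_length=256, n_mels=80, fmax=None):
    # librosa applies a mel filterbank to the power spectrogram;
    # fmax=None defaults to sr / 2 (the Nyquist frequency).
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels, fmax=fmax
    )
    return np.log(np.clip(mel, 1e-5, None))

# Pretraining: both mels come from the same ground-truth waveform.
wav, sr = librosa.load("LJ001-0001.wav", sr=22050)   # hypothetical LJ-Speech clip
mel = log_mel(wav, fmax=8000)        # generator input ("fmax" in the config)
mel_loss = log_mel(wav, fmax=None)   # loss target ("fmax_for_loss": null -> sr / 2)
```

During fine-tuning, mel would instead be loaded from the Tacotron 2 output for the same utterance, while mel_loss would still be computed from the ground-truth wav.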


Ziyi6 commented May 23, 2024

@msalhab96 Hello. I noticed that in training we use the LJ-Speech dataset, which contains many wav files. During training we extract the mel spectrogram from these wavs with the highest frequency up to 8 kHz (see fmax in the config file) and feed it to the model. The model outputs wavs, and then we calculate the mel spectrogram again, but with the highest frequency up to 11.025 kHz (see fmax_for_loss=null in the config file; in librosa, the default highest frequency is half of the sampling rate), for calculating the loss. Do you know why there is a gap?
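For reference, a minimal sketch of the loss computation being described, assuming a HiFi-GAN-style training loop; generator_mel_loss, mel_for_loss, and the variable names are placeholders rather than this repo's actual code.

```python
import torch.nn.functional as F

def generator_mel_loss(wav_real, wav_fake, mel_for_loss):
    # L1 mel loss; both mels use fmax_for_loss (null -> sr / 2 = 11.025 kHz).
    mel_real = mel_for_loss(wav_real)   # ground-truth waveform -> full-band mel
    mel_fake = mel_for_loss(wav_fake)   # generated waveform   -> full-band mel
    return F.l1_loss(mel_fake, mel_real)
```

In this reading, the band-limited (8 kHz) mel is only what conditions the generator, while the loss compares full-band mels of the real and generated audio.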
