Inverse Mel-transform - Not getting original audio back #2541
Description
🐛 Describe the bug
I am computing the mel spectrogram of an audio signal and then applying the inverse operation to check that I get the original audio back.
assert orig_audio == inverse_mel_spectrogram(mel_spectrogram(orig_audio))
# Not actual code, just to illustrate the intent.
I use the standard torch STFT and mel-scale transforms to get the mel spectrogram, but for the inverse I can't find anything in torch similar to librosa's mel_to_stft.
The reason I am looking at librosa's implementation instead of torch's offerings such as InverseMelScale is:
- Librosa is cheaper (I guess it uses LBFGS)
- I don't need a perfect approximation of the phase, because I already have the complex STFT output (that is, the original magnitude and phase). For my use case it isn't necessary to estimate the phase of the audio signal accurately; all I care about is converting from mel scale to linear scale. Hence I'm trying to avoid costlier methods such as SGD or Griffin-Lim.
I tried to port librosa's mel_to_stft code to torch, but I am not getting the original audio signal back. Librosa uses np.linalg.lstsq; the PyTorch equivalent is torch.linalg.lstsq(). The code I tried is available in this colab notebook.
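For reference, here is a minimal sketch of the least-squares idea in plain torch. It is not the notebook code. `toy_fbanks` is a hypothetical stand-in for a real mel filterbank (for example the one returned by `torchaudio.functional.melscale_fbanks`), and I assume a magnitude spectrogram of shape `(n_freqs, frames)`. Clamping negatives is only a cheap substitute for a proper non-negative solve. Note that with fewer mel bins than frequency bins the projection is lossy, so the recovered spectrogram can only approximate the original and exact equality is not expected.

```python
import torch

def toy_fbanks(n_freqs, n_mels):
    # Hypothetical stand-in for a real mel filterbank:
    # overlapping triangles on a linear frequency axis, shape (n_freqs, n_mels).
    centers = torch.linspace(0, n_freqs - 1, n_mels + 2)
    bins = torch.arange(n_freqs, dtype=torch.float32).unsqueeze(1)
    left, center, right = centers[:-2], centers[1:-1], centers[2:]
    up = (bins - left) / (center - left)
    down = (right - bins) / (right - center)
    return torch.clamp(torch.minimum(up, down), min=0.0)

def mel_to_linear(mel_spec, fbanks):
    # Solve fbanks.T @ X ~= mel_spec for X in the least-squares sense.
    # fbanks.T is (n_mels, n_freqs), so the system is underdetermined and the
    # default CPU driver returns a minimum-norm solution.
    solution = torch.linalg.lstsq(fbanks.T, mel_spec).solution
    # Magnitudes cannot be negative; clamp instead of a true non-negative solve.
    return solution.clamp(min=0.0)

torch.manual_seed(0)
n_freqs, n_mels, n_frames = 257, 64, 10
fbanks = toy_fbanks(n_freqs, n_mels)
linear = torch.rand(n_freqs, n_frames)   # pretend magnitude spectrogram
mel = fbanks.T @ linear                  # forward mel projection
recovered = mel_to_linear(mel, fbanks)   # approximate inverse
print(recovered.shape, (recovered - linear).abs().mean())
```

The printed error is non-zero by construction: many different linear spectrograms map to the same mel spectrogram, and least squares just picks one of them.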
Kindly let me know if I have made a mistake here; I will update the code and raise a PR for it.
Why am I doing all this?
I work on speech enhancement, so I need to compute mel features from the audio signal, pass them to my model, get a TF mask, multiply it with the mel features, apply the inverse mel scale, and reconstruct the audio using ISTFT to get the denoised signal.
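To make the pipeline concrete, here is a rough sketch under some assumptions of mine: `toy_fbanks` is again a hypothetical stand-in for a real mel filterbank, `mask_fn` is a placeholder for the model (here it just returns an all-ones mask), and the STFT parameters are arbitrary. Since the filterbank is fixed, this version precomputes its pseudo-inverse once with `torch.linalg.pinv` instead of calling a solver per utterance, and it reuses the original phase rather than estimating one.

```python
import torch

def toy_fbanks(n_freqs, n_mels):
    # Hypothetical stand-in for a real mel filterbank, shape (n_freqs, n_mels).
    centers = torch.linspace(0, n_freqs - 1, n_mels + 2)
    bins = torch.arange(n_freqs, dtype=torch.float32).unsqueeze(1)
    left, center, right = centers[:-2], centers[1:-1], centers[2:]
    up = (bins - left) / (center - left)
    down = (right - bins) / (right - center)
    return torch.clamp(torch.minimum(up, down), min=0.0)

def enhance(audio, fbanks, fbanks_pinv, mask_fn, n_fft=512, hop=128):
    window = torch.hann_window(n_fft)
    spec = torch.stft(audio, n_fft, hop_length=hop, window=window,
                      return_complex=True)        # (n_freqs, frames), complex
    mag, phase = spec.abs(), spec.angle()         # keep the original phase
    mel = fbanks.T @ mag                          # (n_mels, frames)
    masked_mel = mask_fn(mel) * mel               # apply the TF mask in the mel domain
    # Approximate mel -> linear mapping; clamp because magnitudes are non-negative.
    lin = (fbanks_pinv @ masked_mel).clamp(min=0.0)
    out_spec = torch.polar(lin, phase)            # recombine with original phase
    return torch.istft(out_spec, n_fft, hop_length=hop, window=window,
                       length=audio.shape[-1])

torch.manual_seed(0)
audio = torch.randn(16000)                        # 1 s of noise as dummy input
fbanks = toy_fbanks(512 // 2 + 1, 64)
fbanks_pinv = torch.linalg.pinv(fbanks.T)         # (n_freqs, n_mels), computed once
denoised = enhance(audio, fbanks, fbanks_pinv, lambda m: torch.ones_like(m))
print(denoised.shape)
```

Even with the all-ones mask, `denoised` will not match `audio` exactly, because the mel round trip discards spectral detail before the ISTFT.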
Versions
torch 1.11.0+cu115
torchaudio 0.11.0+cu115