Can we train with this yet? #10
Just wondering if we can train with LJS on this implementation. Thanks!

Comments
Hi @EmElleE, yes, you can, but you'll need to tune the hyperparameters for the residual encoder, and it is really close.
@keonlee9420 Quick question: do you have the LJS model? I would like to fine-tune on it. Do you know how much data is required for fine-tuning? Also, is the quality close to Tacotron 2? It seems that these days people use Tacotron 2 because it works well for cloning voices. Do you think Parallel-Tacotron2 is similar or capable of that?
Hi @ArEnSc, I don't have it yet, but I'll share it when I do. Please note, though, that the result would be much worse than expected, since the maximum batch size here is far smaller than in the original paper.
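On the batch-size point: one common way to partially close the gap with the paper's effective batch size is gradient accumulation. This is not something the repo necessarily implements; the sketch below is a minimal, self-contained example, and the `model`, `optimizer`, `loss_fn`, and `loader` objects are hypothetical stand-ins, not names from this codebase:

```python
import torch
from torch import nn

# Hypothetical stand-ins; the real model and data loader come from the repo.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
loader = [(torch.randn(2, 10), torch.randn(2, 1)) for _ in range(16)]

accum_steps = 8  # effective batch = per-step batch (2) * accum_steps = 16

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the accumulated gradient matches one large-batch step.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that this only approximates large-batch training for per-sample losses; anything that depends on within-batch statistics (e.g., batch normalization) won't benefit from it.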
Take a look at this:

```python
speaker_embedding_m = speaker_embedding.unsqueeze(1).expand(
    -1, max_mel_len, -1
)
position_enc = self.position_enc[
    :, :max_mel_len, :
].expand(batch_size, -1, -1)
enc_input = torch.cat([position_enc, speaker_embedding_m, mel], dim=-1)
```
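For reference, here is a minimal, self-contained sketch of the shapes involved in that snippet. The tensor sizes are made-up examples, not the repo's actual hyperparameters:

```python
import torch

batch_size, max_mel_len = 4, 100
d_spk, d_pos, n_mels = 64, 32, 80  # hypothetical dimensions

speaker_embedding = torch.randn(batch_size, d_spk)   # (B, d_spk)
position_enc_table = torch.randn(1, 1000, d_pos)     # (1, max_seq_len, d_pos)
mel = torch.randn(batch_size, max_mel_len, n_mels)   # (B, T, n_mels)

# Broadcast the per-utterance speaker embedding across every mel frame.
speaker_embedding_m = speaker_embedding.unsqueeze(1).expand(-1, max_mel_len, -1)

# Slice the precomputed positional table to the mel length, then expand over
# the batch; expand() returns a view, so no data is copied.
position_enc = position_enc_table[:, :max_mel_len, :].expand(batch_size, -1, -1)

enc_input = torch.cat([position_enc, speaker_embedding_m, mel], dim=-1)
print(enc_input.shape)  # torch.Size([4, 100, 176]) = (B, T, d_pos + d_spk + n_mels)
```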
Hi @phamlehuy53,

But you notice that
Oh, sorry, I mistyped. It should be:

```python
position_enc = self.position_enc[
    :, :max_mel_len, :
].expand(batch_size, -1, -1)
```
Yep, when