-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
recipe for Multi-Decoder DPRNN #463
recipe for Multi-Decoder DPRNN #463
Conversation
I just finished writing the eval script. Now only bash script is left. (I don't know how to write bash, btw, so I'd be nice if I could get some help) |
Also, I'm not entirely sure how to share models on huggingface. Can someone send me a tutorial somewhere? I currently have both .ckpt and .pth |
You just need to create a repo there, pull, add your model, change the readme and push. Thanks for your work on this recipe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Again, some changes before it can be merged.
train_dir: "../../../dataset/{}speakers/wav8k/min/tr" | ||
valid_dir: "../../../dataset/{}speakers/wav8k/min/cv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I said this before, but relative path in the config file doesn't seem right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I'll change this to data/{}speakers/wav8k/min/tr. I just wrote relative paths because how it was on my system
|
||
if [[ $stage -le 3 ]]; then | ||
echo "Stage 3: Training" | ||
echo "If you want to change n_srcs, please change the config file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Printing this every time doesn't seem useful.
Also, why n_srcs is in the file if you can't change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think bash supports adding an integer list as argument, it just gets processed as strings automatically.
@@ -0,0 +1,214 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean unused imports
self.log("avg_sdr", avg_sdr) | ||
|
||
|
||
class WeightedPITLoss(nn.Module): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring and some comments for understanding what this does on the high level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you can see the docstring above the init line.
I fixed the stuff you mentioned, could we pass this PR now? |
Have you managed to share a model on HF Hub? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are getting close, thanks again for your dedication.
est_sources = model.separate(mix[None]) | ||
p_si_snr = Penalized_PIT_Wrapper(pairwise_neg_sisdr_loose)(est_sources, sources) | ||
utt_metrics = { | ||
"P-Si-SNR": p_si_snr.item(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's P-Si-SNR
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's defined in the paper.
p_si_snr = Penalized_PIT_Wrapper(pairwise_neg_sisdr_loose)(est_sources, sources) | ||
utt_metrics = { | ||
"P-Si-SNR": p_si_snr.item(), | ||
"Accuracy": float(sources.size(0) == est_sources.size(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a counting accuracy, then counting_accuracy
is better.
from scipy.optimize import linear_sum_assignment | ||
|
||
|
||
class PairwiseNegSDR_Loose(_Loss): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loss intead of Loose
self.take_log = take_log | ||
self.EPS = EPS | ||
|
||
def forward(self, est_targets, targets): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way not to duplicate all this code, but subclass it, with minor modif.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem is the line assert targets.size() == est_targets.size() in your original code, as it does not support variable number of speakers.
Do we have any further issue to fix, or should we pass this PR? |
We're close.
|
Hi Pariente, |
Don't have the bash script and separate function yet