Missing data.sh file in egs2/mixed_v3/s2t1/local #5469
Comments
@pyf98, can you answer it for me?
Hi, we currently do not provide a centralized data.sh.
V1 also has some preparation scripts: https://github.com/espnet/espnet/tree/master/egs2/mixed_v1/s2t1/local
It is suggested to run them separately and combine them later.
For the new data added in v3: @jctian98, could you provide the scripts?
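For reference, the "prepare each corpus separately, then combine" step could look roughly like the sketch below. This is not an official recipe script: the directory names are placeholders, and it assumes each corpus has already been prepared into a Kaldi-style data directory (wav.scp, text, utt2spk, and optionally segments).

```python
#!/usr/bin/env python3
"""Sketch: merge several separately prepared Kaldi-style data directories.

All paths and directory names below are placeholders; adjust to your setup.
"""
from pathlib import Path

src_dirs = [Path("data/corpus_a"), Path("data/corpus_b"), Path("data/corpus_c")]
dst_dir = Path("data/train_combined")
files = ["wav.scp", "text", "utt2spk", "segments"]

dst_dir.mkdir(parents=True, exist_ok=True)

for fname in files:
    lines = []
    for src in src_dirs:
        path = src / fname
        if path.exists():  # e.g., "segments" may be absent for some corpora
            lines.extend(
                l for l in path.read_text(encoding="utf-8").splitlines() if l.strip()
            )
    if lines:
        # Kaldi-style tools expect these files sorted by the first field (utterance ID),
        # and utterance IDs must be unique across the combined corpora.
        lines.sort(key=lambda l: l.split(maxsplit=1)[0])
        (dst_dir / fname).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

In practice you would also want to verify the combined directory afterwards (e.g., with a Kaldi-style utils/fix_data_dir.sh or an equivalent check).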
Why not include them in the v3 recipe?
OK, we will try to provide them.
If we first prepare all datasets into …
Also note that we re-download the raw data for Multilingual LibriSpeech to use the fully formatted transcriptions. The download can take multiple days and may fail in the middle depending on the network. I have included a retry mechanism (see …).
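As a rough illustration of the general idea (not the recipe's actual implementation), a download-with-retry helper might look like the sketch below. The URL and file name are placeholders.

```python
#!/usr/bin/env python3
"""Sketch of a download-with-retry helper for large, flaky downloads."""
import time
import urllib.request


def download_with_retry(url: str, out_path: str, max_retries: int = 5) -> None:
    """Retry a failed download a fixed number of times with an increasing wait."""
    for attempt in range(1, max_retries + 1):
        try:
            urllib.request.urlretrieve(url, out_path)
            return  # success
        except OSError as err:  # covers URLError, timeouts, etc.
            if attempt == max_retries:
                raise
            wait = 60 * attempt  # back off: 1 min, 2 min, ...
            print(f"Download failed ({err}); retrying in {wait}s ({attempt}/{max_retries})")
            time.sleep(wait)


if __name__ == "__main__":
    # Placeholder URL; replace with the actual archive you need.
    download_with_retry("https://example.com/mls_english.tar.gz", "mls_english.tar.gz")
```

For a multi-day download, a resumable tool such as wget -c may be preferable, since a plain retry restarts the file from scratch.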
Thanks a lot, I will use them to prepare the data separately.
@pyf98 Regarding local/prepare_wenetspeech.py, which merges WenetSpeech segments: if a 10-second stretch of audio is skipped because of low confidence, will merging into 30-second chunks produce cases where there is audio but no corresponding text?
Thanks @junshipeng for the question. I think such a situation can happen in general (it is not limited to a specific dataset), and I do not have a perfect solution for now. I wanted to keep the original timestamps within the long recordings, which is also easier than manually concatenating the segmented utterances.
@pyf98 Have you tried not concatenating into 30 seconds, e.g., using the original annotated audio lengths?
@junshipeng No, we didn't try that. We tried to mimic Whisper, so we always used long-form inputs. This way, we can predict timestamps for each utterance in addition to the text transcript; with short segmented utterances, we could not do this.
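To make the discussion above concrete, here is a rough sketch of packing time-ordered segments of one long recording into windows of at most 30 seconds while keeping their original timestamps. This is not the actual logic of local/prepare_wenetspeech.py; it only illustrates the idea, and it shows how a region skipped for low confidence ends up inside a window as audio without any transcript.

```python
#!/usr/bin/env python3
"""Sketch: pack time-ordered segments into <=30 s long-form windows."""
from dataclasses import dataclass
from typing import List

MAX_LEN = 30.0  # seconds


@dataclass
class Segment:
    start: float  # seconds, relative to the long recording
    end: float
    text: str


def pack_windows(segments: List[Segment]) -> List[List[Segment]]:
    """Greedily group consecutive segments of one recording into <=30 s windows."""
    windows: List[List[Segment]] = []
    current: List[Segment] = []
    window_start = 0.0
    for seg in sorted(segments, key=lambda s: s.start):
        if not current:
            current, window_start = [seg], seg.start
        elif seg.end - window_start <= MAX_LEN:
            current.append(seg)  # still fits in the 30 s budget
        else:
            windows.append(current)
            current, window_start = [seg], seg.start
        # Any gap between kept segments (e.g., a low-confidence region that was
        # skipped) stays inside the window as audio without a transcript.
    if current:
        windows.append(current)
    return windows


if __name__ == "__main__":
    segs = [
        Segment(0.0, 4.2, "hello"),
        Segment(15.0, 18.5, "world"),  # 4.2-15.0 was skipped: audio with no text
        Segment(28.0, 33.0, "again"),
    ]
    for w in pack_windows(segs):
        print([(s.start, s.end, s.text) for s in w])
```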
@pyf98 There are some errors when preparing LibriSpeech with https://github.com/espnet/espnet/blob/master/egs2/mixed_v1/s2t1/local/prepare_librispeech.py#L26
@nichongjia-2007 What are the errors? For LibriSpeech, I was using the unsegmented version, which is different from the commonly used ESPnet data.
@pyf98 When I run Stage 7, I encounter an error:
File "git/espnet/espnet2/bin/launch.py", line 265, in main
After reviewing the code, I noticed that the cmd parameter in launch.py is set to run.pl, which is defined in cmd.sh. However, it seems that espnet2.bin.launch in Stage 7 does not support run.pl.
@junshipeng I do not know this type of error. But for OWSM, we do not use any LM, so we do not execute Stage 7.
@pyf98 Hi, I noticed that Stage 10 is taking a long time to complete. Is there any way to speed it up? Also, the GPU parameter is set to 0. Is this normal?
@junshipeng Stage 10 collects statistics that are later used to normalize the features; it does not require a GPU. To speed it up, you can run many parallel jobs, i.e., increase the number of jobs configured for that stage.
Describe the bug
In the egs2/mixed_v3/s2t1/local directory, there is no data.sh file, but according to https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/s2t1/s2t.sh#L549, this file should exist.