This project compares three popular VAD (voice activity detection) toolkits and examines the effect of applying their automatic segmentations to a state-of-the-art end-to-end multilingual speech translation model as well as to a cascaded one.
- Presentation containing all BLEU scores and additional graphs: https://docs.google.com/presentation/d/1WqdPUqvJ0g0qn6PVGymANBHndMhTnmZ1AXZiSIEHcNo/edit?usp=sharing
- VAD toolkit 1 (voxseg): https://github.com/NickWilkinson37/voxseg
- VAD toolkit 2 (inaSpeechSegmenter): https://github.com/ina-foss/inaSpeechSegmenter
- VAD toolkit 3 (py-webrtcvad): https://github.com/wiseman/py-webrtcvad
- mTEDx test and validation corpora: http://www.openslr.org/100
- Kaldi toolkit: https://github.com/kaldi-asr/kaldi
- NMTGMinor framework, used for machine translation and speech recognition/translation: https://github.com/nlp-dke/NMTGMinor
- The end-to-end and cascaded multilingual translation models: https://aclanthology.org/2021.iwslt-1.15/
Manual segmentations and reference translations can be found in the corresponding mTEDx test and validation sets. The automatic segmentations were produced on a local machine with the three toolkits listed above. Next, MFCC features were extracted on the same machine using the Kaldi toolkit. The resulting features (one set per segmentation) were uploaded to Google Drive. Finally, in a GPU-accelerated Google Colab session, both the end-to-end and the cascaded models were run on each segmentation.
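For reference, the sketch below illustrates this pipeline in a minimal form. It is not the exact code used here and it makes several assumptions: the input is a 16 kHz, 16-bit, mono WAV file; py-webrtcvad (toolkit 3 above) handles the VAD step, with its aggressiveness mode presumably corresponding to the `-p` values reported in the tables below; the segments are written as a standard Kaldi data directory; and the MFCCs are computed with Kaldi's `extract-segments` and `compute-mfcc-feats` binaries. File and directory names such as `talk0001.wav` and `data/auto_seg` are placeholders.

```python
import os
import subprocess
import wave

import webrtcvad  # pip install webrtcvad


def vad_segments(wav_path, aggressiveness=2, frame_ms=30):
    """Return (start, end) speech segments in seconds using py-webrtcvad."""
    vad = webrtcvad.Vad(aggressiveness)  # 0 (least aggressive) .. 3 (most aggressive)
    with wave.open(wav_path, "rb") as wf:
        rate = wf.getframerate()                       # webrtcvad needs 8/16/32/48 kHz
        frame_bytes = int(rate * frame_ms / 1000) * 2  # bytes per 16-bit mono frame
        pcm = wf.readframes(wf.getnframes())

    segments, start = [], None
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        t = offset / 2 / rate                          # frame start time in seconds
        if vad.is_speech(pcm[offset:offset + frame_bytes], rate):
            if start is None:
                start = t
        elif start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(pcm) / 2 / rate))
    return segments


def write_kaldi_data_dir(wav_path, segments, data_dir="data/auto_seg", rec_id="talk0001"):
    """Write wav.scp and segments in the usual Kaldi data-directory layout."""
    os.makedirs(data_dir, exist_ok=True)
    with open(os.path.join(data_dir, "wav.scp"), "w") as f:
        f.write(f"{rec_id} {wav_path}\n")
    with open(os.path.join(data_dir, "segments"), "w") as f:
        for n, (start, end) in enumerate(segments):
            f.write(f"{rec_id}-{n:04d} {rec_id} {start:.2f} {end:.2f}\n")


if __name__ == "__main__":
    segs = vad_segments("talk0001.wav", aggressiveness=2)  # presumably "-p 2" below
    write_kaldi_data_dir("talk0001.wav", segs)
    os.makedirs("mfcc", exist_ok=True)
    # MFCC extraction with the Kaldi binaries (what steps/make_mfcc.sh runs internally).
    subprocess.run(
        "extract-segments scp:data/auto_seg/wav.scp data/auto_seg/segments ark:- | "
        "compute-mfcc-feats ark:- ark,scp:mfcc/raw_mfcc.ark,mfcc/raw_mfcc.scp",
        shell=True, check=True,
    )
```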
| Language pair | Best segmentation | Segment count | Segment count difference* | BLEU score difference* |
|---|---|---|---|---|
| pt-es_test | voxseg -s 0.90 | 1294 | 23.5% | 17.7% |
| pt-es_valid | voxseg -s 0.95 | 1139 | 11.7% | 18.1% |
| it-en_test | voxseg -s 0.95 | 1223 | 22.2% | 14.9% |
| it-en_valid | webrtcvad -p 2 | 1075 | 14.4% | 9.8% |
| it-es_test | voxseg -s 0.95 <br> inaspeech -r 0.15 <br> inaspeech -r 0.20 | 1223 <br> 1228 <br> 1335 | 22.2% <br> 22.6% <br> 30.8% | 15.9% |
| it-es_valid | webrtcvad -p 2 | 1075 | 14.4% | 11.7% |
| es-en_test | webrtcvad -p 0 | 1116 | 11.4% | 6.5% |
| es-en_valid | webrtcvad -p 0 <br> webrtcvad -p 1 | 1082 <br> 1117 | 11.4% <br> 14.6% | 9.5% |
| pt-en_test | voxseg -s 0.90 | 1294 | 23.5% | 15.1% |
| pt-en_valid | voxseg -s 0.95 <br> inaspeech -r 0.05 | 1139 <br> 1199 | 11.7% <br> 16.8% | 16.5% |
Table displaying the best-performing segmentation toolkit, the corresponding parameter, and the number of segments created. *The table also shows the percentage difference in segment count and BLEU score relative to the results obtained by the end-to-end translation model when using the manual segmentation.
| Language pair | Best segmentation | Segment count | Segment count difference* | BLEU score difference* |
|---|---|---|---|---|
| pt-es_test | voxseg -s 0.90 | 1294 | 23.5% | 19.3% |
| pt-es_valid | inaspeech -r 0.05 | 1199 | 16.8% | 19.0% |
| it-en_test | inaspeech -r 0.15 <br> inaspeech -r 0.20 | 1228 <br> 1335 | 22.6% <br> 30.8% | 13.1% |
| it-en_valid | webrtcvad -p 2 | 1075 | 14.4% | 11.0% |
| it-es_test | inaspeech -r 0.15 | 1228 | 22.6% | 16.6% |
| it-es_valid | voxseg -s 0.90 <br> webrtcvad -p 2 | 991 <br> 1075 | 6.2% <br> 14.4% | 12.8% |
| es-en_test | webrtcvad -p 0 | 1116 | 11.4% | 6.0% |
| es-en_valid | webrtcvad -p 1 | 1117 | 14.6% | 7.7% |
| pt-en_test | voxseg -s 0.90 | 1294 | 23.5% | 16.6% |
| pt-en_valid | inaspeech -r 0.05 | 1199 | 16.8% | 17.4% |
Table displaying the best-performing segmentation toolkit, the corresponding parameter, and the number of segments created. *The table also shows the percentage difference in segment count and BLEU score relative to the results obtained by the cascaded translation model when using the manual segmentation.
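The starred difference columns in both tables can be reproduced with a relative-difference formula such as the one sketched below. The exact scorer is not stated in this repository; the sketch assumes corpus-level BLEU from sacrebleu and that hypotheses produced with automatic segmentation have already been re-aligned to the reference segments so that line counts match. File names are placeholders.

```python
import sacrebleu  # pip install sacrebleu


def relative_difference(auto_value, manual_value):
    """Percentage difference of a value obtained with automatic segmentation
    with respect to the manual-segmentation baseline."""
    return 100.0 * abs(auto_value - manual_value) / manual_value


def corpus_bleu(hyp_path, ref_path):
    """Corpus-level BLEU between one hypothesis file and one reference file."""
    with open(hyp_path, encoding="utf-8") as h, open(ref_path, encoding="utf-8") as r:
        hyps = [line.strip() for line in h]
        refs = [line.strip() for line in r]
    return sacrebleu.corpus_bleu(hyps, [refs]).score


if __name__ == "__main__":
    manual_bleu = corpus_bleu("pt-es_test.manual.hyp", "pt-es_test.ref")
    auto_bleu = corpus_bleu("pt-es_test.voxseg.hyp", "pt-es_test.ref")
    print(f"BLEU difference: {relative_difference(auto_bleu, manual_bleu):.1f}%")
    # The same formula applies to the segment counts:
    # relative_difference(automatic_segment_count, manual_segment_count)
```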