Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." #30654
Comments
cc @kamilakesbi
Hi @kamilakesbi,
Hey @sproocht - thanks for reporting! This issue was in fact closed by #29938 for the Transformers example, and huggingface/blog#1944 for the blog post. If you copy the latest example script and use the latest version of Transformers, you should be able to force the language token by setting the Hope that helps!
Hey @sanchit-gandhi,
Hey @sanchit-gandhi,
System Info
transformers version: 4.41.0.dev0
Who can help?
@sanchit-gandhi
@ArthurZucker
@muellerzr
This issue concerns fine-tuning Whisper on datasets that switch from a base language to other languages, or on low-resource languages for which the pre-trained model's language identification is not accurate enough. It can be reproduced, for example, by mixing a few French utterances into a German dataset and running trainer.evaluate() on it.
Up to transformers version 4.37.2, fine-tuning and evaluating on such datasets raised no errors, and the fine-tuning results were very acceptable. Starting with 4.38.0, model evaluation systematically fails on such datasets (in transformers/models/whisper/generation_whisper.py).
I can understand the idea of forcing a single language per batch, but in real life people use several languages concurrently in their daily interactions, and this is reflected in the datasets. This check blocks fine-tuning for languages such as Luxembourgish, where speakers frequently mix Luxembourgish with English, French or German within the same utterance. Other examples include Spanglish and Hinglish, and low-resource languages that borrow words or phrases from high-resource languages. As a result, it could prevent the transformers library from being used to fine-tune for such languages.
The only workaround I have at the moment is to stick to version 4.37.2. Please have a look at this regression.
Thank you in advance!
Here is the full error traceback:
`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_12853/1263219524.py in
1 # Get initial evaluation results
----> 2 trainer.evaluate()
~/.local/lib/python3.10/site-packages/transformers/trainer_seq2seq.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix, **gen_kwargs)
178 self.gather_function = self.accelerator.gather
179 self._gen_kwargs = gen_kwargs
--> 180 return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
181
182 def predict(
~/.local/lib/python3.10/site-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3513
3514 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3515 output = eval_loop(
3516 eval_dataloader,
3517 description="Evaluation",
~/.local/lib/python3.10/site-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3696
3697 # Prediction step
-> 3698 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
3699 main_input_name = getattr(self.model, "main_input_name", "input_ids")
3700 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None
~/.local/lib/python3.10/site-packages/transformers/trainer_seq2seq.py in prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, **gen_kwargs)
308 k: v for k, v in inputs.items() if k not in ("decoder_input_ids", "decoder_attention_mask")
309 }
--> 310 generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
311
312 # Temporary hack to ensure the generation config is not initialized for each iteration of the evaluation loop
~/.local/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py in generate(self, input_features, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, return_timestamps, task, language, is_multilingual, prompt_ids, prompt_condition_type, condition_on_prev_tokens, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, num_segment_frames, attention_mask, time_precision, return_token_timestamps, return_segments, return_dict_in_generate, **kwargs)
528
529 # pass self.config for backward compatibility
--> 530 init_tokens = self._retrieve_init_tokens(
531 input_features,
532 generation_config=generation_config,
~/.local/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py in _retrieve_init_tokens(self, input_features, generation_config, config, num_segment_frames, kwargs)
1167
1168 if torch.unique(lang_ids).shape[0] > 1:
-> 1169 raise ValueError(
1170 "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language='...' or make sure all input audio is of the same language."
1171 )
ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language='...' or make sure all input audio is of the same language.`
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Run trainer.evaluate() on a dataset containing a mix of languages.
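To show the failure mode in isolation, here is a minimal pure-Python sketch of the batch-level check that the traceback points at (`torch.unique(lang_ids).shape[0] > 1` in `_retrieve_init_tokens`). It is a simplified stand-in and not the library's code: the function name `pick_batch_language` and the string language codes are hypothetical, and the real implementation operates on tensors of language token ids. It also assumes, as the error message suggests, that a language forced by the caller skips detection.

```python
from typing import Optional, Sequence


def pick_batch_language(detected: Sequence[str], language: Optional[str] = None) -> str:
    """Toy model of the one-language-per-batch rule (hypothetical helper)."""
    # Assumption: a language forced by the caller bypasses detection entirely.
    if language is not None:
        return language
    unique = set(detected)
    if not unique:
        raise ValueError("Empty batch: no language to pick.")
    # Mirrors the spirit of `torch.unique(lang_ids).shape[0] > 1`.
    if len(unique) > 1:
        raise ValueError(
            "Multiple languages detected when trying to predict the most "
            "likely target language for transcription."
        )
    return next(iter(unique))


# A mostly-German batch with one French utterance trips the check.
try:
    pick_batch_language(["de", "fr", "de"])
except ValueError as err:
    print("failed:", err)

# Forcing a language sidesteps the check for the same batch.
print("forced:", pick_batch_language(["de", "fr", "de"], language="de"))
```

In this sketch a single French utterance is enough to make the whole batch fail, which matches the behaviour reported here for mixed-language evaluation sets.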
Expected behavior
trainer.evaluate() should complete without raising an error, as it does in transformers versions up to 4.37.2.