
Implement several new metrics for speech recognition #2451

Merged
merged 101 commits into speechbrain:develop from metrics-roux22-interspeech on Mar 29, 2024

Conversation

@asumagic (Collaborator) commented Mar 4, 2024

What does this PR do?

The goal of this PR is to introduce a number of new metrics, along with the supporting interfaces and package integrations they require. The metrics chosen here were suggested and compared in the paper Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition, with the hope of addressing shortcomings of the WER in ASR.

Much of the PR upgrades existing metrics for flexibility, and some of the metrics are suitable for tasks other than ASR.

No new required dependencies are added. flair and spaCy are added as optional dependencies, in the form of optional modules under speechbrain.lobes, mainly to keep model loading as consistent as possible with how we do HF hub loading in SpeechBrain.
Whether this whole approach is the way to go should be discussed in this PR. I feel uneasy about these modules: they add annoying dependencies to the CI and to docs generation, and they are rather incomplete, as they only implement what is necessary for this PR (though they can be extended for more use cases).

Other changes

  • The WER calculation was fixed to work with empty references. This can be changed, but I figure it is sane enough to scale errors as if the reference contained one word.
  • There were some other edits to e.g. the WER calculation code, but nothing that changes the default behavior.

Tutorial:

The following tutorial demonstrates how to use all of the proposed metrics, defined as hyperparameters, on sample ASR predictions over a French corpus (taken from https://github.com/thibault-roux/hypereval/), so that they can easily be copied and integrated into recipes:

https://gist.github.com/asumagic/75a362614b55695be8c4b729567b252a

Introduced/suggested metrics

Part-of-speech Error Rate (POSER)

WER is estimated over parts of speech instead of words.
In order to support this conveniently, this PR adds a thin integration with the flair toolkit, which is frequently used to implement POS-tagging models.

The paper proposes a variant (uPOSER) that uses broad POS categories, but we do not reference that detail explicitly: with this PR, uPOSER can be implemented with the synonym dictionary mechanism (or with the token mapping that already exists in the error rate classes, though I haven't tried that).
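For illustration, here is a minimal sketch of how POSER could be computed by pairing flair tagging with SpeechBrain's existing ErrorRateStats. The model id and utterances are assumptions, and the PR's actual wrapper under speechbrain.lobes may expose a different interface:

```python
# Hypothetical POSER sketch: compute an error rate over POS tags instead of words.
from flair.data import Sentence
from flair.models import SequenceTagger

from speechbrain.utils.metric_stats import ErrorRateStats

tagger = SequenceTagger.load("flair/upos-english")  # illustrative model id

def pos_tags(text):
    # Tag a sentence and return one POS label per word.
    sentence = Sentence(text)
    tagger.predict(sentence)
    return [token.get_label(tagger.label_type).value for token in sentence]

poser = ErrorRateStats()
poser.append(
    ids=["utt1"],
    predict=[pos_tags("the cat eats the fish")],
    target=[pos_tags("a cat ate some fish")],
)
print(poser.summarize())
```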

Lemma Error Rate (LER)

WER is estimated over lemmas instead of words.
In order to support this conveniently, this PR adds a thin integration with the spaCy toolkit. Note that spaCy's download mechanism is different and does not use the HF hub, so we do not try to integrate it any more tightly.
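As a rough sketch, LER can be approximated by lemmatizing both sides with spaCy and scoring the lemmas with ErrorRateStats. The fr_core_news_md model and example utterances are assumptions, not what the PR ships:

```python
# Hypothetical LER sketch: compute an error rate over lemmas instead of words.
# Assumes `python -m spacy download fr_core_news_md` has been run beforehand.
import spacy

from speechbrain.utils.metric_stats import ErrorRateStats

nlp = spacy.load("fr_core_news_md")

def lemmas(text):
    # Lemmatize a sentence and return one lemma per word.
    return [token.lemma_ for token in nlp(text)]

ler = ErrorRateStats()
ler.append(
    ids=["utt1"],
    predict=[lemmas("les chats mangeaient")],
    target=[lemmas("le chat mangeait")],
)
print(ler.summarize())
```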

Embedding Error Rate (EmbER)

EmbER weights the WER with a check over the cosine similarity of word embeddings. See the code for more details.

Because word-level embeddings are required, subword tokenization is an issue. Thus, this PR also adds a simple wrapper for flair embeddings, which provides support for word-level embeddings such as fastText. The models are rather large, but it works.

Note: Facebook's fasttext package was initially used, but it comes with headaches at install time and has been archived. Since flair was being integrated anyway and offers equivalent, and significantly stronger, word embedding support, switching to it was both more powerful and simpler.
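To make the idea concrete, here is a hypothetical sketch of the EmbER weighting: substitutions between words whose embeddings are close in cosine similarity count as only a fraction of an error. The threshold and discount values are illustrative, not the PR's defaults:

```python
# Hypothetical EmbER sketch: discount substitution errors between similar words.
import torch
from flair.data import Sentence
from flair.embeddings import WordEmbeddings

embeddings = WordEmbeddings("fr")  # word-level fastText vectors via flair

def word_vector(word):
    sentence = Sentence(word)
    embeddings.embed(sentence)
    return sentence[0].embedding

def substitution_weight(ref_word, hyp_word, threshold=0.4, discount=0.1):
    # Substitutions between near-synonyms count as a fraction of a full error.
    similarity = torch.nn.functional.cosine_similarity(
        word_vector(ref_word), word_vector(hyp_word), dim=0
    )
    return discount if similarity.item() >= threshold else 1.0

print(substitution_weight("voiture", "auto"))     # likely discounted
print(substitution_weight("voiture", "poisson"))  # likely a full error
```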

BERTScore

[Image: BERTScore recall example]

BERTScore introduces recall, precision, and F1 metrics computed from contextualized embeddings produced by a BERT-like LM, currently hardcoded to use a HuggingFace Transformers interface. See the code and docs for more details.

This PR adds a simple, well-documented reimplementation of BERTScore, which should closely match the scores obtained by the reference implementation. No additional dependency is required.
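For intuition, a bare-bones sketch of the BERTScore computation follows (greedy cosine matching over contextualized token embeddings). The model name is an assumption, and unlike the full metric this skips special-token masking and IDF weighting:

```python
# Minimal BERTScore sketch, independent of the reimplementation in this PR.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")  # illustrative model
model = AutoModel.from_pretrained("camembert-base")

def token_embeddings(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (num_tokens, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)

ref = token_embeddings("le chat dort sur le canapé")
hyp = token_embeddings("le chien dort sur le canapé")

similarity = ref @ hyp.T                      # pairwise cosine similarities
recall = similarity.max(dim=1).values.mean()  # best hyp match per ref token
precision = similarity.max(dim=0).values.mean()
f1 = 2 * precision * recall / (precision + recall)
print(f"recall={recall.item():.3f} precision={precision.item():.3f} f1={f1.item():.3f}")
```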

Sentence Semantic Distance (SemDist)

[Image: SemDist example]

SemDist compares sentence embeddings output by a BERT-like LM, using cosine similarity.

Two modes are currently proposed to determine what to compute the similarity on:

  • mean of all contextualized embeddings
  • embedding of the output [CLS] token

Additionally, Roux's paper cited earlier uses a dedicated sentence embedding model, which we do not use explicitly. Currently, this is hardcoded to use a HuggingFace Transformers LM interface. No interface for sentence embedding models is provided yet, but this would be an easy addition; however, it would require adding a dependency, as HF Transformers does not seem to wrap such models.
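Here is a minimal sketch of the SemDist computation with the two pooling modes described above; the model name is illustrative, and the implementation in this PR may scale or report the distance differently:

```python
# Hypothetical SemDist sketch: cosine distance between sentence embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")  # illustrative model
model = AutoModel.from_pretrained("camembert-base")

def sentence_embedding(text, mode="mean"):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (num_tokens, dim)
    # "mean" pools all contextualized embeddings; "cls" keeps the first token.
    return hidden.mean(dim=0) if mode == "mean" else hidden[0]

ref = sentence_embedding("le chat dort")
hyp = sentence_embedding("le chien dort")
semdist = 1.0 - torch.nn.functional.cosine_similarity(ref, hyp, dim=0)
print(semdist.item())
```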

Synonym dictionaries

This PR also allows defining "synonym" dictionaries of words that should be considered identical by the WER.
Since the WER function now accepts an equality function as a parameter, plugging synonyms into the WER calculation is trivial.

As mentioned earlier, one of the use cases is to define classes that should be considered equivalent when wrapping the WER (e.g. for the uPOSER metric implementation); a minimal sketch follows.
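The following is a self-contained toy WER with a pluggable equality function; the parameter and helper names are hypothetical, not the actual SpeechBrain API touched by this PR. Note the max(len(ref), 1) denominator, matching the empty-reference fix mentioned above:

```python
# Illustrative synonym-aware WER; names and signature are hypothetical.
def wer(ref, hyp, equals=lambda a, b: a == b):
    # Classic Levenshtein distance, parameterized over token equality.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if equals(ref[i - 1], hyp[j - 1]) else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # substitution (free for synonyms)
            )
    # Scale as if an empty reference contained one word (cf. the fix above).
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

SYNONYMS = {"ok": {"okay", "alright"}}
synonym_equals = lambda a, b: a == b or b in SYNONYMS.get(a, set())

print(wer("this is ok".split(), "this is okay".split(), synonym_equals))  # 0.0
```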


Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@asumagic asumagic force-pushed the metrics-roux22-interspeech branch from 014e4f0 to b71e830 Compare March 12, 2024 12:46
@asumagic asumagic force-pushed the metrics-roux22-interspeech branch from 82f400f to 5c90ea9 Compare March 20, 2024 12:50
@asumagic asumagic added the enhancement New feature or request label Mar 21, 2024
@asumagic (Collaborator, Author) commented:

Tests seem to fail because speechbrain/SSL_Quantization on HF returns a 401; should things requiring HF even be part of the doctests?

@asumagic asumagic force-pushed the metrics-roux22-interspeech branch from a76f72d to 9cdfad0 Compare March 22, 2024 08:52
@asumagic (Collaborator, Author) commented:

Added a link in the main post to a tutorial I just finished.

@asumagic asumagic force-pushed the metrics-roux22-interspeech branch from 899822b to b0f5d1d Compare March 28, 2024 14:56
@asumagic asumagic force-pushed the metrics-roux22-interspeech branch from 5ec6e09 to 0791eaf Compare March 28, 2024 15:04
@asumagic (Collaborator, Author) left a comment:

Updated the tutorial with the new TextEncoder HF interface. I believe this should be ready to review again.

Resolved review threads: speechbrain/utils/metric_stats.py, speechbrain/lobes/models/flair/embeddings.py
batch_precision = precision_values * precision_weights

for i, utt_id in enumerate(ids):
    # TODO: optionally provide a token->token map
@asumagic (Collaborator, Author) commented on the TODO above:

It's not actually implemented yet, but the TODO indicates that it can be done and roughly where. It can be done later, or implemented once we figure out it's useful in practice.

I am actually not fully sure what form it would take, and it wouldn't be very useful without a way to present it (which doesn't seem very convenient to do in a text interface, as opposed to e.g. a graph/table view using graphviz or matplotlib).

@asumagic (Collaborator, Author) commented:

That said, I still don't understand why CI is failing.

@Adel-Moumen (Collaborator) commented:

> That said, I still don't understand why CI is failing.

The error is due to https://huggingface.co/speechbrain/SSL_Quantization being private. Could you please add a doctest skip to the example so that it is not run?

@Adel-Moumen (Collaborator) left a comment:

The code looks clean! I am only wondering why the CI is not asking you to complete the "Returns" section of some docstrings, since it should be required, but other than that it looks good to me.

@Adel-Moumen (Collaborator) commented:

As for the tutorial, could you please try to stick with the shape of the SpeechBrain Colabs (same header, etc.)?

@Adel-Moumen (Collaborator) left a comment:

LGTM. Thanks @asumagic, this is great work.

CC: @mrouvier :)

@Adel-Moumen Adel-Moumen merged commit 1350e9b into speechbrain:develop Mar 29, 2024
5 checks passed
@asumagic asumagic mentioned this pull request Apr 11, 2024
Labels: enhancement (New feature or request), ready to review (Waiting on reviewer to provide feedback)