
[WIP] initial implementation to support audio processing as arrays #40

Merged · 18 commits from feat/support-speech-from-array into dev · Mar 27, 2024

Conversation

@g8a9 (Owner) commented Mar 6, 2024

This PR has multiple goals.
The main one is to support numpy arrays for audio rather than local paths only. Accepting only local paths is a strong limitation, since many audio datasets today come from the HF Hub already decoded into np.ndarray.

However, while inspecting the code, I noticed there are still several dependencies on local files that we don't want to have. For example, the code expects:

  • local transcripts for most of the explainers (I think we should compute them in a single, centralized place)
  • local files for adding noise (white and pink)

Another important goal is to standardize transcript generation, which is currently done only by the LOO explainers.
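To make the main goal concrete, here is a rough sketch of the intended call pattern. The dataset name, the FerretAudio import path, and its constructor signature are illustrative assumptions, not the final API:

```python
from datasets import load_dataset

from ferret import FerretAudio  # assumed import path

# Audio columns on the HF Hub are typically decoded on access:
ds = load_dataset("PolyAI/minds14", "en-US", split="train")
sample = ds[0]["audio"]  # {"array": np.ndarray, "sampling_rate": int, ...}

# What this PR enables (hypothetical signature): wrap the decoded array
# directly instead of pointing to a file on disk.
audio = FerretAudio(sample["array"], sampling_rate=sample["sampling_rate"])
```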

@g8a9 (Owner, Author) commented Mar 7, 2024

@emanuele-moscato I've left several TODO markers around for points to discuss; you can find them by searching for #TODO GA.

@emanuele-moscato (Collaborator) commented Mar 7, 2024

Plan for the explainers under ferret/explainers/explanation_speech/:

  • Wherever the path to an audio file is currently required (typically in the compute_explanation methods), pass an object of type FerretAudio (newly defined in this PR) instead; see the sketch after this list.
  • Make FerretAudio able to transcribe to text (a dedicated method called from within the explainers when/if needed). --> NOT VALID ANYMORE
  • Check the SpeechXAI examples notebook.
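A minimal sketch of the first point. The field names and the explainer signature are assumptions; the real class is the one added in this PR:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FerretAudio:
    """Assumed shape of the new wrapper: a decoded waveform plus its rate."""
    array: np.ndarray
    sampling_rate: int


def compute_explanation(audio: FerretAudio, target: int = 0):
    """Explainers receive the wrapper rather than a path on disk."""
    waveform = audio.array  # no file I/O inside the explainer anymore
    ...
```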

@gaiageagea force-pushed the feat/support-speech-from-array branch from c0c631e to ed15ea1 on March 15, 2024 08:04
@emanuele-moscato (Collaborator) commented Mar 15, 2024

To do:

  • In the definition of the ExplanationSpeech class, rename the kwarg audio_path to audio and fix the code accordingly everywhere the class is used (making sure an object of type FerretAudio is passed).
  • Address the comments in the last code review.
  • Finish updating the speech explainers so they accept an object of type FerretAudio as input.
  • See the previous to-do list.

Note: the audio transcription functions transcribe_audio and transcribe_audio_with_model were moved from ferret/explainers/explanation_speech/utils_removal.py to ferret/speechxai_utils.py to avoid a circular import. The rest of the code has already been updated, but please give it one last check!
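For reference, the updated import (module and function names as stated above):

```python
# New home of the transcription helpers (previously in
# ferret/explainers/explanation_speech/utils_removal.py):
from ferret.speechxai_utils import transcribe_audio, transcribe_audio_with_model
```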

@gaiageagea force-pushed the feat/support-speech-from-array branch from 5fd52b2 to 721beb2 on March 15, 2024 18:36
@emanuele-moscato (Collaborator) commented Mar 18, 2024

Issues to solve:

  • When loading an audio file into a FerretAudio object and then extracting the AudioSegment object (to_pydub method), the audio gets distorted. Check FerretAudio's to_pydub method.

Action: modified FerretAudio's to_pydub method so that it creates a pydub AudioSegment from an array that is always unnormalized and of dtype int16 (sketched below).
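A minimal sketch of that fix, assuming the stored array is float and normalized to [-1, 1] (the 2 ** 15 factor matches the result noted below):

```python
import numpy as np
from pydub import AudioSegment


def to_pydub(array: np.ndarray, sampling_rate: int) -> AudioSegment:
    """Always hand pydub an unnormalized int16 buffer to avoid distortion."""
    if array.dtype != np.int16:
        # Undo the [-1, 1] normalization; clip to stay inside the int16 range.
        array = np.clip(array * (2 ** 15), -(2 ** 15), 2 ** 15 - 1).astype(np.int16)
    return AudioSegment(
        array.tobytes(),
        frame_rate=sampling_rate,
        sample_width=2,  # int16 -> 2 bytes per sample
        channels=1,      # mono only (see the shape check further down)
    )
```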

  • Move audio resampling (when the native sample rate differs from 16 kHz) into the transcription method.
  • Move normalization of the audio array (if needed) into the transcription method as well, so there is no ambiguity about whether the user-provided array is normalized: it stays as it is from the start.
  • Check that the new way of obtaining arrays from audio (librosa.load) returns the same as the old one (AudioSegment.from_wav --> pydub_to_np).

Result: numerically it does; both return an array of dtype float32 normalized by a factor of 2 ** 15, but the shapes differ: for mono audio, librosa returns a flattened array while pydub_to_np returns an array of shape (n_samples, 1).
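A quick check of that equivalence (the file name is a placeholder; the import path and return value of the project's pydub_to_np helper are assumptions):

```python
import librosa
import numpy as np
from pydub import AudioSegment

from ferret.speechxai_utils import pydub_to_np  # assumed import path

y_librosa, sr = librosa.load("clip.wav", sr=None)         # float32, shape (n_samples,)
y_pydub = pydub_to_np(AudioSegment.from_wav("clip.wav"))  # float32, shape (n_samples, 1)

# Numerically identical once the trailing channel axis is collapsed:
assert np.allclose(y_librosa, y_pydub.squeeze(axis=-1))
```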

  • Raise an error if the numpy array or audio file passed to FerretAudio has more than one channel (we only support mono audio!).

Action: this is inferred by looking at the shape of the array: a 1-dimensional array is fine; a 2-dimensional array is fine only if the trailing dimension is 1 (shape (n_samples, 1)); an array of dimension > 2 has a shape we don't understand, so an exception is raised.
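A sketch of that check (the helper name is assumed):

```python
import numpy as np


def validate_mono(array: np.ndarray) -> np.ndarray:
    """Accept only mono audio, flattening a trailing singleton channel axis."""
    if array.ndim == 1:
        return array  # (n_samples,): already mono
    if array.ndim == 2 and array.shape[1] == 1:
        return array.squeeze(axis=1)  # (n_samples, 1): flatten to mono
    raise ValueError(f"Only mono audio is supported; got shape {array.shape}.")
```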

  • Remove the remove_word_np function if it ends up not being used.

emanuele-moscato and others added commits on March 19, 2024 12:55:
- if word timestamps are not provided, they are generated on the fly
- each word timestamp expects a word transcript
- word timestamps are not external to the FerretAudio class
- add a new notebook to show this behavior
- updated the new notebook
- adapted the paraling explainer
- [WIP] code crashes if no ffmpeg is found on the machine
- final edits to methods to update
- update the notebook name
- WIP: need to check that everything returns expected results
- WIP: need to check that the notebook with local loading works
@g8a9 (Owner, Author) commented Mar 19, 2024

Regarding normalization: we should not touch the input array unless (1) it is not normalized and (2) we are transcribing it with whisperX. That is the only case where we really need normalization and a 16 kHz sampling rate.
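A sketch of that policy (the function name is illustrative; the 2 ** 15 factor and the 16 kHz target come from the discussion above):

```python
import librosa
import numpy as np

WHISPERX_SR = 16_000  # whisperX expects normalized float audio at 16 kHz


def prepare_for_whisperx(array: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Return a normalized, 16 kHz version for transcription; the user's
    array itself is never modified."""
    audio = np.asarray(array, dtype=np.float32)
    if np.abs(audio).max() > 1.0:  # looks unnormalized (e.g. raw int16 values)
        audio = audio / (2 ** 15)
    if sampling_rate != WHISPERX_SR:
        audio = librosa.resample(audio, orig_sr=sampling_rate, target_sr=WHISPERX_SR)
    return audio
```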

@emanuele-moscato merged commit 4d46242 into dev on Mar 27, 2024 (1 check passed)