Visual Voice Activity Detection (VVAD) integration including live camera and prerecorded demo #330

cedric-cfk · 2024-01-16T23:28:43Z

This pull request includes:

- an integration of the VVAD-LRS3 dataset
- a model proposed alongside VVAD-LRS3 for binary 3D classification tasks
- four different CNN2Plus1D models for binary 3D classification tasks
- a detection pipeline for the VVAD task
- two demo files using the VVAD pipeline: one using the live camera feed and the other using a prerecorded video file
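A clip-based (3D) classifier needs several consecutive frames before it can predict anything, which is one plausible reason the pipeline has to pass `None` values along while its buffer fills (cf. the `predict_with_nones` rename and the "framerate reduction" fix in the commit list). The following is a minimal sketch under stated assumptions, not the paz implementation: `ClipBuffer`, `clip_length`, `stride` and the helper `predict_with_nones` are hypothetical names, and framerate reduction is modeled simply as keeping every `stride`-th frame.

```python
from collections import deque


class ClipBuffer:
    """Hypothetical frame buffer for a clip-based (3D) VVAD classifier.

    Keeps every `stride`-th frame (framerate reduction) and, once
    `clip_length` kept frames are available, returns the latest clip
    each time a new frame is kept. Otherwise it returns None.
    """

    def __init__(self, clip_length, stride=1):
        if clip_length < 1 or stride < 1:
            raise ValueError('clip_length and stride must be >= 1')
        self.clip_length = clip_length
        self.stride = stride
        # A bounded deque drops the oldest frame automatically.
        self.frames = deque(maxlen=clip_length)
        self.frame_count = 0

    def __call__(self, frame):
        keep = self.frame_count % self.stride == 0
        self.frame_count += 1
        if not keep:
            return None  # frame skipped by the framerate reduction
        self.frames.append(frame)
        if len(self.frames) < self.clip_length:
            return None  # not enough frames for a full clip yet
        return list(self.frames)


def predict_with_nones(predict, clip):
    """Hypothetical helper: call `predict` only on complete clips."""
    return None if clip is None else predict(clip)


if __name__ == '__main__':
    buffer = ClipBuffer(clip_length=3, stride=2)
    for frame in range(8):  # integers stand in for video frames
        clip = buffer(frame)
        print(frame, clip, predict_with_nones(len, clip))
```

The bounded `deque` replaces manual `pop(0)` bookkeeping; whether a skipped frame should yield `None` or repeat the previous clip is a design choice the sketch makes arbitrarily.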

Note:

The code contains two TODOs regarding the download of model files. These will be resolved in the coming days.

oarriaga

I think this PR is almost there!

paz/backend/boxes.py

paz/models/classification/cnn2Plus1.py

oarriaga · 2024-04-12T04:45:05Z

paz/processors/standard.py

```diff
+        if value is None:
+            mean = None
+        else:
+            size = len(self.predictions)
+
+            self.predictions.append(value)
+            if size > self.window_size:
+                self.predictions.pop(0)
+
+            mean = 0
+            if len(self.predictions) <= 1:
+                mean = value
+            else:
+                total_weights = 0
+                for prediction_index in range(0, size):
+                    weight = (prediction_index + 1) / size
+                    mean = mean + self.predictions[prediction_index] * weight
+                    total_weights = total_weights + weight
+                mean = mean / total_weights
+        return mean
+
+
```


I think this can be split into smaller functions to improve readability. The three conditions plus the for loop in a single call seem too complex to read and understand easily.

I have combined the AveragePredictions class with the WeightedAveragePredictions class. Additionally, I moved the computation of the weighted average (the for loop) to the backend.

To me it looks more readable now, and I was able to remove duplicated lines :)
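The refactor described above (one averaging processor, with the weighted average computed in a backend function) might look roughly like the sketch below. It is an illustration under assumptions, not the merged paz code: the `weighted` flag, the backend function names `average` and `weighted_average`, and the bounded `deque` are hypothetical choices. Like the quoted diff, it returns `None` for a `None` input and weights the i-th value by (i + 1) / n. One difference: in the diff, `size` is read before the `append`, so while the window is still filling the loop appears to stop one element short of the newest prediction; the sketch weights every value currently in the window.

```python
from collections import deque


def average(values):
    """Plain arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def weighted_average(values):
    """Linearly weighted mean of a non-empty sequence: the i-th value
    (0-based, oldest first) gets weight (i + 1) / n, so the most recent
    value counts most."""
    size = len(values)
    weights = [(index + 1) / size for index in range(size)]
    weighted_sum = sum(weight * value for weight, value in zip(weights, values))
    return weighted_sum / sum(weights)


class AveragePredictions:
    """Hypothetical combined processor: sliding-window (weighted) average."""

    def __init__(self, window_size=5, weighted=False):
        # Bounded window: appending beyond window_size drops the oldest value.
        self.predictions = deque(maxlen=window_size)
        self.average = weighted_average if weighted else average

    def __call__(self, value):
        if value is None:
            return None  # propagate missing predictions without updating the window
        self.predictions.append(value)
        return self.average(self.predictions)
```

Keeping the window bookkeeping in the processor and the arithmetic in small backend functions lets each piece be tested on its own, which addresses the readability concern raised in the review.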

paz/models/classification/cnn2Plus1.py

Cedric Carl-Franek Kränzle and others added 30 commits June 7, 2023 13:03

- `e8024bc` Tmp Update
- `c2bbe7f` Fixed typo
- `3ca8e21` Added live_demo Error model not working correctly
- `9121ed0` Working Data Generator and 2+1D CNN Model training ready
- `0b0d9b2` Added the original VVAD-LRS3 LSTM Model
- `b88bb04` Added all metrics and callbacks to the training
- `6f5f0fe` Fixed code carbon output file
- `40acaa1` Fixed default output path
- `45d8632` Test fixed dependency versions
- `7f82692` Fixed Typo in Test fixed dependency versions
- `e654559` Fixed default path to dataset
- `3251397` Fixed argument path to dataset
- `f5afa04` Fixed output path for tensorboard
- `7d2a81d` Revert fixed dependency versions
- `a00c8b3` Added datetime to the output directory
- `03dab61` Removed CodeCarbon for testing reasons
- `507e101` Fixed Output: creating output path before editing in it
- `24c01a9` Fixed Output: iterative creation of directories
- `25736e8` Fixed Output: iterative creation of directories with try and catches
- `c5c0be1` Revert testing changes to code carbon
- `250bd46` Reordered file structure
- `2c22fa2` Deleting cnn2PlUS1 form this branch
- `19bed2e` Deleting VVAD_LRS3_LSTM form this branch
- `9bc47ad` Deleting additional lines of cnn3PLUS1D form this branch
- `66b43b7` Deleted VVAD_LRS3_LSTM from this Branch
- `590973c` Merge branch '2+1D_CNN' into Training (conflicts: `examples/visual_voice_activity_detection/training/train_v2.py`, `paz/models/classification/__init__.py`)
- `017f227` Revert "Deleting cnn2PlUS1 form this branch" (reverts commit 2c22fa2)
- `7ecac39` Revert "Deleting VVAD_LRS3_LSTM form this branch" (reverts commit 19bed2e)
- `fd278f8` Fixed line spacing
- `36998ac` Added VVAD_LRS3 model testing scripts

cedric-cfk and others added 23 commits January 28, 2024 18:14

- `388bdfa` Removed Generator from the structure
- `8f0ff3d` Refactored some code segments
- `a323de1` Refactored processor image according to the reviews
- `97f5bd4` Refactored processor image according to the reviews
- `fccb504` Renamed PredictBatch to PredictNoneable
- `9372a70` Removed unnecessary for loop
- `3a103e0` Renamed play_rate into stride
- `3786ce0` Refactored input and output paths
- `ef800e8` Refactored variable names
- `7b7467f` Added argpass for file input path
- `768f016` Added argpass for file output path
- `4442e41` Added weights download to vvad_lrs_lstm model
- `ec1645c` Updated model weight
- `c42f9d2` Update vvad_lrs3_dataset.py (Fixed small bug in framerate reduction)
- `4747797` Changed Pipeline order as Nones should be included into the averaging
- `f454304` Remove typing dependency
- `c8a622e` Renamed predict_noneable into predict_with_nones
- `c43e808` Added returen to add_class_and_score
- `b58ff03` Replaced double quotation marks with single quotation marks
- `3e3d1e3` Replaced variable names in for loops
- `58879f8` Replaced +=
- `049971a` Replaced result with other name
- `eaf15a8` Merge branch 'master' into integration_to_paz

oarriaga reviewed Apr 12, 2024

Cedric Carl-Franek Kränzle added 4 commits April 12, 2024 19:21

- `ad5cdb3` Replaced res with residual AND deleted unnecessary comments
- `f2ea52a` Fixed the Copy Domain Bug
- `16999d4` Combined the Average with Weighted Average class
- `2935add` Merge remote-tracking branch 'origin/integration_to_paz' into integration_to_paz (conflicts: `paz/backend/standard.py`, `paz/processors/__init__.py`, `paz/processors/standard.py`)

oarriaga approved these changes Apr 15, 2024

oarriaga merged commit 8bfa3b2 into oarriaga:master Apr 15, 2024
0 of 4 checks passed
