Decoding error in DAC when using HuggingFace models #5944

ashi-ta · 2024-11-05T08:31:27Z

Describe the bug
I am encountering a decoding error while using the DAC model in conjunction with HuggingFace models.
This issue seems to arise from discrepancies between

the dac.py configuration (i.e., https://github.com/espnet/espnet/blob/master/espnet2/gan_codec/dac/dac.py#L645) and
the config.yaml uploaded alongside the model (e.g., https://huggingface.co/espnet/libritts_dac_16k/blob/main/exp/codec_train_dac_16k_raw_fs16000/config.yaml#L184).

Consequently, the same error also occurs with other HF models such as espnet/amuse_dac_16k (https://huggingface.co/espnet/amuse_dac_16k/blob/main/exp_16k/codec_train_dac_fs16000_raw_fs16000/config.yaml#L184).

Basic environments:

OS information: Linux 5.15.0-124-generic 134~20.04.1-Ubuntu SMP Tue Oct 1 15:27:33 UTC 2024 x86_64
python version: 3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0]
espnet version: espnet 202409
pytorch version: pytorch 2.3.0
Git hash: 4c55d6c9071fb36addcc8426f2befd8f9a1bd11e
- Commit date: Fri Nov 1 23:20:28 2024 +0200

Environments from torch.utils.collect_env:

Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
CMake version: version 3.16.3

Python version: 3.10.15 (main, Oct  3 2024, 07:27:34) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-124-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.6.77

Nvidia driver version: 560.35.03
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.3.0
[pip3] torch-complex==0.4.4
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.3.0
[pip3] triton==2.3.0
[conda] blas                      1.0                         mkl
[conda] mkl                       2023.1.0         h213fc3f_46344
[conda] mkl-fft                   1.3.10                   pypi_0    pypi
[conda] mkl-random                1.2.7                    pypi_0    pypi
[conda] mkl-service               2.4.0                    pypi_0    pypi
[conda] mkl_fft                   1.3.10          py310h5eee18b_0
[conda] mkl_random                1.2.7           py310h1128e8f_0
[conda] numpy                     1.23.5                   pypi_0    pypi
[conda] numpy-base                1.26.4          py310hb5e798b_0
[conda] pytorch                   2.3.0           py3.10_cuda12.1_cudnn8.9.2_0    pytorch
[conda] pytorch-cuda              12.1                 ha16c6d3_6    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch                     2.3.0                    pypi_0    pypi
[conda] torch-complex             0.4.4                    pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchaudio                2.3.0                    pypi_0    pypi
[conda] torchtriton               2.3.0                     py310    pytorch
[conda] triton                    2.3.0                    pypi_0    pypi

Task information:

Task: codec1
Recipe: libritts
ESPnet2

To Reproduce
Steps to reproduce the behavior:

1. Move to the recipe directory: cd egs2/libritts/codec1
2. Execute run.sh with stage==6 and download_model=="espnet/libritts_dac_16k", e.g.:

stage=6
stop_stage=6

download_model="espnet/libritts_dac_16k"

./codec.sh \
    --local_data_opts "--trim_all_silence false" \
    --fs ${fs} \
    --inference_config "${inference_config}" \
    --scoring_config "${score_config}" \
    --test_sets "${test_sets}" \
    --stage ${stage} \
    --stop_stage ${stop_stage} \
    --download_model ${download_model} ${opts} "$@"

Error logs

Traceback (most recent call last):
  File "/home/ashihara/miniconda3/envs/espnet_codec/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ashihara/miniconda3/envs/espnet_codec/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 403, in <module>
    main()
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 399, in main
    inference(**kwargs)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 216, in inference
    audio_coding = AudioCoding.from_pretrained(
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 166, in from_pretrained
    return AudioCoding(**kwargs)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 51, in __init__
    model, train_args = GANCodecTask.build_model_from_file(
  File "/home/ashihara/github/espnet_202410_codec/espnet2/tasks/abs_task.py", line 2301, in build_model_from_file
    model = cls.build_model(args)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/tasks/gan_codec.py", line 144, in build_model
    codec = codec_class(**args.codec_conf)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/gan_codec/dac/dac.py", line 152, in __init__
    self.discriminator = DACDiscriminator(**discriminator_params)
TypeError: DACDiscriminator.__init__() got an unexpected keyword argument 'scale_discriminator_params'
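
The failure mode in the traceback above can be reproduced in isolation: a config dictionary is unpacked with ** into a constructor that no longer declares one of its keys. The sketch below is only an illustration under stated assumptions. ToyDiscriminator and its known_params argument are hypothetical stand-ins (not the real DACDiscriminator signature), and unexpected_keys is a helper name made up for this example. It lists which keys a callable would reject before the call is made.

```python
import inspect


class ToyDiscriminator:
    """Hypothetical stand-in; NOT the real DACDiscriminator signature."""

    def __init__(self, known_params=None):
        self.known_params = known_params


def unexpected_keys(target, params):
    """Return the keys in params that target's constructor would reject."""
    sig = inspect.signature(target)
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()):
        return []
    return [k for k in params if k not in sig.parameters]


# Assumed shape of a stale config: one key the class knows, one it does not.
stale_conf = {"known_params": {}, "scale_discriminator_params": {}}
print(unexpected_keys(ToyDiscriminator, stale_conf))
```

Running a check like this against the class and the loaded config would show every stale key at once, rather than one TypeError per run.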


ashi-ta · 2024-11-05T13:09:15Z

Additionally, I just found another bug (although it's very minor, so I'm commenting here instead of opening a new issue).
When executing stage==7, the following error occurs:

ModuleNotFoundError: No module named 'versa.bin.espnet_scorer'

This error seems to be caused by calling the wrong script name in the versa package.
https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/codec1/codec.sh#L574
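
A quick way to confirm which module name an installed package actually provides is to probe candidates with importlib before launching the stage. This is a minimal sketch, not part of codec.sh. first_available is a hypothetical helper name, and the second candidate, versa.bin.scorer, is the name reported as working later in this thread.

```python
import importlib.util


def first_available(module_names):
    """Return the first module name that can be located, or None."""
    for name in module_names:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ImportError:
            # Raised when a parent package (e.g. "versa") cannot be imported.
            continue
    return None


candidates = ["versa.bin.espnet_scorer", "versa.bin.scorer"]
# Prints None when versa is not installed in the current environment.
print(first_available(candidates))
```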

Thanks!

ftshijt · 2024-11-05T15:26:18Z

Thanks for mentioning it! Yes, I have recently been changing some of the versa interface and plan to update the usage here once the interface has converged. I will open a PR soon to fix it.

ftshijt · 2024-11-05T15:38:13Z

The previous version of the code, on which the model was pre-trained, set some unused hyperparameters. Thanks for pointing out this issue; I will work on a fix.

ashi-ta · 2024-11-07T16:28:19Z

Thank you for your answer! (And apologies for the delayed reply; I've tried the one-pass operation.)
You may have already started making these adjustments, but I am sharing the modifications just in case.
To successfully run the one-pass operation, I applied the following changes:

Removed certain configurations (e.g., scale_discriminator_params) from config.yaml
Changed the script name from versa.bin.espnet_scorer to versa.bin.scorer
Enabled FLAC format support in the kaldiio package (for fs=16k only) by running:
- pip install --upgrade --no-deps --force-reinstall git+https://github.com/nttcslab-sp/kaldiio.git
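
The first change above (removing stale entries from config.yaml) could be scripted roughly as follows. This is a sketch under assumptions. drop_keys is a hypothetical helper, the nested fragment only mirrors the codec_conf and discriminator_params nesting implied by the traceback, and the inner values (in_channels, kept_params, rates) are invented for illustration. Reading and writing the YAML file itself (for example with PyYAML's yaml.safe_load and yaml.safe_dump) is left out so that the example needs only the standard library.

```python
def drop_keys(node, stale):
    """Return a copy of a nested dict/list structure without the keys in stale."""
    if isinstance(node, dict):
        return {k: drop_keys(v, stale) for k, v in node.items() if k not in stale}
    if isinstance(node, list):
        return [drop_keys(v, stale) for v in node]
    return node


# Hypothetical fragment; the real config.yaml has many more entries.
config = {
    "codec_conf": {
        "discriminator_params": {
            "scale_discriminator_params": {"in_channels": 1},
            "kept_params": {"rates": []},
        }
    }
}

cleaned = drop_keys(config, {"scale_discriminator_params"})
print(cleaned)
```

The function returns a new structure and leaves the original untouched, so the cleaned config can be compared against the downloaded one before anything is overwritten.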

Anyway, thank you for your work; it's been extremely helpful!

ashi-ta added the Bug label Nov 5, 2024

sw005320 assigned ftshijt Nov 5, 2024

sw005320 added the Codec label Nov 5, 2024

ashi-ta closed this as completed Nov 5, 2024

ashi-ta reopened this Nov 5, 2024
