Decoding error in DAC when using HuggingFace models #5944

Open
ashi-ta opened this issue Nov 5, 2024 · 4 comments
Labels: Bug (bug should be fixed), Codec

Comments

@ashi-ta
Copy link

ashi-ta commented Nov 5, 2024

Describe the bug
I am encountering a decoding error when using the DAC model with pretrained models downloaded from HuggingFace.
This issue seems to arise from discrepancies between the hyperparameters stored in the pretrained model's config.yaml and the arguments accepted by the current DAC implementation.

The same error also occurs with other HF models, such as espnet/amuse_dac_16k (https://huggingface.co/espnet/amuse_dac_16k/blob/main/exp_16k/codec_train_dac_fs16000_raw_fs16000/config.yaml#L184).

Basic environments:

  • OS information: Linux 5.15.0-124-generic 134~20.04.1-Ubuntu SMP Tue Oct 1 15:27:33 UTC 2024 x86_64
  • python version: 3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0]
  • espnet version: espnet 202409
  • pytorch version: pytorch 2.3.0
  • Git hash: 4c55d6c9071fb36addcc8426f2befd8f9a1bd11e
    • Commit date: Fri Nov 1 23:20:28 2024 +0200

Environments from torch.utils.collect_env:

Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
CMake version: version 3.16.3

Python version: 3.10.15 (main, Oct  3 2024, 07:27:34) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-124-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.6.77

Nvidia driver version: 560.35.03
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.3.0
[pip3] torch-complex==0.4.4
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.3.0
[pip3] triton==2.3.0
[conda] blas                      1.0                         mkl
[conda] mkl                       2023.1.0         h213fc3f_46344
[conda] mkl-fft                   1.3.10                   pypi_0    pypi
[conda] mkl-random                1.2.7                    pypi_0    pypi
[conda] mkl-service               2.4.0                    pypi_0    pypi
[conda] mkl_fft                   1.3.10          py310h5eee18b_0
[conda] mkl_random                1.2.7           py310h1128e8f_0
[conda] numpy                     1.23.5                   pypi_0    pypi
[conda] numpy-base                1.26.4          py310hb5e798b_0
[conda] pytorch                   2.3.0           py3.10_cuda12.1_cudnn8.9.2_0    pytorch
[conda] pytorch-cuda              12.1                 ha16c6d3_6    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch                     2.3.0                    pypi_0    pypi
[conda] torch-complex             0.4.4                    pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchaudio                2.3.0                    pypi_0    pypi
[conda] torchtriton               2.3.0                     py310    pytorch
[conda] triton                    2.3.0                    pypi_0    pypi

Task information:

  • Task: codec1
  • Recipe: libritts
  • ESPnet2

To Reproduce
Steps to reproduce the behavior:

  1. Move to the recipe directory: cd egs2/libritts/codec1
  2. Execute run.sh with stage=6 and download_model="espnet/libritts_dac_16k",
    e.g.:
stage=6
stop_stage=6

download_model="espnet/libritts_dac_16k"

./codec.sh \
    --local_data_opts "--trim_all_silence false" \
    --fs ${fs} \
    --inference_config "${inference_config}" \
    --scoring_config "${score_config}" \
    --test_sets "${test_sets}" \
    --stage ${stage} \
    --stop_stage ${stop_stage} \
    --download_model ${download_model} ${opts} "$@"

Error logs

Traceback (most recent call last):
  File "/home/ashihara/miniconda3/envs/espnet_codec/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ashihara/miniconda3/envs/espnet_codec/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 403, in <module>
    main()
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 399, in main
    inference(**kwargs)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 216, in inference
    audio_coding = AudioCoding.from_pretrained(
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 166, in from_pretrained
    return AudioCoding(**kwargs)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/bin/gan_codec_inference.py", line 51, in __init__
    model, train_args = GANCodecTask.build_model_from_file(
  File "/home/ashihara/github/espnet_202410_codec/espnet2/tasks/abs_task.py", line 2301, in build_model_from_file
    model = cls.build_model(args)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/tasks/gan_codec.py", line 144, in build_model
    codec = codec_class(**args.codec_conf)
  File "/home/ashihara/github/espnet_202410_codec/espnet2/gan_codec/dac/dac.py", line 152, in __init__
    self.discriminator = DACDiscriminator(**discriminator_params)
TypeError: DACDiscriminator.__init__() got an unexpected keyword argument 'scale_discriminator_params'
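The traceback shows build_model forwarding every entry of codec_conf as keyword arguments (codec_class(**args.codec_conf)), and dac.py doing the same with discriminator_params. Any key left over from an older version of the code therefore surfaces as a TypeError at construction time. The following is a minimal diagnostic sketch that lists which config keys a constructor would reject. HypotheticalDiscriminator, its parameter names, and find_unexpected_kwargs are all hypothetical illustrations, not ESPnet's DACDiscriminator or any ESPnet API.

```python
import inspect
from typing import Any, Dict, List


class HypotheticalDiscriminator:
    """Hypothetical stand-in, NOT ESPnet's DACDiscriminator: its constructor
    accepts only the two made-up parameters below."""

    def __init__(self, period_params=None, band_params=None):
        self.period_params = period_params or {}
        self.band_params = band_params or {}


def find_unexpected_kwargs(cls: type, params: Dict[str, Any]) -> List[str]:
    """Return the keys of `params` that `cls(**params)` would reject."""
    sig = inspect.signature(cls.__init__)
    kinds = [p.kind for p in sig.parameters.values()]
    # A constructor with **kwargs accepts any keyword, so nothing is rejected.
    if inspect.Parameter.VAR_KEYWORD in kinds:
        return []
    accepted = {
        name
        for name, p in sig.parameters.items()
        if p.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    accepted.discard("self")
    return sorted(key for key in params if key not in accepted)


# A config written for an older version still carries a key the class dropped.
old_params = {"period_params": {}, "band_params": {}, "scale_discriminator_params": {}}
print(find_unexpected_kwargs(HypotheticalDiscriminator, old_params))
# ['scale_discriminator_params']
try:
    HypotheticalDiscriminator(**old_params)
except TypeError as err:
    # Same failure mode as the traceback: unexpected keyword argument.
    print(err)
```

Running such a check against the real class and the downloaded discriminator_params would show exactly which entries need to be removed or ignored.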
ashi-ta added the Bug label Nov 5, 2024
sw005320 added the Codec label Nov 5, 2024

ashi-ta commented Nov 5, 2024

Additionally, I found another bug. It is very minor, so I'm commenting here instead of opening a new issue.
When executing stage 7, the following error occurs:

ModuleNotFoundError: No module named 'versa.bin.espnet_scorer'

This error seems to be caused by codec.sh calling a script name that does not exist in the versa package:
https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/codec1/codec.sh#L574
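One way to make a script robust to such a rename is to probe which module name is importable before launching it. The following is a minimal Python sketch, assuming only the two module names mentioned in this thread (versa.bin.scorer as the current name, versa.bin.espnet_scorer as the old one). first_importable is a hypothetical helper, not part of ESPnet or versa.

```python
import importlib.util
from typing import Iterable, Optional


def first_importable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first dotted module name in `candidates` that can be found,
    or None if none of them is installed."""
    for name in candidates:
        try:
            # find_spec imports parent packages, so a missing parent
            # (e.g. versa not installed) raises ModuleNotFoundError here,
            # while a missing leaf module just returns None.
            if importlib.util.find_spec(name) is not None:
                return name
        except ModuleNotFoundError:
            continue
    return None


# Newer name first, older name as a fallback (names taken from this thread).
scorer = first_importable(["versa.bin.scorer", "versa.bin.espnet_scorer"])
print(scorer or "versa scorer module not found")
```

The resolved name could then be passed to python -m; the fallback order is a design choice that simply prefers the newer interface.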

Thanks!

ashi-ta closed this as completed Nov 5, 2024
ashi-ta reopened this Nov 5, 2024

ftshijt commented Nov 5, 2024

Thanks for mentioning it! Yeah, I'm currently changing some of the versa interface and plan to update the usage here once the interface has converged in versa. I will have a PR soon to fix it.


ftshijt commented Nov 5, 2024


The pretrained model was trained with a previous version of the code, and its config still sets some unused hyperparameters (e.g., scale_discriminator_params) that the current code no longer accepts. Thanks for pointing out this issue; I will work on a fix.


ashi-ta commented Nov 7, 2024

Thank you for your answer! (And apologies for the delayed reply; I was trying the one-pass operation.)
You may have already started making these adjustments, but I am sharing the modifications just in case.
To successfully run the one-pass operation, I applied the following changes:

  • Removed the unused configurations (e.g., scale_discriminator_params) from config.yaml
  • Changed the script name from versa.bin.espnet_scorer to versa.bin.scorer
  • Enabled FLAC format support in the kaldiio package (for fs=16k only) by running:
    • pip install --upgrade --no-deps --force-reinstall git+https://github.com/nttcslab-sp/kaldiio.git
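The first workaround above (removing stale entries from the downloaded config.yaml) can also be scripted. This is a minimal sketch that edits the already-parsed config as a nested dict. The section path codec_conf → discriminator_params is an assumption inferred from the traceback (dac.py builds DACDiscriminator(**discriminator_params) from codec_conf), and remove_config_keys is a hypothetical helper, not an ESPnet function.

```python
from typing import Any, Dict, Iterable, List


def remove_config_keys(
    conf: Dict[str, Any], section: Iterable[str], keys: Iterable[str]
) -> List[str]:
    """Delete `keys` (in place) from the nested mapping reached by following
    `section`, and return the keys that were actually present and removed."""
    node = conf
    for name in section:
        node = node[name]  # raises KeyError if the section path does not exist
    removed = []
    for key in keys:
        if key in node:
            del node[key]
            removed.append(key)
    return removed


# Toy config; "other_params" is a hypothetical placeholder entry.
config = {
    "codec": "dac",
    "codec_conf": {
        "discriminator_params": {
            "scale_discriminator_params": {},
            "other_params": {},
        }
    },
}
print(remove_config_keys(config, ["codec_conf", "discriminator_params"],
                         ["scale_discriminator_params"]))
# ['scale_discriminator_params']
```

With PyYAML, the file could be loaded with yaml.safe_load and written back with yaml.safe_dump; note that such a round trip drops comments from the original file.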

Anyway, thank you for your work; it's been extremely helpful!
