Commit

update ckpts
yzGuu830 committed Jun 21, 2024
1 parent ffe6dde commit 5e72a75
Showing 1 changed file with 15 additions and 3 deletions.
18 changes: 15 additions & 3 deletions README.md
@@ -1,12 +1,24 @@
# ESC: High-Fidelity Speech Coding with Efficient Cross-Scale Vector Quantized Transformers

-[arXiv] This is the code repository for the ESC presented in the [ESC: High-Fidelity Speech Coding with Efficient Cross-Scale Vector Quantized Transformers](https://arxiv.org/abs/2404.19441) paper.
+[arXiv] This is the code repository for the ESC codec presented in the [ESC: High-Fidelity Speech Coding with Efficient Cross-Scale Vector Quantized Transformers](https://arxiv.org/abs/2404.19441) paper.
- Our neural speech codec, at only 30 MB, efficiently compresses 16 kHz speech to 1.5, 3, 4.5, 6, 7.5, and 9 kbps while maintaining reconstruction quality comparable to Descript's audio codec.
-- We provide [Model Checkpoints](https://drive.google.com/drive/folders/1wXYWwTJECYUt8VpN6ZO3CdXO36JTwvlE?usp=sharing) and a [Demo Page](https://western-spatula-93a.notion.site/Efficient-Speech-Codec-0e513f33cf104f799e16bcad015b03ef?pvs=4)
+- We provide [Model Checkpoints](#model-checkpoints) for different ESC variants and DAC models, along with a [Demo Page](https://western-spatula-93a.notion.site/Efficient-Speech-Codec-0e513f33cf104f799e16bcad015b03ef?pvs=4) with multilingual speech audio samples.
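
For a rough sense of what the listed bitrates mean, the sketch below computes the compression ratio relative to uncompressed PCM audio. It assumes 16-bit mono samples, which the README does not state, and the function names are illustrative only; they are not part of the ESC codebase.

```python
# Assumption: the uncompressed reference is 16-bit mono PCM at 16 kHz.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16  # assumed; not stated in the README

# Target bitrates listed in the README, in kbps.
ESC_BITRATES_KBPS = [1.5, 3, 4.5, 6, 7.5, 9]


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of raw PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(codec_kbps: float) -> float:
    """How many times smaller the coded stream is than raw PCM."""
    return pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE) / codec_kbps


if __name__ == "__main__":
    raw = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
    for kbps in ESC_BITRATES_KBPS:
        print(f"{kbps:>4} kbps: about {compression_ratio(kbps):.1f}x smaller than {raw:.0f} kbps PCM")
```

Under this assumption the raw stream is 256 kbps, so the lowest setting (1.5 kbps) corresponds to a reduction of roughly 170x and the highest (9 kbps) to roughly 28x.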

![An illustration of ESC Architecture](assets/architecture.png)
## Usage

### Model Checkpoints

| Codec | Checkpoint | #Param. |
|--------|-------------------------------------------------|----------|
| ESC-Base(non-adv) | [Download](https://drive.google.com/file/d/1UuGUxzTwHtio4xcHyRSiCIYOsG6botD9/view?usp=sharing) | 8.39M |
| ESC-Base(adv) | [Download](https://drive.google.com/file/d/1Un4jCopf6EOKQug6A6kb4P33ZpUBJSNL/view?usp=sharing) | 8.39M |
| ESC-Large(non-adv) | [Download](https://drive.google.com/file/d/12BUPT6zcolAE6gW1AwrvpNtykwoksbxh/view?usp=sharing) | 15.58M |
| DAC-Base(adv) | [Download](https://drive.google.com/file/d/17GmTHYa_V6s-OBMmZfvP6Zm08LsDVUo-/view?usp=sharing) | 74.31M |
| DAC-Tiny(adv) | [Download](https://drive.google.com/file/d/13THnYCuboOBM9ULIQYR6TppMGf2ASUGH/view?usp=sharing) | 8.17M |
| DAC-Tiny(non-adv) | [Download](https://drive.google.com/file/d/1-GzuTph9FxCeRg0a0R-IaQ4Py27QXXBX/view?usp=sharing) | 8.17M |


### Install Dev Dependencies
```bash
pip install -r requirements.txt
@@ -53,4 +65,4 @@ This will run codec evaluation at all bandwidth on a test set folder. We provide
## Results

![Performance Evaluation](assets/results.png)
-We provide a performance comparison with Descript's audio codec (DAC) at different scales of model sizes.
+We provide a performance comparison with Descript's audio codec (DAC) at different model sizes (with and without adversarial training).
