Stars
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Versatile audio super resolution (any -> 48kHz) with AudioSR.
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
Lightweight Armoury Crate alternative for Asus laptops and ROG Ally. Control tool for ROG Zephyrus G14, G15, G16, M16, Flow X13, Flow X16, TUF, Strix, Scar and other models
Boost LaTeX typesetting efficiency with preview, compile, autocomplete, colorize, and more.
Official PyTorch implementation of BigVGAN (ICLR 2023)
Lumina-T2X is a unified framework for Text to Any Modality Generation
A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.
Code and data for "Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue" (ACL 2024)
A generative speech model for daily dialogue.
Benchmark popular audio i/o packages
Karras et al. (2022) diffusion models for PyTorch
Manage scalable open LLM inference endpoints in Slurm clusters
Repository for training models for music source separation.
Official Code for Stable Cascade
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
A modern Python package and dependency manager supporting the latest PEP standards
A paper and project list about cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting…
An easy-to-understand TTS / SVS / SVC framework