Stars
A timeline of the latest AI models for audio generation, starting in 2023!
Community list of startups working with AI in audio and music technology
fishaudio / OpenUtau
Forked from xunmengshe/OpenUtau. OpenUTAU renderer for DiffSinger. Usage guide (in Chinese): https://github.com/xunmengshe/OpenUtau/wiki/%E4%BD%BF%E7%94%A8%E6%96%B9%E6%B3%95%EF%BC%88%E4%B8%AD%E6%96%87%EF%BC%89
Version 1 of the DiffSinger model for 泠鸢yousa
Open-source file format designed for high-quality, customizable singing synthesis.
A guide to grapheme-to-phoneme conversion and a phoneme list for the ACE singing voice synthesis engine
High-Resolution Image Synthesis with Latent Diffusion Models
A latent text-to-image diffusion model
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
openvpi / DiffSinger
Forked from MoonInTheRiver/DiffSinger. An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility, based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A minimal inference engine for DiffSinger
Implementation of NaturalSpeech 2, a zero-shot speech and singing synthesizer, in PyTorch
Unofficial implementation of NaturalSpeech2 for Voice Conversion and Text to Speech
PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor
A pretrained model for "A Phoneme-informed Neural Network Model for Note-level Singing Transcription", ICASSP 2023
Singing Voice Conversion Challenge 2023 Starter Kit: FastSVC Reimplementation
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)
Official implementation of SawSing (ISMIR'22)
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs