Stars
Official implementation of "Separate Anything You Describe"
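This project uses several advanced speaker recognition models, including EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and also supports several data preprocessing methods, including MelSpectrogram, Spectrogram, MFCC, and Fbank.
High-quality, streaming, full-duplex Speech-to-Speech interactive agent prototype implemented in a single file!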
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
An extremely simple tool for separating vocals and background music, run entirely locally through a web interface with no internet connection required, using 2stems/4stems/5stems models.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
An Open-source Streaming High-fidelity Neural Audio Codec
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Indonesian speech/phoneme recognizer powered by Kaldi 2.0 (lhotse, icefall, sherpa).
Synchronized Translation for Videos. Video dubbing
Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models.
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch
🍒 Cherry Studio is a desktop client that supports multiple LLM providers
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
Uses an LLM to split subtitles into sentences, process and optimize subtitle files, and merge and split automatic speech recognition (ASR) segments.