An extremely simple tool for separating vocals from background music. It runs entirely locally through a web interface, requires no internet connection, and uses 2stems/4stems/5stems models.

Python 1,314 151 Updated Jul 29, 2024

ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 619 43 Updated Oct 27, 2024

gnobitab / RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 908 53 Updated Jul 20, 2024

facebookresearch / AudioDec

An Open-source Streaming High-fidelity Neural Audio Codec

Python 430 20 Updated Oct 28, 2024

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript 50,338 7,214 Updated Nov 1, 2024

ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Python 1,812 532 Updated Oct 27, 2023

pengzhendong / speaker-diarization

Offline Speaker Diarization with SenseVoice by Sherpa ONNX.

Python 7 Updated Oct 29, 2024

gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,321 157 Updated Oct 18, 2024

anliyuan / Ultralight-Digital-Human

An ultra-lightweight digital human model that can run in real time on mobile devices

Python 711 116 Updated Oct 14, 2024

bookbot-hive / k2-indonesian-asr

Indonesian speech/phoneme recognizer powered by Kaldi 2.0 (lhotse, icefall, sherpa).

Python 7 3 Updated Jun 30, 2023

R3gm / SoniTranslate

Synchronized Translation for Videos. Video dubbing

Python 803 150 Updated Oct 23, 2024

yangjianxin1 / Firefly

Firefly: a large language model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Python 5,812 525 Updated Oct 24, 2024

THUDM / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 1,997 152 Updated Oct 31, 2024

wq2012 / awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

1,607 226 Updated Oct 16, 2024

WenmuZhou / PytorchOCR

A PyTorch-based OCR toolkit that supports common text detection and recognition algorithms

Python 1,381 305 Updated Sep 2, 2024

JusperLee / Dual-Path-RNN-Pytorch

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch

Python 414 66 Updated Feb 14, 2023

kangfenmao / cherry-studio

🍒 Cherry Studio is a desktop client that supports multiple LLM providers

TypeScript 1,183 65 Updated Nov 1, 2024

jishengpeng / TextrolSpeech

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)

Python 132 4 Updated Aug 29, 2024

JishengBai / AudioSetCaps

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 44 2 Updated Oct 23, 2024

WEIFENG2333 / SubtitleSpliter

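Uses an LLM to split subtitles into sentences, process and optimize subtitle files, and merge and split segments of automatic speech recognition (ASR) data.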

Python 4 1 Updated Oct 25, 2024

lishaojie manbaaaa

Stars

JusperLee / SonicSim

Audio-AGI / AudioSep

yeyupiaoling / VoiceprintRecognition-PaddlePaddle

opendilab / CleanS2S

espressif / esp-sr

linto-ai / whisper-timestamped

warmshao / ChatTTSPlus

MahmoudAshraf97 / whisper-diarization

lifeiteng / NotebookTTS

cyz7758520 / Android_audio_talkback_demo_program

jianchang512 / vocal-separate