ASR Pipeline for VAD, chunking, and transcription of Indian languages

wannasleepforlong/ASR-Pipeline-Abhinav

ASR-Pipeline-Abhinav

An ASR pipeline for voice activity detection (VAD), chunking, and transcription of audio in Indian languages.

This pipeline processes audio files through a series of stages:

  1. Voice Activity Detection (VAD): Detects regions of speech, removes silence, and splits the audio at those regions.
  2. Audio Chunking: Splits the audio into smaller chunks.
  3. Transcription with Forced Alignment: Transcribes the audio, aligns each word with its timestamps, and writes the result to a JSON file.
  4. Speaker Diarization: Identifies unique speakers in the audio by comparing speaker embeddings against a cosine-similarity threshold.
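
The speaker diarization stage can be illustrated with a minimal sketch. It assumes each speech segment has already been turned into an embedding vector by some speaker-embedding model (not shown), and it uses a simple greedy rule: a segment joins the most similar known speaker if the cosine similarity reaches a threshold, otherwise it starts a new speaker. The function names, the 0.75 threshold, and the toy vectors are hypothetical and are not taken from asr_pipeline.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.75):
    """Greedily label each embedding with a speaker index.

    The first embedding seen for a speaker is kept as that speaker's reference.
    The threshold value is an illustrative assumption, not the repository's setting.
    """
    references = []  # one reference embedding per discovered speaker
    labels = []
    for emb in embeddings:
        best_idx, best_sim = -1, -1.0
        for idx, ref in enumerate(references):
            sim = cosine_similarity(emb, ref)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            labels.append(best_idx)  # close enough to a known speaker
        else:
            references.append(emb)  # start a new speaker
            labels.append(len(references) - 1)
    return labels


# Toy 3-dimensional "embeddings": two segments per speaker.
toy = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.1], [0.0, 1.0, 0.0], [0.1, 0.95, 0.05]]
print(assign_speakers(toy))  # [0, 0, 1, 1]
```

A higher threshold tends to split one speaker into several labels, while a lower one tends to merge distinct speakers, so the value usually needs tuning on real recordings.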

Run asr_pipeline.py and, when prompted, enter the number of the stage you want to run; this lets you process audio files one stage at a time. Enter 0 to exit.
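
The prompt-driven flow described above could look roughly like the following sketch. It is not the actual contents of asr_pipeline.py: the `STAGES` table, `parse_choice`, `main_loop`, and the prompt text are hypothetical names chosen for illustration, and each stage is reduced to a placeholder callback.

```python
# Stage names follow the list in this README; the numbering is assumed to match it.
STAGES = {
    1: "Voice Activity Detection",
    2: "Audio Chunking",
    3: "Transcription with Forced Alignment",
    4: "Speaker Diarization",
}


def parse_choice(raw):
    """Return a stage number, 0 for exit, or None if the input is invalid."""
    try:
        choice = int(raw.strip())
    except ValueError:
        return None
    return choice if choice == 0 or choice in STAGES else None


def main_loop(read=input, run_stage=None):
    """Keep prompting for a stage number until the user enters 0."""
    if run_stage is None:
        # Placeholder: a real pipeline would call the stage's processing code here.
        run_stage = lambda n: print(f"[stage {n}] {STAGES[n]}")
    while True:
        choice = parse_choice(read("Stage (1-4, 0 to exit): "))
        if choice is None:
            print("Please enter a number from 0 to 4.")
            continue
        if choice == 0:
            break
        run_stage(choice)


# Scripted demo instead of live keyboard input: run stage 1, mistype, run stage 4, exit.
scripted = iter(["1", "x", "4", "0"])
ran = []
main_loop(read=lambda prompt: next(scripted), run_stage=ran.append)
print(ran)  # [1, 4]
```

Calling `main_loop()` with no arguments would read from the keyboard via `input`; passing `read` and `run_stage` explicitly, as in the demo, keeps the loop easy to test.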
