Transcription of Hindi audio using OpenAI Whisper and a Whisper-medium model fine-tuned for Hindi
This repository implements Hindi Automatic Speech Recognition (ASR) with OpenAI's Whisper. It transcribes audio files from the Kathbath dataset and evaluates the transcriptions using Word Error Rate (WER).
The project involves:
- Transcription: Using the Whisper ASR model to transcribe Hindi audio files.
- Evaluation: Calculating the Word Error Rate (WER) to assess the accuracy of the transcriptions.
- Analysis: Analyzing errors such as substitutions, deletions, and insertions in the transcriptions.
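
As a rough illustration of the transcription step, the sketch below uses the `openai-whisper` Python package with its stock `medium` checkpoint. The function name `transcribe_files` and its parameters are hypothetical and not taken from this repository, and the sketch does not cover loading the Hindi fine-tuned checkpoint, which depends on how that model is distributed. It assumes `openai-whisper` and `ffmpeg` are installed.

```python
from typing import Dict, Iterable


def transcribe_files(
    paths: Iterable[str],
    model_name: str = "medium",  # stock Whisper checkpoint name (assumption)
    language: str = "hi",        # ISO code for Hindi
) -> Dict[str, str]:
    """Transcribe each audio file and return {path: transcript}."""
    paths = list(paths)
    if not paths:
        # Nothing to transcribe, so skip loading the (large) model.
        return {}

    # Imported lazily because loading Whisper and its weights is expensive.
    import whisper

    model = whisper.load_model(model_name)
    transcripts = {}
    for path in paths:
        # Forcing the language avoids auto-detection errors on short clips.
        result = model.transcribe(path, language=language, task="transcribe")
        transcripts[path] = result["text"].strip()
    return transcripts
```

Calling `transcribe_files(["clip1.wav", "clip2.wav"])` (hypothetical file names) would return a dictionary mapping each path to its Hindi transcript.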
The workflow runs in three steps:
- Download and Prepare Data: The Kathbath dataset is downloaded automatically.
- Transcription: The ASR model transcribes audio files from the Kathbath dataset.
- Evaluation: The script calculates the WER and analyzes substitutions, deletions, and insertions.
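
To make the evaluation step concrete, here is a minimal, dependency-free sketch of how WER and the three error types can be counted with a word-level edit-distance alignment. It is not the repository's own script, which may rely on a library instead; the function name `wer_breakdown` is hypothetical. WER is taken as (substitutions + deletions + insertions) divided by the number of reference words.

```python
def wer_breakdown(reference: str, hypothesis: str) -> dict:
    """Return WER plus substitution, deletion, and insertion counts."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # all hypothesis words inserted

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = []
            c, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                candidates.append((c, s, d, k))          # match, no cost
            else:
                candidates.append((c + 1, s + 1, d, k))  # substitution
            c, s, d, k = dp[i - 1][j]
            candidates.append((c + 1, s, d + 1, k))      # deletion
            c, s, d, k = dp[i][j - 1]
            candidates.append((c + 1, s, d, k + 1))      # insertion
            # Pick the cheapest alignment; ties keep the earliest candidate.
            dp[i][j] = min(candidates, key=lambda t: t[0])

    cost, subs, dels, ins = dp[n][m]
    return {
        "wer": cost / n,
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }


if __name__ == "__main__":
    print(wer_breakdown("मैं घर जा रहा हूँ", "मैं घर जा रहे हूँ"))
```

When several alignments have the same total cost, the split between substitutions, deletions, and insertions is not unique; this sketch simply keeps the first minimal candidate, so per-type counts may differ slightly from other tools even when the WER agrees.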
We observe that, as expected, the Whisper-medium model fine-tuned on Hindi provides much better results than the OpenAI Whisper model.