Web application for accurate speech-to-text conversion, powered by OpenAI Whisper and Pyannote.
- Speech-to-text with timestamps
- Speaker identification (diarization: labeling who spoke when)
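
The two features above can be combined by attaching a speaker label to each timestamped transcript segment. This README does not describe how the application merges the two outputs, so the following is only a minimal sketch of one common approach: pick the speaker whose turns overlap each segment the most. The `Segment` and `Turn` types, the `assign_speakers` function, and the sample data are all hypothetical and stand in for the real Whisper and Pyannote outputs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of speech; start and end are in seconds."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}  # speaker label -> total overlapping seconds
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append((seg.start, seg.end, speaker, seg.text))
    return labeled


# Made-up sample data, for illustration only.
segments = [
    Segment(0.0, 4.0, "Hello, thanks for joining."),
    Segment(4.0, 9.0, "Happy to be here."),
]
turns = [
    Turn(0.0, 4.5, "SPEAKER_00"),
    Turn(4.5, 9.0, "SPEAKER_01"),
]

labeled = assign_speakers(segments, turns)
for start, end, speaker, text in labeled:
    print(f"[{start:6.1f} - {end:6.1f}] {speaker}: {text}")
```

Choosing the speaker by maximum overlap keeps a segment intact even when a speaker change falls slightly inside it. Splitting segments at turn boundaries is an alternative, but it needs word-level timestamps.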
Performance
Currently, the model processes 2 hours of audio in 12 minutes on an RTX 3060 graphics card, which is roughly 10 times faster than real time.
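
To make the figure easier to apply to other recordings, the sketch below turns it into a speed-up factor and a rough time estimate. It assumes processing time scales linearly with audio length, which this README does not claim. The function names are hypothetical and are not part of the application.

```python
def speedup_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of compute."""
    return audio_seconds / processing_seconds


def estimate_processing_minutes(audio_minutes, factor):
    """Rough estimate, assuming time grows linearly with audio length."""
    return audio_minutes / factor


# Figures from the benchmark above: 2 hours of audio in 12 minutes.
factor = speedup_factor(2 * 60 * 60, 12 * 60)
print(f"Speed-up over real time: {factor:.1f}x")

# Under the linearity assumption, a 30-minute recording would take:
estimate = estimate_processing_minutes(30, factor)
print(f"Estimated time for 30 minutes of audio: {estimate:.1f} min")
```

Actual times will vary with the hardware and with the recording itself, so treat the estimate as an order-of-magnitude guide.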
TODO