This is a speaker recognition system with a GUI, developed as an SRT project for the Signal Processing course (Fall 2013) at Tsinghua University.
For more details of this project, please see:
- Our presentation slides
- Our complete report
Dependencies:
- SciPy
- scikit-learn
- scikits.talkbox
- bob
- pyssp
- PyQt
- PyAudio
- gcc >= 4.7
Voice Activity Detection (VAD):
Feature:
Model:
- Gaussian Mixture Model (GMM)
- Universal Background Model (UBM)
- Continuous Restricted Boltzmann Machine (CRBM)
- Joint Factor Analysis (JFA)
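To illustrate how a likelihood-based speaker model is enrolled and queried, here is a minimal sketch. It is not the project's implementation. It assumes a strong simplification: each speaker is modeled by a single diagonal-covariance Gaussian, which is the one-component special case of a GMM. The function names (`fit_gaussian`, `enroll`, `predict`), the speaker names, and the synthetic 13-dimensional frames are all hypothetical stand-ins for real extracted features.

```python
import math
import random


def fit_gaussian(frames):
    """Estimate a per-dimension mean and variance from feature frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A small floor keeps every variance strictly positive.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(model, frames):
    """Average log-density of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def enroll(features_by_speaker):
    """Fit one model per speaker (hypothetical helper)."""
    return {name: fit_gaussian(frames)
            for name, frames in features_by_speaker.items()}


def predict(models, frames):
    """Return the speaker whose model scores the frames highest."""
    return max(models, key=lambda name: avg_log_likelihood(models[name], frames))


def synthetic_frames(rng, center, count, dim=13):
    """Random frames standing in for real features (assumption)."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
models = enroll({
    "mary": synthetic_frames(rng, 0.0, 200),
    "tom": synthetic_frames(rng, 3.0, 200),
})
# Frames drawn near tom's training distribution.
predicted = predict(models, synthetic_frames(rng, 3.0, 50))
```

A full GMM would replace `fit_gaussian` with an EM-trained mixture of several such Gaussians. The decision rule (pick the speaker with the highest average log-likelihood) stays the same.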
Our GUI provides basic functionality for recording, enrollment, training and testing, and also visualizes speaker recognition in real time.
See our demo video (in Chinese) for more details.
usage: speaker-recognition.py [-h] -t TASK -i INPUT -m MODEL
Speaker Recognition Command Line Tool
optional arguments:
-h, --help show this help message and exit
-t TASK, --task TASK Task to do. Either "enroll", "predict"
-i INPUT, --input INPUT
Input Files(to predict) or Directories(to enroll)
-m MODEL, --model MODEL
Model file to save(in enroll) or use(in predict)
Note that wildcard inputs should be *quoted*, and they will be sent to glob
Examples:
Train:
./speaker-recognition.py -t enroll -i "/tmp/person* ./mary" -m model.out
Predict:
./speaker-recognition.py -t predict -i "./*.wav" -m model.out
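
The interface above could be reproduced roughly as follows. This is a sketch under assumptions, not the script's actual source. The helper names `build_parser` and `expand_inputs` are hypothetical, and it assumes the quoted input string is split on whitespace before each pattern is passed to `glob`, as the "Train" example suggests.

```python
import argparse
import glob
import itertools


def build_parser():
    """Build a parser mirroring the usage text (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="speaker-recognition.py",
        description="Speaker Recognition Command Line Tool")
    parser.add_argument("-t", "--task", required=True,
                        help='Task to do. Either "enroll", "predict"')
    parser.add_argument("-i", "--input", required=True,
                        help="Input Files(to predict) or Directories(to enroll)")
    parser.add_argument("-m", "--model", required=True,
                        help="Model file to save(in enroll) or use(in predict)")
    return parser


def expand_inputs(spec):
    """Split a quoted input string on whitespace and glob each pattern."""
    patterns = spec.strip().split()
    matches = itertools.chain.from_iterable(glob.glob(p) for p in patterns)
    return sorted(matches)


# Equivalent to: ./speaker-recognition.py -t predict -i "./*.wav" -m model.out
args = build_parser().parse_args(
    ["-t", "predict", "-i", "./*.wav", "-m", "model.out"])
input_paths = expand_inputs(args.input)
```

Quoting matters because an unquoted `*.wav` would be expanded by the shell into several arguments, while `-i` takes a single value. With quotes, the pattern reaches the program intact and is expanded there.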