By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs for International French (fr-FR), Mandarin Chinese (zh-CN), and Italian (it-IT) are also available, but they are not included because the files are too large.
To install a language pack, download the ZIP archives and extract them directly into the module install directory. You can find the module install directory by running:
python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"
Here is a simple Bash script to install all of them, assuming you've downloaded all three ZIP files into your current directory:
#!/usr/bin/env bash
SR_LIB=$(python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))")
sudo apt-get install --yes unzip
sudo unzip -o fr-FR.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/fr-FR/"
sudo unzip -o zh-CN.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/zh-CN/"
sudo unzip -o it-IT.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/it-IT/"
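On systems without apt-get or unzip, the same extraction step could be sketched in Python with the standard zipfile module. This is only a minimal sketch: install_language_pack is a hypothetical helper name, not part of SpeechRecognition, and writing into the module install directory may still need elevated privileges.

```python
import zipfile
from pathlib import Path


def install_language_pack(zip_path, module_dir):
    """Extract one language-pack ZIP into the module install directory.

    Like `unzip -o`, files that already exist are overwritten.
    Returns the names of the archive members that were extracted.
    """
    module_dir = Path(module_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(module_dir)
        return archive.namelist()


# Example call (assumes fr-FR.zip is in the current directory and
# module_dir is the directory printed by the python -c command above):
#     install_language_pack("fr-FR.zip", module_dir)
```

The function takes the target directory as an argument rather than importing speech_recognition itself, so it can be tried against a scratch directory first.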
Once installed, you can simply specify the language using the language parameter of recognizer_instance.recognize_sphinx. For example, French would be specified with "fr-FR" and Mandarin with "zh-CN".
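As a rough illustration of passing the language parameter, the sketch below transcribes an audio file with a chosen language pack. The helper name transcribe_with_sphinx and the choice of "fr-FR" as default are assumptions made for this example; it also assumes the French pack has been installed as described above.

```python
def transcribe_with_sphinx(audio_path, language="fr-FR"):
    """Transcribe an audio file offline using the given language pack."""
    # Imported lazily so this helper can be defined even when the
    # library is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file
    try:
        return recognizer.recognize_sphinx(audio, language=language)
    except sr.UnknownValueError:
        return None  # Sphinx could not understand the audio
    except sr.RequestError as error:
        # Raised, for example, when PocketSphinx or the language data is missing.
        raise RuntimeError(f"Sphinx error: {error}") from error
```

Calling transcribe_with_sphinx("speech.wav", language="zh-CN") would use the Mandarin pack instead, provided it is installed.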
For Windows, it is recommended to install the precompiled Wheel packages in the third-party directory. These are provided because building PocketSphinx on Windows requires a lot of work, and it can take hours to download and install all the surrounding software.
For Linux and other POSIX systems (like OS X), you'll want to build from source. It should take less than two minutes on a fast machine.
- On any Debian-derived Linux distribution (like Ubuntu and Mint):
  - For Python 3, run: sudo apt-get install python3 python3-all-dev python3-pip build-essential swig git libpulse-dev libasound2-dev
  - For Python 3, run: pip3 install pocketsphinx
- On OS X:
  - For Python 3, run: brew install swig git python3
  - Install PocketSphinx-Python using Pip: pip install pocketsphinx
    - If this gives errors when importing the library in your program, try running: brew link --overwrite python
- On Windows:
  1. Install Python, Pip, SWIG, and Git, preferably using a package manager.
  2. Install the necessary compiler suite (here's a PDF version in case the link goes down) for compiling modules for your particular Python version:
     - Visual Studio 2015 Community Edition for Python 3.5.
     - The installation process for Python 3.4 is outlined in the article above.
  3. Add the folders containing the Python, SWIG, and Git binaries to your PATH environment variable.
     - My PATH environment variable looks something like: C:\Users\Anthony\Desktop\swigwin-3.0.8;C:\Program Files\Git\cmd;(A BUNCH OF OTHER PATHS)
  4. Reboot to apply changes.
  5. Download the full PocketSphinx-Python source code by running: git clone --recursive --depth 1 https://github.com/cmusphinx/pocketsphinx-python (downloading the ZIP archive from GitHub will not work).
  6. Run python setup.py install in the PocketSphinx-Python source code folder to compile and install PocketSphinx.
  7. Side note: when I build the precompiled Wheel packages, I skip steps 5 and 6 and do the following instead:
     - For Python 3.4: C:\Python34\python.exe setup.py bdist_wheel
     - For Python 3.5: C:\Users\Anthony\AppData\Local\Programs\Python\Python35\python.exe setup.py bdist_wheel
     - The resulting packages are located in the dist folder of the PocketSphinx-Python project directory.
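Whichever route you take, a quick way to confirm the build landed in the interpreter you intend to use is to ask Python whether the module can be found. The sketch below assumes the installed module is named pocketsphinx, matching the pip package name used above; is_importable is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Example: is_importable("pocketsphinx") should be True after a successful install.
```

This only checks that the module can be located; it does not load the compiled extension, so an import error at runtime is still possible.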
- Every language has its own folder under /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/, where LANGUAGE_NAME is the IETF language tag, like "en-US" (US English) or "en-GB" (UK English).
  - For example, the US English data is stored in /speech_recognition/pocketsphinx-data/en-US/.
  - The language parameter of recognizer_instance.recognize_sphinx simply chooses the folder with the given name.
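Because the language parameter maps directly to a folder name, you can see which languages are available by listing that data directory. A minimal sketch, assuming module_dir is the module install directory; installed_languages is a hypothetical helper, not a library function.

```python
from pathlib import Path


def installed_languages(module_dir):
    """List the language tags that have a folder under pocketsphinx-data."""
    data_dir = Path(module_dir) / "pocketsphinx-data"
    if not data_dir.is_dir():
        return []
    # Each subfolder name is a value accepted by the language parameter.
    return sorted(entry.name for entry in data_dir.iterdir() if entry.is_dir())
```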
- Languages are composed of 3 parts:
  - An acoustic model /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/acoustic-model/, which describes how to interpret audio data.
    - Acoustic models can be downloaded from the CMU Sphinx files. These are pretty disorganized, but instructions for cleaning up specific versions are listed below.
    - All of these should be 16 kHz (broadband) models, since that's what the library will assume is being used.
  - A language model /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/language-model.lm.bin (in CMU binary format).
  - A pronunciation dictionary /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/pronounciation-dictionary.dict, which describes how words in the language are pronounced.
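When assembling a language folder by hand, it can help to check that all three parts are in place before trying recognition. The sketch below is one possible check; missing_language_parts is a hypothetical helper, and it only tests that the paths exist, not that the files are valid models.

```python
from pathlib import Path


def missing_language_parts(module_dir, language):
    """Return labels of the expected parts that are absent for a language."""
    base = Path(module_dir) / "pocketsphinx-data" / language
    checks = [
        ("acoustic model", base / "acoustic-model", Path.is_dir),
        ("language model", base / "language-model.lm.bin", Path.is_file),
        # The file name is spelled exactly as in the paths listed above.
        ("pronunciation dictionary", base / "pronounciation-dictionary.dict", Path.is_file),
    ]
    return [label for label, path, exists in checks if not exists(path)]
```

An empty list means the folder has the expected layout.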
- All of the following points assume a Debian-derived Linux distribution (like Ubuntu or Mint).
- To work with any complete, real-world languages, you will need quite a bit of RAM (16 GB recommended) and a fair bit of disk space (20 GB recommended).
- SphinxBase is needed for all language model file format conversions. We use it to convert between *.dmp DMP files (an obsolete Sphinx binary format), *.lm ARPA files, and Sphinx binary *.lm.bin files:
  - Install all the SphinxBase build dependencies with: sudo apt-get install build-essential automake autotools-dev autoconf libtool
  - Download and extract the SphinxBase source code.
  - Follow the instructions in the README to install SphinxBase. Basically, run the following in the SphinxBase folder: sh autogen.sh --force && ./configure && make && sudo make install
- Pruning (getting rid of less important information) is useful if language model files are too large. We can do this using IRSTLM:
  - Install all the IRSTLM build dependencies with: sudo apt-get install build-essential automake autotools-dev autoconf libtool
  - Download and extract the IRSTLM source code.
  - Follow the instructions in the README to install IRSTLM. Basically, run the following in the IRSTLM folder: sh regenerate-makefiles.sh --force && ./configure && make && sudo make install
  - If the language model is not in ARPA format, convert it to ARPA format. To do this, ensure that SphinxBase is installed and run: sphinx_lm_convert -i LANGUAGE_MODEL_FILE_GOES_HERE -o language-model.lm -ofmt arpa
  - Prune the model using IRSTLM. To prune with a threshold of 0.00000001, run: prune-lm --threshold=1e-8 language-model.lm pruned.lm
    The higher the threshold, the smaller the resulting file.
  - Convert the pruned model back into binary format if it was originally not in ARPA format. To do this, ensure that SphinxBase is installed and run: sphinx_lm_convert -i pruned.lm -o LANGUAGE_MODEL_FILE_GOES_HERE
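The convert, prune, convert-back sequence above can be scripted. The following sketch builds the three command lines for a model that is not already in ARPA format and runs them in order; it assumes sphinx_lm_convert and prune-lm are on your PATH. The function names and the intermediate file names language-model.lm and pruned.lm are illustrative choices, not fixed by either tool.

```python
import subprocess


def pruning_pipeline(model_file, output_file, threshold="1e-8"):
    """Build the command lines for converting, pruning, and converting back."""
    arpa_file = "language-model.lm"  # illustrative intermediate ARPA file
    pruned_file = "pruned.lm"        # illustrative pruned ARPA file
    return [
        # 1. Convert the original model to ARPA format.
        ["sphinx_lm_convert", "-i", model_file, "-o", arpa_file, "-ofmt", "arpa"],
        # 2. Prune with the given threshold (higher means a smaller file).
        ["prune-lm", f"--threshold={threshold}", arpa_file, pruned_file],
        # 3. Convert the pruned ARPA model to the output file.
        ["sphinx_lm_convert", "-i", pruned_file, "-o", output_file],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the commands separately from running them makes it easy to print and inspect them before anything touches the model files.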
- US English: /speech_recognition/pocketsphinx-data/en-US/ is taken directly from the contents of PocketSphinx's US English model.
- International French: /speech_recognition/pocketsphinx-data/fr-FR/:
  - /speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin is fr-small.lm.bin from the Sphinx French language model.
  - /speech_recognition/pocketsphinx-data/fr-FR/pronounciation-dictionary.dict is fr.dict from the Sphinx French language model.
  - /speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/ contains all of the files extracted from cmusphinx-fr-5.2.tar.gz in the Sphinx French acoustic model.
  - To get better French recognition accuracy at the expense of higher disk space and RAM usage:
    1. Download fr.lm.gmp from the Sphinx French language model.
    2. Convert from DMP (an obsolete Sphinx binary format) to Sphinx binary format: sphinx_lm_convert -i fr.lm.gmp -o french.lm.bin
    3. Replace /speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin with french.lm.bin created in the previous step.
- Mandarin Chinese: /speech_recognition/pocketsphinx-data/zh-CN/:
  - /speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin is generated as follows:
    1. Download zh_broadcastnews_64000_utf8.DMP from the Sphinx Mandarin language model.
    2. Convert from DMP (an obsolete Sphinx binary format) to ARPA format: sphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o chinese.lm -ofmt arpa
    3. Prune with a threshold of 0.00000004 using: prune-lm --threshold=4e-8 chinese.lm chinese.lm
    4. Convert from ARPA format to Sphinx binary format: sphinx_lm_convert -i chinese.lm -o chinese.lm.bin
    5. Replace /speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin with chinese.lm.bin created in the previous step.
  - /speech_recognition/pocketsphinx-data/zh-CN/pronounciation-dictionary.dict is zh_broadcastnews_utf8.dic from the Sphinx Mandarin language model.
  - /speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/ contains all of the files extracted from zh_broadcastnews_16k_ptm256_8000.tar.bz2 in the Sphinx Mandarin acoustic model.
  - To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 when preparing zh_broadcastnews_64000_utf8.DMP.
- Italian: /speech_recognition/pocketsphinx-data/it-IT/:
  - /speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin is generated as follows:
    1. Download cmusphinx-it-5.2.tar.gz from the Sphinx Italian language model.
    2. Extract /etc/voxforge_it_sphinx.lm from cmusphinx-it-5.2.tar.gz as italian.lm.
    3. Convert from ARPA format to Sphinx binary format: sphinx_lm_convert -i italian.lm -o italian.lm.bin
    4. Replace /speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin with italian.lm.bin created in the previous step.
  - /speech_recognition/pocketsphinx-data/it-IT/pronounciation-dictionary.dict is /etc/voxforge_it_sphinx.dic from cmusphinx-it-5.2.tar.gz (from the Sphinx Italian language model).
  - /speech_recognition/pocketsphinx-data/it-IT/acoustic-model/ contains all of the files in /model_parameters extracted from cmusphinx-it-5.2.tar.gz (from the Sphinx Italian language model).
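Pulling a single file out of the Italian archive under a new name can also be sketched in Python with the standard tarfile module. The exact layout inside the archive is an assumption here (it may or may not have a top-level folder), so this hypothetical extract_member helper matches on the end of the member path instead of the full path.

```python
import shutil
import tarfile


def extract_member(archive_path, member_suffix, destination):
    """Copy the first archive member whose path ends with member_suffix.

    Returns the full member name that was matched.
    """
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                source = archive.extractfile(member)
                with source, open(destination, "wb") as target:
                    shutil.copyfileobj(source, target)
                return member.name
    raise FileNotFoundError(f"{member_suffix} not found in {archive_path}")


# Example, mirroring the extraction step for the Italian language model:
#     extract_member("cmusphinx-it-5.2.tar.gz", "etc/voxforge_it_sphinx.lm", "italian.lm")
```

The same helper would work for the dictionary file by passing "etc/voxforge_it_sphinx.dic" as the suffix.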