Skip to content

Latest commit

 

History

History
118 lines (102 loc) · 13.1 KB

pocketsphinx.rst

File metadata and controls

118 lines (102 loc) · 13.1 KB

Notes on using PocketSphinx

Installing other languages

By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are also available, but not included due to the files being too large:

To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))").

Here is a simple Bash script to install all of them, assuming you've downloaded all three ZIP files into your current directory:

#!/usr/bin/env bash
SR_LIB=$(python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))")
sudo apt-get install --yes unzip
sudo unzip -o fr-FR.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/fr-FR/"
sudo unzip -o zh-CN.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/zh-CN/"
sudo unzip -o it-IT.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/it-IT/"

Once installed, you can simply specify the language using the language parameter of recognizer_instance.recognize_sphinx. For example, French would be specified with "fr-FR" and Mandarin with "zh-CN".

Building PocketSphinx-Python from source

For Windows, it is recommended to install the precompiled Wheel packages in the third-party directory. These are provided because building Pocketsphinx on Windows requires a lot of work, and can take hours to download and install all the surrounding software.

For Linux and other POSIX systems (like OS X), you'll want to build from source. It should take less than two minutes on a fast machine.

  • On any Debian-derived Linux distributions (like Ubuntu and Mint):
    1. Run sudo apt-get install python3 python3-all-dev python3-pip build-essential swig git libpulse-dev libasound2-dev for Python 3.
    2. Run pip3 install pocketsphinx for Python 3.
  • On OS X:
    1. Run brew install swig git python3 for Python 3.
    2. Install PocketSphinx-Python using Pip: pip install pocketsphinx.
      • If this gives errors when importing the library in your program, try running brew link --overwrite python.
  • On other POSIX-based systems:
    1. Install Python, Pip, SWIG, and Git, preferably using a package manager.
    2. Install PocketSphinx-Python using Pip: pip install pocketsphinx.
  • On Windows:
    1. Install Python, Pip, SWIG, and Git, preferably using a package manager.
    2. Install the necessary compilers suite (here's a PDF version in case the link goes down) for compiling modules for your particular Python version:
    3. Add the folders containing the Python, SWIG, and Git binaries to your PATH environment variable.
      • My PATH environment variable looks something like: C:\Users\Anthony\Desktop\swigwin-3.0.8;C:\Program Files\Git\cmd;(A BUNCH OF OTHER PATHS).
    4. Reboot to apply changes.
    5. Download the full PocketSphinx-Python source code by running git clone --recursive --depth 1 https://github.com/cmusphinx/pocketsphinx-python (downloading the ZIP archive from GitHub will not work).
    6. Run python setup.py install in the PocketSphinx-Python source code folder to compile and install PocketSphinx.
    7. Side note: when I build the precompiled Wheel packages, I skip steps 5 and 6 and do the following instead:
      • For Python 3.4: C:\Python34\python.exe setup.py bdist_wheel.
      • For Python 3.5: C:\Users\Anthony\AppData\Local\Programs\Python\Python35\python.exe setup.py bdist_wheel.
      • The resulting packages are located in the dist folder of the PocketSphinx-Python project directory.

Notes on the structure of the language data

  • Every language has its own folder under /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/, where LANGUAGE_NAME is the IETF language tag, like "en-US" (US English) or "en-GB" (UK English).
    • For example, the US English data is stored in /speech_recognition/pocketsphinx-data/en-US/.
    • The language parameter of recognizer_instance.recognize_sphinx simply chooses the folder with the given name.
  • Languages are composed of 3 parts:
    • An acoustic model /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/acoustic-model/, which describes how to interpret audio data.
      • Acoustic models can be downloaded from the CMU Sphinx files. These are pretty disorganized, but instructions for cleaning up specific versions are listed below.
      • All of these should be 16 kHz (broadband) models, since that's what the library will assume is being used.
    • A language model /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/language-model.lm.bin (in CMU binary format).
    • A pronounciation dictionary /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/pronounciation-dictionary.dict, which describes how words in the language are pronounced.

Notes on building the language data from source

  • All of the following points assume a Debian-derived Linux Distibution (like Ubuntu or Mint).
  • To work with any complete, real-world languages, you will need quite a bit of RAM (16 GB recommended) and a fair bit of disk space (20 GB recommended).
  • SphinxBase is needed for all language model file format conversions. We use it to convert between *.dmp DMP files (an obselete Sphinx binary format), *.lm ARPA files, and Sphinx binary *.lm.bin files:
    • Install all the SphinxBase build dependencies with sudo apt-get install build-essential automake autotools-dev autoconf libtool.
    • Download and extract the SphinxBase source code.
    • Follow the instructions in the README to install SphinxBase. Basically, run sh autogen.sh --force && ./configure && make && sudo make install in the SphinxBase folder.
  • Pruning (getting rid of less important information) is useful if language model files are too large. We can do this using IRSTLM:
    • Install all the IRSTLM build dependencies with sudo apt-get install build-essential automake autotools-dev autoconf libtool
    • Download and extract the IRSTLM source code.
    • Follow the instructions in the README to install IRSTLM. Basically, run sh regenerate-makefiles.sh --force && ./configure && make && sudo make install in the IRSTLM folder.
    • If the language model is not in ARPA format, convert it to the ARPA format. To do this, ensure that SphinxBase is installed and run sphinx_lm_convert -i LANGUAGE_MODEL_FILE_GOES_HERE -o language-model.lm -ofmt arpa.
    • Prune the model using IRSTLM: run prune-lm --threshold=1e-8 t.lm pruned.lm to prune with a threshold of 0.00000001. The higher the threshold, the smaller the resulting file.
    • Convert the model back into binary format if it was originally not in ARPA format. To do this, ensure that SphinxBase is installed and run sphinx_lm_convert -i language-model.lm -o LANGUAGE_MODEL_FILE_GOES_HERE.
  • US English: /speech_recognition/pocketsphinx-data/en-US/ is taken directly from the contents of PocketSphinx's US English model.
  • International French: /speech_recognition/pocketsphinx-data/fr-FR/:
    • /speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin is fr-small.lm.bin from the Sphinx French language model.
    • /speech_recognition/pocketsphinx-data/fr-FR/pronounciation-dictionary.dict is fr.dict from the Sphinx French language model.
    • /speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/ contains all of the files extracted from cmusphinx-fr-5.2.tar.gz in the Sphinx French acoustic model.
    • To get better French recognition accuracy at the expense of higher disk space and RAM usage:
      1. Download fr.lm.gmp from the Sphinx French language model.
      2. Convert from DMP (an obselete Sphinx binary format) to ARPA format: sphinx_lm_convert -i fr.lm.gmp -o french.lm.bin.
      3. Replace /speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin with french.lm.bin created in the previous step.
  • Mandarin Chinese: /speech_recognition/pocketsphinx-data/zh-CN/:
    • /speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin is generated as follows:
      1. Download zh_broadcastnews_64000_utf8.DMP from the Sphinx Mandarin language model.
      2. Convert from DMP (an obselete Sphinx binary format) to ARPA format: sphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o chinese.lm -ofmt arpa.
      3. Prune with a threshold of 0.00000004 using prune-lm --threshold=4e-8 chinese.lm chinese.lm.
      4. Convert from ARPA format to Sphinx binary format: sphinx_lm_convert -i chinese.lm -o chinese.lm.bin.
      5. Replace /speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin with chinese.lm.bin created in the previous step.
    • /speech_recognition/pocketsphinx-data/zh-CN/pronounciation-dictionary.dict is zh_broadcastnews_utf8.dic from the Sphinx Mandarin language model.
    • /speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/ contains all of the files extracted from zh_broadcastnews_16k_ptm256_8000.tar.bz2 in the Sphinx Mandarin acoustic model.
    • To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 when preparing zh_broadcastnews_64000_utf8.DMP.
  • Italian: /speech_recognition/pocketsphinx-data/it-IT/:
    • /speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin is generated as follows:
      1. Download cmusphinx-it-5.2.tar.gz from the Sphinx Italian language model.
      2. Extract /etc/voxforge_it_sphinx.lm from cmusphinx-it-5.2.tar.gz as italian.lm.
      3. Convert from ARPA format to Sphinx binary format: sphinx_lm_convert -i italian.lm -o italian.lm.bin.
      4. Replace /speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin with italian.lm.bin created in the previous step.
    • /speech_recognition/pocketsphinx-data/it-IT/pronounciation-dictionary.dict is /etc/voxforge_it_sphinx.dic from cmusphinx-it-5.2.tar.gz (from the Sphinx Italian language model).
    • /speech_recognition/pocketsphinx-data/it-IT/acoustic-model/ contains all of the files in /model_parameters extracted from cmusphinx-it-5.2.tar.gz (from the Sphinx Italian language model).