A script for the MPV media player that allows you to play text files and ebooks using text-to-speech (TTS).
Currently supported TTS engines are `say` (MacOS) and `espeak`. Supported formats are txt, epub, mobi, azw3, azw4, pdf, docx, odt, and many more.
- MacOS, Linux, *BSD, etc (Windows is NOT currently supported)
- python3
- ffmpeg
- espeak (not required on MacOS)
- Calibre (OPTIONAL: only required for ebook support)
make install
or:
cp txt_hook.lua ~/.config/mpv/scripts/
cp text2media.py ~/.config/mpv/
chmod +x ~/.config/mpv/text2media.py
optional:
cp txt_hook.conf ~/.config/mpv/lua-settings/
mpv-txt works by splitting the input file into individual sentences. For each sentence it creates a TTS audio file and an SRT subtitle file, then uses ffmpeg to combine the two into one mp4 per sentence (there is no video stream). ffmpeg is then used again to combine all the per-sentence mp4 files into a single mp4. All resulting mp4 files are located in `/tmp/mpv-txt/`.
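The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the code in text2media.py: the function names, the regex-based sentence splitter, and the exact ffmpeg options (`-c:a aac`, `-c:s mov_text`, the concat demuxer) are assumptions chosen to show the idea. The builders only return argument lists; nothing is executed.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def srt_timestamp(seconds):
    """Format a duration in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def sentence_srt(sentence, duration):
    """Build a one-cue SRT document that spans the whole per-sentence clip."""
    return f"1\n{srt_timestamp(0)} --> {srt_timestamp(duration)}\n{sentence}\n"


def mux_command(audio_path, srt_path, out_path):
    """ffmpeg argv (assumed options) combining one audio file and one SRT into an mp4."""
    return ["ffmpeg", "-y", "-i", audio_path, "-i", srt_path,
            "-c:a", "aac", "-c:s", "mov_text", out_path]


def concat_list(clip_paths):
    """Text of an ffmpeg concat-demuxer list file.

    Assumes the paths contain no single quotes, so no escaping is done.
    """
    return "".join(f"file '{p}'\n" for p in clip_paths)


def concat_command(list_path, out_path):
    """ffmpeg argv (assumed options) joining the per-sentence clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", out_path]
```

In a real run, each argument list would be passed to something like `subprocess.run(cmd, check=True)`, with the cue duration taken from the length of the generated TTS audio.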
If the input file is an ebook, it is first run through the `ebook-convert` tool from Calibre to create a text document.
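As a rough sketch of that step, the helper below decides from the file extension whether a conversion is needed and builds the `ebook-convert` invocation (Calibre's CLI takes an input file followed by an output file). The helper name, the extension set, and the output naming are assumptions for illustration, not the script's actual behaviour.

```python
from pathlib import Path

# Assumed subset of formats that need conversion; Calibre supports many more.
EBOOK_SUFFIXES = {".epub", ".mobi", ".azw3", ".azw4", ".pdf", ".docx", ".odt"}


def conversion_command(input_path, work_dir="/tmp/mpv-txt"):
    """Return the ebook-convert argv for an ebook, or None for plain text input."""
    src = Path(input_path)
    if src.suffix.lower() not in EBOOK_SUFFIXES:
        return None  # already plain text, nothing to convert
    # Hypothetical naming: same stem as the input, with a .txt extension.
    dst = Path(work_dir) / (src.stem + ".txt")
    return ["ebook-convert", str(src), str(dst)]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; returning `None` for text files lets the caller skip the Calibre dependency entirely when it is not needed.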
The output for a reasonably average book weighs in at around 100MB. On modest processors, a large book like Moby Dick may take an hour to run through TTS (Moby Dick takes about 30 minutes on my 1.6GHz i5, using MacOS's `say` and threads=4, and produces a 174MB mp4). If the product files are still present in `/tmp/mpv-txt/`, mpv-txt will not regenerate them. However, you may want to copy the final mp4 out of `/tmp/mpv-txt/` and store it somewhere more permanent, so you don't need to waste time recreating it in the future.
If you quit MPV while mpv-txt is in the middle of processing a file, the next time you try to play that file mpv-txt will resume where you left off.
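One simple way to get both behaviours (skipping finished output and resuming an interrupted run) is to check which per-sentence files already exist before doing any work. The sketch below assumes a hypothetical naming scheme of zero-padded sentence indices; the real script may name and track its files differently.

```python
from pathlib import Path


def sentence_clip(work_dir, index):
    """Hypothetical path of the mp4 for sentence number `index`."""
    return Path(work_dir) / f"{index:06d}.mp4"


def pending_sentences(sentences, work_dir):
    """Indices of sentences whose per-sentence mp4 has not been produced yet."""
    return [i for i in range(len(sentences))
            if not sentence_clip(work_dir, i).exists()]
```

Only the indices returned by `pending_sentences` need to go through TTS and ffmpeg; an empty list means every clip is already present. A clip that was only partially written when MPV quit would still count as finished here, so a more careful version might write to a temporary name and rename on completion.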
(some of these plans may prove to be mutually exclusive)
- epub/mobi support (using Calibre's `ebook-convert` to generate a text file)
- give attention to subtitle formatting (currently all default settings are used)
- add a text substitution feature, allowing users to manipulate TTS pronunciations of difficult words
- parallel TTS
- remove python3 dependency(?)
- windows support (low priority, but reasonable pull requests are welcome ;) )
- support for 'cloud' TTS services (VERY low priority)