-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
f4e17a9
commit f130a95
Showing
4 changed files
with
327 additions
and
60 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,217 @@ | ||
# Pitch-shifting and time-stretching with TD-PSOLA | ||
# Time-domain pitch-synchronous overlap-add (TD-PSOLA) | ||
[](https://pypi.python.org/pypi/psola) | ||
[](https://www.gnu.org/licenses/gpl-3.0) | ||
<!-- [](https://pepy.tech/project/psola) --> | ||
|
||
This module permits contant- and variable-rate pitch-shifting and | ||
time-stretching of speech. It is a wrapper around the `parselmouth` [1] | ||
wrapper around the Praat [2] implementation of TD-PSOLA [3]. Pitch-shifting | ||
is performed by providing a numpy array of target pitch values equally spaced | ||
over time. Variable-rate time stretching uses forced phoneme alignment via | ||
[`pypar`](https://github.com/maxrmorrison/pypar). | ||
|
||
If you need to extract pitch features or phoneme alignments, see | ||
[`torchcrepe`](https://github.com/maxrmorrison/torchcrepe) for pitch estimation | ||
and [`pyfoal`](https://github.com/maxrmorrison/pyfoal) for forced alignment. | ||
If you only want to perform pitch-shifting, you do not need to extract | ||
forced alignments. If you want to do variable-rate time stretching, you do not | ||
need to perform pitch estimation. | ||
|
||
|
||
## Installation | ||
|
||
`pip install psola` | ||
|
||
|
||
## Usage | ||
|
||
If you want to perform pitch-shifting or time-stretching on audio already | ||
loaded into memory, use `psola.vocode`. If you want to do this with audio | ||
saved in a file, use `psola.from_file`. You can use `psola.to_file` or | ||
`psola.from_file_to_file` to save the results to a file. To process many | ||
files at once with multiprocessing, use `psola.from_files_to_files`. | ||
Each of these functions is documented below. | ||
|
||
|
||
### `psola.vocode` | ||
|
||
```python | ||
"""Performs pitch vocoding using Praat | ||
Arguments | ||
audio : np.array(shape=(samples,)) | ||
The speech signal to process | ||
sample_rate : int | ||
The audio sampling rate. | ||
source_alignment : pypar.Alignment | ||
The current alignment if performing time-stretching | ||
target_alignment : pypar.Alignment | ||
The target alignment if performing time-stretching | ||
target_pitch : np.array(shape=(frames,)) | ||
The target pitch contour | ||
constant_stretch : float or None | ||
A constant value for time-stretching | ||
fmin : int | ||
The minimum allowable frequency in Hz. | ||
fmax : int | ||
The maximum allowable frequency in Hz. | ||
Returns | ||
audio : np.array(shape=(samples,)) | ||
The vocoded audio | ||
""" | ||
``` | ||
|
||
|
||
### `psola.from_file` | ||
|
||
```python | ||
"""Performs vocoding using Praat | ||
Arguments | ||
audio_file : string | ||
The file containing the speech signal to process | ||
source_alignment_file : string or None | ||
The file containing the original alignment | ||
target_alignment_file : string or None | ||
The file containing the target alignment | ||
target_pitch_file : string or None | ||
The file containing the target pitch | ||
constant_stretch : float or None | ||
A constant value for time-stretching | ||
fmin : int | ||
The minimum allowable frequency in Hz. | ||
fmax : int | ||
The maximum allowable frequency in Hz. | ||
Returns | ||
audio : np.array(shape=(samples,)) | ||
The vocoded audio | ||
sample_rate : int | ||
The audio sampling rate | ||
""" | ||
``` | ||
|
||
|
||
### `psola.to_file` | ||
|
||
```python | ||
"""Performs pitch vocoding and saves audio to disk | ||
Arguments | ||
audio : np.array(shape=(samples,)) | ||
The speech signal to process | ||
sample_rate : int | ||
The audio sampling rate | ||
output_file : string | ||
The file to save the vocoded speech | ||
source_alignment : pypar.Alignment | ||
The current alignment if performing time-stretching | ||
target_alignment : pypar.Alignment | ||
The target alignment if performing time-stretching | ||
target_pitch : np.array(shape=(frames,)) | ||
The target pitch contour | ||
constant_stretch : float or None | ||
A constant value for time-stretching | ||
fmin : int | ||
The minimum allowable frequency in Hz. | ||
fmax : int | ||
The maximum allowable frequency in Hz. | ||
""" | ||
``` | ||
|
||
|
||
### `psola.from_file_to_file` | ||
|
||
```python | ||
"""Performs vocoding using Praat and save to disk | ||
Arguments | ||
audio_file : string | ||
The file containing the speech signal to process | ||
output_file : string | ||
The file to save the vocoded speech | ||
source_alignment_file : string or None | ||
The file containing the original alignment | ||
target_alignment_file : string or None | ||
The file containing the target alignment | ||
target_pitch_file : string or None | ||
The file containing the target pitch | ||
constant_stretch : float or None | ||
A constant value for time-stretching | ||
fmin : int | ||
The minimum allowable frequency in Hz. | ||
fmax : int | ||
The maximum allowable frequency in Hz. | ||
""" | ||
``` | ||
|
||
|
||
### `psola.from_files_to_files` | ||
|
||
```python | ||
"""Performs vocoding using Praat and save to disk | ||
Arguments | ||
audio_files : list | ||
The files containing the speech signals to process | ||
output_files : list | ||
The files to save the vocoded speech | ||
source_alignment_files : string or None | ||
The files containing the original alignments | ||
target_alignment_files : list or None | ||
The files containing the target alignments | ||
target_pitch_files : list or None | ||
The files containing the target pitch | ||
constant_stretch : float or None | ||
A constant value for time-stretching | ||
fmin : int | ||
The minimum allowable frequency in Hz. | ||
fmax : int | ||
The maximum allowable frequency in Hz. | ||
""" | ||
``` | ||
|
||
|
||
## Command-line interface | ||
|
||
The command-line interface wraps the arguments of `psola.from_files_to_files`. | ||
|
||
``` | ||
usage: python -m psola | ||
[-h] | ||
[--audio_files AUDIO_FILES [AUDIO_FILES ...]] | ||
[--source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]] | ||
[--target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]] | ||
[--constant_stretch CONSTANT_STRETCH] | ||
[--target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]] | ||
[--fmin FMIN] | ||
[--fmax FMAX] | ||
[--output_files OUTPUT_FILES [OUTPUT_FILES ...]] | ||
optional arguments: | ||
-h, --help show this help message and exit | ||
--audio_files AUDIO_FILES [AUDIO_FILES ...] | ||
The speech signal to process | ||
--source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...] | ||
The files containing the original alignments | ||
--target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...] | ||
The files containing the target alignments | ||
--constant_stretch CONSTANT_STRETCH | ||
A constant value for time-stretching | ||
--target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...] | ||
The target pitch contour | ||
--fmin FMIN The minimum allowable frequency in Hz | ||
--fmax FMAX The maximum allowable frequency in Hz | ||
--output_files OUTPUT_FILES [OUTPUT_FILES ...] | ||
Where to save the vocoded audio | ||
``` | ||
|
||
|
||
## References | ||
|
||
[1] Y. Jadoul, B. Thompson, and B. De Boer, "Introducing parselmouth: A python interface to praat," Journal of Phonetics, vol. 71, pp. 1–15, 2018. | ||
|
||
[2] P. Boersma, "Praat: doing phonetics by computer", http://www.praat.org/, 2006. | ||
|
||
[3] E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech communication, 1990. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.