This repository provides an official PyTorch implementation of "Hierarchical Timbre-Painting and Articulation Generation".
Our method generates high-fidelity audio for a target instrument, based on f0 and loudness signals.
During training, the loudness and f0 signals are extracted from the ground-truth audio. This enables us to convert the melody of any input instrument to the trained instrument, a task also known as timbre transfer.
Audio Samples | Paper | Pretrained Models | Timbre Transfer Colab Demo
We suggest separating the generation process into two consecutive phases:
- Articulation - We generate the backbone of the audio and the transitions between notes. This is done at a low sample rate from the given conditioning inputs (loudness and f0). We use a sine excitation based on the extracted f0 signal, hence using the generator as a Neural-Source-Filtering network rather than a classic GAN generator, which is conditioned on random noise.
- Timbre Painting - The next phase is composed of timbre-painting networks: each network takes the previously generated audio as input and serves as a learnable upsampling network. Each timbre-painting network adds sample-rate-specific details to the audio clip.
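The sine excitation mentioned in the articulation phase can be illustrated with a minimal sketch. It is not the repository's code: the function name `sine_excitation`, the per-sample f0 track, the 2 kHz sample rate and the convention that f0 <= 0 marks unvoiced samples are all assumptions made for illustration.

```python
import math

def sine_excitation(f0_hz, sample_rate, amplitude=1.0):
    """Build a sine excitation from a per-sample f0 track (in Hz).

    Assumption: samples with f0 <= 0 are unvoiced and produce silence.
    """
    phase = 0.0
    out = []
    for f0 in f0_hz:
        if f0 <= 0.0:
            out.append(0.0)
            continue
        # Accumulate phase so the sine stays continuous when f0 changes.
        phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
        out.append(amplitude * math.sin(phase))
    return out

# Illustrative input: 10 ms of A4 (440 Hz), then 10 ms unvoiced, at 2 kHz.
sr = 2000
f0_track = [440.0] * 20 + [0.0] * 20
excitation = sine_excitation(f0_track, sr)
```

Accumulating the phase, rather than computing sin(2*pi*f0*t) directly, avoids discontinuities when the pitch glides between notes.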
The required packages are listed in requirements.txt.
Using a virtual environment is recommended:
virtualenv -p python3 .venv
source .venv/bin/activate
pip install -r requirements.txt
To use distributed runs, please install apex.
Hydra is used for configuration and experiment management; for more information, refer to the Hydra documentation.
$ git clone
$ cd timbre_painting
To download the URMP dataset used in our paper, please fill out the form.
After downloading, extract the contents of the file to a folder named urmp
and run the following script to preprocess the data:
To train the model on any other dataset of monophonic instruments, copy the audio files to data_tmp,
with each instrument in a separate folder, and run:
python urmp=null
Default parameters are given in conf/data_config.yaml; overrides should be given on the command line.
Please note that the default parameters are defined for the URMP dataset; other datasets might need tuning (especially data_processor.params.confidence_threshold and data_processor.params.silence_thresh_dB).
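To make the role of these two parameters more concrete, here is a hedged sketch of how a pitch-confidence threshold and a silence threshold are commonly combined to decide which frames count as voiced. It is an assumption about typical preprocessing, not the repository's implementation; `voiced_mask` is a hypothetical helper and the threshold values are illustrative, not the defaults from conf/data_config.yaml.

```python
def voiced_mask(confidence, loudness_db, confidence_threshold, silence_thresh_db):
    """Return True for frames that are confidently pitched and audible."""
    return [
        c >= confidence_threshold and l > silence_thresh_db
        for c, l in zip(confidence, loudness_db)
    ]

# Illustrative per-frame values (not taken from any dataset).
confidence = [0.95, 0.40, 0.90, 0.88]
loudness_db = [-20.0, -18.0, -75.0, -30.0]

mask = voiced_mask(
    confidence,
    loudness_db,
    confidence_threshold=0.85,  # hypothetical value
    silence_thresh_db=-60.0,    # hypothetical value
)
```

Under this reading, a noisier or quieter recording would call for a lower confidence threshold or a lower silence threshold so that fewer valid frames are discarded.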
To train with the original paper's parameters, run:
Default parameters are given in conf/runs/main.yaml; overrides should be given on the command line.
For example, the following line runs an experiment on a dataset folder named 'flute' for 400 epochs with a batch size of 4:
python paths.input_data=data.flute optim.epochs=400 optim.batch_size=4
Results are saved in the folder outputs/main/${%Y-%m-%d_%H-%M-%S}
Distributed data parallel (DDP) training is supported through the Apex package. To run in distributed mode, use the following template:
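The dotted overrides in the example command address nested keys of the YAML configuration. The sketch below mimics that mapping on a plain dictionary to show the idea. It is not Hydra's implementation, `apply_overrides` and `parse_value` are hypothetical helpers, and the default values in `defaults` are invented for illustration.

```python
import copy

def parse_value(raw):
    """Convert an override value to None, int, float, or leave it a string."""
    if raw == "null":
        return None
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw

def apply_overrides(config, overrides):
    """Apply 'a.b=value' strings to a nested dict and return a new dict."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return result

# Hypothetical defaults; the real ones live in conf/runs/main.yaml.
defaults = {
    "paths": {"input_data": "data.urmp"},
    "optim": {"epochs": 100, "batch_size": 8},
}
cfg = apply_overrides(
    defaults,
    ["paths.input_data=data.flute", "optim.epochs=400", "optim.batch_size=4"],
)
```

Keys that are not overridden keep their default values, which is why only the changed settings need to appear on the command line.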
python -m torch.distributed.launch --use_env --nproc_per_node {# of gpus} {argument overrides}
It's possible to use CUDA_VISIBLE_DEVICES=0,1 to choose the GPUs to run on; in this example, GPUs 0 and 1 on the machine.
To transfer the timbre of your files using a trained network, run:
python trained_dirpath={path/to/trained_model} input_dirpath={path/to/audio_sample_folder}
Default parameters are given in conf/transfer_config.yaml.
The generated files are saved in the experiment folder, in the subdirectory generation.
Each input is generated in 5 versions, with octave shifts in the range [-2, 2].
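One plausible reading of the five versions is that the extracted f0 is shifted by a whole number of octaves, from -2 to +2, before generation. The sketch below shows that arithmetic only; it assumes integer octave shifts applied by scaling f0, and `octave_shifted_f0` is a hypothetical helper, not a function from this repository.

```python
def octave_shifted_f0(f0_hz, octave_shifts=(-2, -1, 0, 1, 2)):
    """Return one f0 track per octave shift; shifting by k octaves scales f0 by 2**k."""
    return {k: [f * (2.0 ** k) for f in f0_hz] for k in octave_shifts}

# Illustrative f0 track in Hz; 0.0 marks an unvoiced frame and stays 0.0.
f0_track = [220.0, 0.0, 440.0]
versions = octave_shifted_f0(f0_track)
```

Generating several octaves is useful when the source instrument's register differs from the range the target instrument was trained on.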
Pretrained models for instruments from the URMP dataset are summarized in the table. The models can be downloaded from the attached Google Drive links. Download the model, extract it, and follow the timbre transfer instructions above to generate audio.
Instrument |
Violin |
Saxophone |
Trumpet |
Cello |
If you found this code useful, please cite the following paper:
title={Hierarchical Timbre-Painting and Articulation Generation},
author={Michael Michelashvili and Lior Wolf},
journal={21st International Society for Music Information Retrieval (ISMIR2020)},
Credit to Adam Polyak for the PyTorch CREPE pitch-extraction implementation and for helpful discussions.