| Audio Samples 🔈 | DAFx Conference Paper 📄 | JAES Article 📄 |
DDSP-Piano is a MIDI-driven piano sound synthesizer based on DDSP.
This code relies on the official TensorFlow implementation of DDSP (tested with v3.2.0 and v3.7.0), with no additional packages required.
pip install --upgrade ddsp==3.7.0
A piano MIDI file can be synthesized using the command:
python synthesize_midi_file.py <input_midi_file.mid> <output_file.wav>
Additional arguments for the inference script include:
- `-c`, `--config`: a `.gin` configuration file of a DDSP-Piano model architecture. You can choose one of the configs in the `ddsp_piano/configs/` folder.
- `--ckpt`: a checkpoint folder with your own model weights.
- `--piano_type`: the desired model among the 10 piano years learned from the MAESTRO dataset (`0` to `9`).
- `-d`, `--duration`: the maximum duration of the synthesized file. Set by default to `None`, which synthesizes the whole file.
- `-wu`, `--warm_up`: the duration of the recurrent layers' warm-up (to avoid undesirable noise at the beginning of the synthesized audio).
- `-u`, `--unreverbed`: toggle to also get the dry piano sound, without reverb applied.
- `-n`, `--normalize`: set the loudness of the output file to this amount of dBFS. Set by default to `None`, which does not apply any gain modification.
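For instance, the following sketch synthesizes a file with piano year `3`, keeps only the first 30 seconds (assuming `--duration` is given in seconds), and also retrieves the dry piano sound; the file names are placeholders:

python synthesize_midi_file.py \
    --piano_type 3 \
    --duration 30 \
    --unreverbed \
    <input_midi_file.mid> <output_file.wav>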
The default arguments will synthesize using the most recent version of DDSP-Piano. If you want to use the default model presented in the published papers, the command should look like:
python synthesize_midi_file.py \
--config ddsp_piano/configs/dafx22.gin \
--ckpt ddsp_piano/model_weights/dafx22/ \
<input_midi_file.mid> <output_file.wav>
If you want to synthesize multiple performances from MAESTRO at once, you can gather their information into a `.csv` file (see `assets/tracks_listening_test.csv` for an example) and use this script:
python synthesize_from_csv.py <path/to/maestro-v3.0.0/> <your/file.csv> <output/directory/>
It takes the same additional arguments as the `synthesize_midi_file.py` script, except that the `-dc` flag replaces `-u`, in order to get the dry audio, as well as the isolated filtered-noise and additive synthesizer outputs.
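For instance, a sketch of a run over the listening-test tracks, assuming `-dc` is a simple toggle like `-u` and `--warm_up` is given in seconds (the output path is a placeholder):

python synthesize_from_csv.py \
    --warm_up 2 \
    -dc \
    <path/to/maestro-v3.0.0/> assets/tracks_listening_test.csv <output/directory/>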
Evaluation of the model can be conducted on the full MAESTRO test set with the corresponding script:
python evaluate_model.py <path/to/maestro-v3.0.0/> <output-directory/>
Additional arguments include:
- `-c`, `--config`: the `.gin` model config file.
- `--ckpt`: checkpoint to load weights from.
- `-wu`, `--warm_up`: the warm-up duration.
- `-w`, `--get_wav`: if toggled, also saves the audio of all synthesis examples.
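For instance, to evaluate the default published model and keep the synthesized audio:

python evaluate_model.py \
    --config ddsp_piano/configs/dafx22.gin \
    --ckpt ddsp_piano/model_weights/dafx22/ \
    --get_wav \
    <path/to/maestro-v3.0.0/> <output-directory/>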
The paper model is trained and evaluated on the MAESTRO dataset (v3.0.0). After following the instructions for downloading it, a DDSP-Piano model can be trained using one of the scripts presented below.
The model uses a particular encoding for handling MIDI data. Converting it on the fly during training can take some time, on top of resampling the audio data.
The following script can be used to preprocess the MIDI and audio data of MAESTRO, and store them in TFRecord format for faster data pipeline processing:
python preprocess_maestro.py <path/to/maestro-v3.0.0/> <store/tfrecords/in/this/folder/>
Additional arguments include:
- `-sr`: the desired audio sample rate, to be adjusted according to the model configuration. Set by default to 24 kHz.
- `-fr`: the MIDI control frame rate, set by default to 250 Hz.
- `-p`: the polyphonic capacity of the model, i.e. the maximum number of notes it can handle simultaneously.
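For instance, a sketch of a preprocessing run for a model configuration working at 16 kHz with a polyphonic capacity of 16 (assuming `-sr` takes a value in Hz; both values are illustrative and should match your model config):

python preprocess_maestro.py \
    -sr 16000 \
    -p 16 \
    <path/to/maestro-v3.0.0/> <store/tfrecords/in/this/folder/>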
According to our listening test, decent synthesis quality can be achieved with only a single training phase, using the following Python script:
python train_single_phase.py <path/to/maestro-v3.0.0/> <experiment-directory/>
Additional arguments include:
- `-c`, `--config`: a `.gin` model configuration file.
- `--val_path`: optional path to the `.tfrecord` file containing the preprocessed validation data (see above).
- `--batch_size`, `--steps_per_epoch`, `--epochs`, `--lr`: your usual training hyper-parameters.
- `-p`, `--phase`: the current training phase (which toggles the trainability of the corresponding layers).
- `-r`, `--restore`: a checkpoint folder to restore weights from.
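For instance, a sketch of a single-phase run on the DAFx22 configuration, restoring weights from a previous checkpoint (the hyper-parameter values are illustrative):

python train_single_phase.py \
    --config ddsp_piano/configs/dafx22.gin \
    --batch_size 4 \
    --lr 1e-4 \
    --restore <previous/checkpoint/folder/> \
    <path/to/maestro-v3.0.0/> <experiment-directory/>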
Note that `<path/to/maestro-v3.0.0/>` can either be the extracted MAESTRO dataset as is, or the `maestro_training.tfrecord` preprocessed version obtained from the previous section.
During training, the TensorBoard logs are saved under `<experiment-directory>/logs/`.
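They can be monitored with the standard TensorBoard CLI:

tensorboard --logdir <experiment-directory>/logs/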
This script reproduces the full training of the default model presented in the papers:
source train_ddsp_piano.sh <path-to-maestro-v3.0.0/> <experiment-directory/>
It alternates between two training phases (one for the layers involved in computing the partials' frequencies, the other for the remaining layers).
The final model checkpoint should be located in `<experiment-directory>/phase_3/last_iter/`.
However, as frequency estimation with differentiable oscillators remains an open issue (see here and here), the second training phase does not improve the model quality, and we recommend simply using the single-training-phase script above.
- Format code for FDN-based reverb.
- Use filtered noise synth with dynamic size on all model configs + adapt all model building code.
- Release script for extracting single note partials estimation.
- Remove training phase related code.
If you use this code for your research, please cite it as:
@article{renault2023ddsp_piano,
title={DDSP-Piano: A Neural Sound Synthesizer Informed by Instrument Knowledge},
author={Renault, Lenny and Mignot, Rémi and Roebel, Axel},
journal={Journal of the Audio Engineering Society},
volume={71},
number={9},
pages={552--565},
year={2023},
month={September}
}
or
@inproceedings{renault2022diffpiano,
title={Differentiable Piano Model for MIDI-to-Audio Performance Synthesis},
author={Renault, Lenny and Mignot, Rémi and Roebel, Axel},
booktitle={Proceedings of the 25th International Conference on Digital Audio Effects},
year={2022}
}
This project is conducted at IRCAM and has been funded by the European Project AI4Media (grant number 951911).
Thanks to @phvial for their implementation of the FDN-based reverb, developed in the context of the AQUA-RIUS ANR project.