Author: Frank Cwitkowitz <fcwitkow@ur.rochester.edu>

This directory contains the resources necessary to test convolutional sparse coding with lateral inhibition on a characteristic example, in which an audio signal representing a short piano excerpt is encoded using the activations of individual keys. The encoding provides the information necessary to perform automatic music transcription (AMT), whereby the pitch, onset, and offset of each note in a music signal are estimated. Here, we encode an audio excerpt from the MAPS dataset and compare the result to its ground-truth note activity. The MAPS dataset is only required if one wishes to regenerate the dictionary or test different audio excerpts; the resources needed for the simple example outlined here are provided.
- maps_dict_gen.py
- Script that generates the dictionary data/pianodict.npz from scratch. The ENSTDkCL partition of the MAPS dataset is required to run it. The dictionary elements are obtained from recordings of isolated piano notes, which are truncated to different lengths in order to represent different possible durations of the same note. This is exactly why lateral inhibition is needed: a given pitch can only be active once at a time on the piano, but the dictionary contains multiple elements representing that pitch. Once the MAPS dataset is downloaded and its location is properly referenced at the top of the script, the script can simply be run. Some dictionary generation parameters, such as the number of elements per pitch, the time resolution of the element durations, and the MIDI range of the dictionary, can also be modified at the top of the script.
- cbpdnin_msc.py
- Example script that encodes the audio with convolutional sparse coding with lateral inhibition using the dictionary data/pianodict.npz. The elements are grouped by pitch, such that a solution with multiple concurrent activations of the same pitch is highly discouraged. The script also displays the activations of each element over time (a proxy for our transcription), along with the ground truth piano-key activations. Note that this script requires installation of the librosa package in addition to sporco and its dependencies.
- data/MAPS_MUS-deb_clai_ENSTDkCL_excerpt.wav
- Audio we encode to exemplify convolutional sparse coding with lateral inhibition.
- data/MAPS_MUS-deb_clai_ENSTDkCL_excerpt.txt
- Ground-truth annotations containing the pitch, onset, and offset of each piano note present in the audio excerpt.
- data/pianodict.npz
- Dictionary generated using the default parameters (hard-coded in maps_dict_gen.py), provided for convenience.
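
To illustrate the idea behind the dictionary and the pitch grouping described above, the following is a minimal sketch that uses only NumPy and synthetic decaying sinusoids in place of the MAPS isolated-note recordings. It is not the code in maps_dict_gen.py or cbpdnin_msc.py: the function names `build_truncated_dictionary` and `count_group_conflicts`, the sample rate, the durations, and the array layout are all assumptions chosen for illustration. The sketch truncates each note to several lengths, records which pitch group each element belongs to, and then counts the time samples at which more than one element of the same group is active, which is the kind of solution that lateral inhibition is meant to discourage.

```python
import numpy as np


def build_truncated_dictionary(notes, durations, sr):
    """Truncate each note to several durations (hypothetical helper).

    notes     : dict mapping MIDI pitch -> 1-D array of samples
    durations : list of element lengths in seconds
    sr        : sample rate in Hz
    Returns D with shape (max_len, n_elements), zero-padded and with
    unit-norm columns, and an integer array giving the pitch group
    of each column.
    """
    max_len = int(round(max(durations) * sr))
    atoms, groups = [], []
    for g, pitch in enumerate(sorted(notes)):
        for dur in durations:
            n = int(round(dur * sr))
            segment = notes[pitch][:n]
            atom = np.zeros(max_len)
            atom[:len(segment)] = segment
            norm = np.linalg.norm(atom)
            if norm > 0:
                atom /= norm
            atoms.append(atom)
            groups.append(g)  # elements of the same pitch share a group
    return np.stack(atoms, axis=1), np.array(groups)


def count_group_conflicts(X, groups, tol=1e-6):
    """Count (time, group) pairs with more than one active element.

    X has shape (n_times, n_elements); an element counts as active
    where the magnitude of its activation exceeds tol.
    """
    active = np.abs(X) > tol
    conflicts = 0
    for g in range(groups.max() + 1):
        per_time = active[:, groups == g].sum(axis=1)
        conflicts += int((per_time > 1).sum())
    return conflicts


# Synthetic stand-ins for isolated note recordings (assumption).
sr = 8000
t = np.arange(int(0.5 * sr)) / sr
notes = {
    p: np.exp(-4.0 * t) * np.sin(2 * np.pi * 440.0 * 2 ** ((p - 69) / 12) * t)
    for p in (60, 62)
}
D, groups = build_truncated_dictionary(notes, [0.1, 0.25, 0.5], sr)

# Toy activations: two elements of pitch group 0 fire at the same time.
X = np.zeros((10, D.shape[1]))
X[3, 0] = 1.0
X[3, 1] = 0.5   # same group as element 0: a conflict
X[5, 2] = 1.0
X[5, 3] = 1.0   # different groups: allowed
conflicts = count_group_conflicts(X, groups)
print(D.shape, groups, conflicts)
```

The conflict count is only a diagnostic for inspecting a solution. In the actual example the grouping is passed to the solver, which penalizes such activations during optimization rather than checking for them afterwards.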