Master thesis. Report and code are available in the GitHub repository.
Presentation at EuroPython 2019. Video recording and notes.
Report and lecture at NMBU Data Science.
Examples, with code in Python:
- Loading YouTube audio data with youtube-dl and librosa
- Extracting fixed-size analysis windows from audio
- Classifying an audio clip made up of many analysis windows, using Keras TimeDistributed and GlobalAveragePooling
- Classifying an audio clip by voting over analysis windows (mean or majority voting)
- Annotating/labeling audio data using Audacity
- Preprocessing audio into mel-spectrograms
- Multi-core preprocessing of audio files using joblib
- Computing MFCCs or mel-spectrograms from existing STFT spectrograms
- Converting mel-spectrograms into PNG images
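A minimal sketch of extracting fixed-size analysis windows, assuming NumPy and zero-padding of the final partial window so that no audio is dropped. `analysis_windows`, `window_length` and `hop_length` are hypothetical names chosen for illustration. The same function works on a raw 1-D signal or on a spectrogram with time along axis 0.

```python
import numpy as np


def analysis_windows(x: np.ndarray, window_length: int, hop_length: int) -> np.ndarray:
    """Split x along axis 0 into fixed-size, possibly overlapping windows.

    x can be a 1-D signal or a (frames, bands) spectrogram.
    Returns an array of shape (n_windows, window_length, ...).
    The last partial window is zero-padded (an assumption; it could also be dropped).
    """
    n = x.shape[0]
    if n <= window_length:
        n_windows = 1
    else:
        n_windows = 1 + int(np.ceil((n - window_length) / hop_length))
    # Pad only the time axis, at the end, so every window is complete
    padded_length = (n_windows - 1) * hop_length + window_length
    pad = [(0, padded_length - n)] + [(0, 0)] * (x.ndim - 1)
    x = np.pad(x, pad)
    starts = np.arange(n_windows) * hop_length
    return np.stack([x[s:s + window_length] for s in starts])
```

For example, a 100-frame mel-spectrogram with 64 bands, cut into 32-frame windows with a hop of 16 frames, gives an array of shape (6, 32, 64), which a classifier can then process as a batch of windows.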
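Once a classifier has produced per-window class probabilities, the clip-level decision can be made by voting over the windows. The sketch below (NumPy, with hypothetical function names) shows both mean and majority voting. The toy probabilities are made up to illustrate that the two schemes can disagree when a single window is very confident.

```python
import numpy as np


def mean_vote(window_probs: np.ndarray) -> int:
    """Average the class probabilities over all windows, then pick the top class."""
    return int(np.argmax(window_probs.mean(axis=0)))


def majority_vote(window_probs: np.ndarray) -> int:
    """Each window votes for its own top class; the most common vote wins.

    Ties go to the lowest class index, since np.argmax returns the first maximum.
    """
    votes = np.argmax(window_probs, axis=1)
    counts = np.bincount(votes, minlength=window_probs.shape[1])
    return int(np.argmax(counts))


# Toy probabilities for 3 windows x 2 classes (made-up numbers)
probs = np.array([[0.40, 0.60],
                  [0.45, 0.55],
                  [0.95, 0.05]])
print(mean_vote(probs))      # 0: the confident third window dominates the average
print(majority_vote(probs))  # 1: two of the three windows prefer class 1
```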
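Audacity exports a label track as a plain text file with one tab-separated `start end label` row per label, with times in seconds. A minimal parser, plus a helper that turns the labels into per-window training targets, might look like the following. `read_audacity_labels`, `window_targets` and the overlap rule (any overlap with a label counts as positive) are assumptions for illustration. Skipping lines that start with a backslash is meant to ignore the extra frequency-range lines that spectral-selection labels can produce.

```python
import io
from typing import Iterable, List, NamedTuple


class Label(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    name: str


def read_audacity_labels(lines: Iterable[str]) -> List[Label]:
    """Parse an Audacity label export: start<TAB>end<TAB>label per line."""
    labels = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and spectral-selection lines (they start with a backslash)
        if not line.strip() or line.startswith("\\"):
            continue
        start, end, *rest = line.split("\t")
        labels.append(Label(float(start), float(end), rest[0] if rest else ""))
    return labels


def window_targets(labels, n_windows, window_seconds, hop_seconds, name):
    """Return 1 for each window that overlaps any label called `name`, else 0."""
    targets = []
    for i in range(n_windows):
        w_start = i * hop_seconds
        w_end = w_start + window_seconds
        hit = any(l.name == name and l.start < w_end and l.end > w_start for l in labels)
        targets.append(int(hit))
    return targets


example = "0.5\t1.2\tbird\n3.0\t3.8\tdog\n"
labels = read_audacity_labels(io.StringIO(example))
print(window_targets(labels, n_windows=5, window_seconds=1.0, hop_seconds=1.0, name="bird"))
# [1, 1, 0, 0, 0]
```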
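If STFT spectrograms have already been computed, mel-spectrograms and MFCCs can be derived from them without going back to the audio. With librosa this is roughly `librosa.feature.melspectrogram(S=np.abs(D)**2, sr=sr)` followed by `librosa.feature.mfcc(S=librosa.power_to_db(M))`, where `D` is the complex STFT and `M` the resulting mel-spectrogram. The NumPy-only sketch below spells out the steps instead: project the power spectrum onto triangular mel filters, take the log, then apply a DCT-II over the mel axis. It assumes the HTK mel formula, unnormalized triangular filters, a natural log and an even `n_fft`, so its values will not match librosa's defaults exactly. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale (librosa defaults to the Slaney variant instead)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    weights = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from 0 at `left` to 1 at `center`, back to 0 at `right`
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


def dct_ii_ortho(n_out, n_in):
    """Orthonormal DCT-II matrix of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return basis


def mel_from_stft(stft, sr, n_mels=40):
    """Power mel-spectrogram (n_mels, frames) from an STFT of shape (bins, frames)."""
    n_fft = 2 * (stft.shape[0] - 1)  # assumes the STFT used an even n_fft
    power = np.abs(stft) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power


def mfcc_from_mel(mel_power, n_mfcc=13, eps=1e-10):
    """MFCCs: log of the mel energies, followed by a DCT-II over the mel axis."""
    log_mel = np.log(mel_power + eps)
    return dct_ii_ortho(n_mfcc, mel_power.shape[0]) @ log_mel
```

Because the mel projection and the DCT are both plain matrix multiplications, the filterbank and DCT matrices can be computed once and reused across a whole dataset of stored spectrograms.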
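To store a mel-spectrogram as a PNG image, the (already log/dB-scaled) values first have to be mapped to 8-bit pixels. The sketch below clips to the top 80 dB (the same default as `librosa.power_to_db(..., top_db=80.0)`), flips the frequency axis so that low frequencies end up at the bottom of the image, and writes an 8-bit grayscale PNG using only `zlib` and `struct` to avoid an image-library dependency. The function names and scaling choices are assumptions; with Pillow installed, `Image.fromarray(img).save(path)` can do the writing step instead.

```python
import struct
import zlib

import numpy as np


def spectrogram_to_uint8(S_db: np.ndarray, top_db: float = 80.0) -> np.ndarray:
    """Map a dB-scaled (bands, frames) spectrogram to 0..255, keeping the top `top_db` dB."""
    S_max = S_db.max()
    clipped = np.clip(S_db, S_max - top_db, S_max)
    scaled = (clipped - (S_max - top_db)) / top_db
    img = np.round(scaled * 255).astype(np.uint8)
    return img[::-1]  # flip so row 0 (the lowest band) becomes the bottom row


def write_grayscale_png(path: str, img: np.ndarray) -> None:
    """Write a 2-D uint8 array as an 8-bit grayscale PNG."""
    img = np.ascontiguousarray(img, dtype=np.uint8)
    height, width = img.shape

    def chunk(kind: bytes, data: bytes) -> bytes:
        # PNG chunk: length, type, data, CRC-32 over type + data
        body = kind + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

    # width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering)
    raw = b"".join(b"\x00" + row.tobytes() for row in img)
    png = (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
           + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
    with open(path, "wb") as f:
        f.write(png)
```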
Rough notes on various topics.
- Applications. Practical applications of Machine Hearing
- Tasks. Established problem formulations
- Features. Feature representations
- Preprocessing. Preprocessing techniques
- DCASE2018. Notes from DCASE2018 challenge and conference
- Commercial solutions. Companies and products in Machine Hearing
- Compressive Sensing.
Useful resources to learn more.
- Computational Analysis of Sound Scenes and Events. Tuomas Virtanen, Mark D. Plumbley, Dan Ellis. 2018.
- Human and Machine Hearing - Extracting Meaning from Sound. Richard F. Lyon. 2017, revised 2018.
- An Introduction to Audio Content Analysis - Applications in Signal Processing and Music Informatics. Alexander Lerch. 2012. Companion website: https://www.audiocontentanalysis.org/
- Machine Learning for Audio, Image and Video Analysis: Theory and Applications (Advanced Information and Knowledge Processing). Francesco Camastra. Three parts: From Perception to Computation, Machine Learning, and Applications.
- CSC 83060: Speech and Audio Understanding. http://mr-pc.org/t/csc83060/ Brooklyn College (CUNY).
Feature extraction
- librosa. The go-to Python module.
- essentia. C++ library with Python bindings. Many music analysis extractors. Used by Freesound and AcousticBrainz.
- kapre. On-demand GPU computation of mel-spectrograms, as Keras layers.
- torchaudio.
Data Augmentation
- Audio Classification. http://www.cs.tut.fi/~sgn24006/PDF/L04-audio-classification.pdf Covers low-level features and MFCCs; classification with distance metrics, GMMs and HMMs.
- Speech Signal Analysis, Lecture 2. January 2017, Hiroshi Shimodaira and Steve Renals. Great diagrams of audio discretization, mel filters, and wide- versus narrow-band spectrograms.
- Kaggle Whale detection
- Kaggle FreeSound tagging 2018
- Kaggle FreeSound
- DCASE2014
- DCASE2018
- DCASE2019
- https://mircommunity.slack.com/ - Music Information Retrieval
- Awesome Deep Learning Music
- Fast.ai forums: Deep Learning with Audio. Large lists of resources, both in the first post and under "popular links". As of February 2019: 315 replies over 4 months.