2.4.2 Feature Extraction

Toolboxes for acoustic feature extraction:

Toolbox    Language       Description
Essentia   C++ / Python   Developed mainly for the music information retrieval (MIR) community, but provides many tools and feature extractors applicable to environmental audio research.
librosa    Python         Developed mainly for the music information retrieval (MIR) community, but provides many tools and feature extractors applicable to environmental audio research.
rastamat   Matlab         Versatile MFCC implementation.
VOICEBOX   Matlab         Widely used MFCC implementation.

2.5 Supervised Learning and Recognition

Toolboxes and libraries for machine learning

Toolbox                      Language                 Description
Keras                        Python                   Neural network front-end library for fast experimentation with deep neural networks. Can use Theano, TensorFlow, or Microsoft Cognitive Toolkit as a backend.
TensorFlow                   Python                   Symbolic math library widely used for machine learning applications such as deep neural networks.
Theano                       Python                   Numerical computation library. Active development ceased on 15th November 2017.
Microsoft Cognitive Toolkit  C++                      Deep learning framework.
Torch                        Lua                      Library for deep machine learning.
PyTorch                      Python                   Library for deep machine learning.
Caffe                        C++ / Python / Matlab    Deep learning framework. Originally developed by UC Berkeley.
scikit-learn                 Python                   Machine learning library.

2.6 An Example Approach Based on Neural Networks

The example systems presented in the chapter are released in a separate code repository.

2.6.1 Sound Classification

Single-label classification

single_label_classification.py

This example application demonstrates single-label classification, using acoustic scene classification as the task and the TUT Sound Scenes 2017 development dataset as test data. The dataset contains 10-second audio excerpts from 15 different acoustic scene classes.
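The decision step of single-label classification can be sketched in a few lines. This is not the code in single_label_classification.py; it is a minimal illustration that assumes a network has already produced one raw score per class for an excerpt. The class names and score values are hypothetical placeholders.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_single_label(scores, class_names):
    """Return the single most probable class and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical placeholder names standing in for the 15 scene classes.
CLASS_NAMES = ["scene_%02d" % i for i in range(15)]

# Made-up network outputs for one excerpt; class 3 has the largest score.
scores = [0.0] * 15
scores[3] = 2.0

label, confidence = classify_single_label(scores, CLASS_NAMES)
print(label, round(confidence, 3))
```

Because the probabilities compete through the softmax, exactly one class is chosen per excerpt, which is what distinguishes this setup from the multi-label case.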

Multi-label classification

multi_label_classification.py

This example application demonstrates multi-label classification, using audio tagging as the task and the CHiME-Home development and evaluation dataset as test data. The dataset contains 4-second audio excerpts, each with a varying number of tags assigned. A total of seven tag classes are used in the dataset.
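In the multi-label case each tag is decided independently, so an excerpt can receive several tags or none. The sketch below is not taken from multi_label_classification.py; it assumes one raw score per tag class is available, and the tag names, scores, and the 0.5 threshold are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tag_excerpt(scores, tag_names, threshold=0.5):
    """Return every tag whose independent probability reaches the threshold."""
    probs = [sigmoid(s) for s in scores]
    return [name for name, p in zip(tag_names, probs) if p >= threshold]


# Hypothetical placeholder names standing in for the seven tag classes.
TAG_NAMES = ["tag_%d" % i for i in range(7)]

# Made-up network outputs for one excerpt.
scores = [2.0, -1.5, 0.3, -3.0, -0.2, 1.1, -2.4]

tags = tag_excerpt(scores, TAG_NAMES)
print(tags)
```

Using a sigmoid per class instead of a softmax across classes is what allows the number of assigned tags to vary from excerpt to excerpt.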

2.6.2 Sound Event Detection

sound_event_detection.py

This example application demonstrates detection, using sound event detection as the task and the TUT Sound Events 2017 development dataset as test data.
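Detection differs from classification in that the output must carry onset and offset times. One common way to obtain them, sketched here under assumptions and not taken from sound_event_detection.py, is to threshold frame-wise event probabilities and merge consecutive active frames into events. The probabilities, the hop length, and the threshold are made-up values for a single event class.

```python
def frames_to_events(activity, hop_seconds):
    """Merge runs of active frames into (onset, offset) pairs in seconds."""
    events = []
    onset = None
    for index, active in enumerate(activity):
        if active and onset is None:
            onset = index  # a new event starts at this frame
        elif not active and onset is not None:
            events.append((onset * hop_seconds, index * hop_seconds))
            onset = None
    if onset is not None:
        # The event is still active at the end of the signal.
        events.append((onset * hop_seconds, len(activity) * hop_seconds))
    return events


def detect_events(frame_probs, hop_seconds=0.02, threshold=0.5):
    """Binarize frame-wise probabilities and convert them to events."""
    activity = [p >= threshold for p in frame_probs]
    return frames_to_events(activity, hop_seconds)


# Made-up frame-wise probabilities for one event class.
probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9, 0.95]

events = detect_events(probs, hop_seconds=0.5)
print(events)
```

For several event classes the same procedure would be applied to each class separately, which allows events to overlap in time.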

Other helpful toolboxes

Toolbox      Language   Description
sed_eval     Python     Evaluation toolbox for Sound Event Detection.
dcase_util   Python     A collection of utilities for Detection and Classification of Acoustic Scenes and Events.
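To give a sense of what an evaluation toolbox for sound event detection computes, the following sketch implements a segment-based F-score for a single event class in plain Python. It does not use or reproduce the sed_eval API; the function names, the one-second segment length, and the event lists are assumptions made for illustration.

```python
import math


def segment_activity(events, segment_length, n_segments):
    """Mark each fixed-length segment that overlaps any (onset, offset) event."""
    active = [False] * n_segments
    for onset, offset in events:
        first = int(math.floor(onset / segment_length))
        last = int(math.ceil(offset / segment_length))
        for i in range(max(first, 0), min(last, n_segments)):
            active[i] = True
    return active


def segment_f1(reference, estimated, duration, segment_length=1.0):
    """Compare reference and estimated events segment by segment."""
    n_segments = int(math.ceil(duration / segment_length))
    ref = segment_activity(reference, segment_length, n_segments)
    est = segment_activity(estimated, segment_length, n_segments)
    tp = sum(1 for r, e in zip(ref, est) if r and e)
    fp = sum(1 for r, e in zip(ref, est) if e and not r)
    fn = sum(1 for r, e in zip(ref, est) if r and not e)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations and system output, as (onset, offset) in seconds.
reference = [(1.0, 3.0), (6.0, 8.0)]
estimated = [(1.0, 2.0), (5.0, 8.0)]

score = segment_f1(reference, estimated, duration=10.0)
print(score)
```

Scoring in fixed-length segments tolerates small timing errors in onsets and offsets, at the cost of ignoring whether the system found the correct number of distinct events.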