Chapter 2: The Machine Learning Approach for Analysis of Sound Scenes and Events

2.4.2 Feature Extraction

Toolboxes for acoustic feature extraction:

Toolbox	Language	Description
Essentia	C++ / Python	Developed mostly for the music information retrieval (MIR) community, but provides many tools and feature extractors applicable to environmental audio research.
librosa	Python	Developed mostly for the music information retrieval (MIR) community, but provides many tools and feature extractors applicable to environmental audio research.
rastamat	Matlab	Versatile MFCC implementation
VOICEBOX	Matlab	Widely used MFCC implementation

2.5 Supervised Learning and Recognition

Toolboxes and libraries for machine learning

Toolbox	Language	Description
Keras	Python	Neural network front-end library for fast experimentation with deep neural networks. Can use Theano, TensorFlow, or Microsoft Cognitive Toolkit as a backend.
TensorFlow	Python	A symbolic math library, which is widely used for machine learning applications such as deep neural networks.
Theano	Python	Numerical computation library. Active development ceased on 15th November 2017.
Microsoft Cognitive Toolkit	C++	Deep learning framework.
Torch	Lua	Library for deep machine learning.
PyTorch	Python	Library for deep machine learning.
Caffe	C++ / Python / Matlab	Deep learning framework. Originally developed at UC Berkeley.
scikit-learn	Python	Machine learning library.

2.6 An Example Approach Based on Neural Networks

The example systems presented in the chapter are released in a separate code repository.

2.6.1 Sound Classification

Single-label classification

single_label_classification.py

This example application demonstrates single-label classification, using acoustic scene classification as the task and the TUT Sound Scenes 2017 development dataset as test data. The dataset contains 10-second audio excerpts from 15 different acoustic scene classes.
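
In single-label classification, exactly one class is chosen for each audio excerpt. The following is a minimal sketch of that decision step only, assuming a classifier has already produced one raw score per scene class. The class names, scores, and function names are made up for illustration and are not taken from single_label_classification.py.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_single_label(scores, class_names):
    """Return the single most probable class and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical scene classes and classifier outputs for one excerpt
scene_classes = ["bus", "park", "office"]
raw_scores = [0.2, 2.5, -1.0]

label, prob = classify_single_label(raw_scores, scene_classes)
print(label, round(prob, 3))
```

Because the softmax outputs compete for a fixed probability mass of one, raising the score of one class lowers the probability of all others, which matches the assumption that each excerpt belongs to exactly one scene.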

Multi-label classification

multi_label_classification.py

This example application demonstrates multi-label classification, using audio tagging as the task and the development and evaluation sets of the CHiME-Home dataset as test data. The dataset contains 4-second audio excerpts with a varying number of tags assigned to each. A total of seven tag classes are used in the dataset.
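
In multi-label classification, any number of tags (including none) may be active for one excerpt, so each tag is decided independently. The sketch below illustrates this with a per-tag sigmoid and a fixed threshold. The tag names, scores, and the 0.5 threshold are assumptions chosen for illustration and are not taken from multi_label_classification.py.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tag_multi_label(scores, tag_names, threshold=0.5):
    """Return every tag whose probability reaches the threshold."""
    return [
        name
        for name, score in zip(tag_names, scores)
        if sigmoid(score) >= threshold
    ]


# Hypothetical tags and classifier outputs for one excerpt
tag_names = ["speech", "tv", "percussion"]
raw_scores = [1.2, -0.3, 0.8]

active_tags = tag_multi_label(raw_scores, tag_names)
print(active_tags)
```

Unlike the softmax used for single-label decisions, the per-tag probabilities here do not have to sum to one, so several tags can be active at once or none at all.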

2.6.2 Sound Event Detection

sound_event_detection.py

This example application demonstrates detection, using sound event detection as the task and the TUT Sound Events 2017 development dataset as test data.
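
Detection differs from classification in that the output also carries timing: each detected event has an onset and an offset. One common way to obtain these is to threshold frame-wise class probabilities and merge consecutive active frames. The sketch below shows this for a single event class. The probabilities, the 0.5 threshold, the 0.5-second hop, and the function name are assumptions for illustration and are not taken from sound_event_detection.py.

```python
def frames_to_events(activity, hop_seconds):
    """Merge consecutive active frames into (onset, offset) pairs in seconds."""
    events = []
    onset = None
    for index, active in enumerate(activity):
        if active and onset is None:
            onset = index  # first active frame of a new event
        elif not active and onset is not None:
            # first inactive frame closes the current event
            events.append((onset * hop_seconds, index * hop_seconds))
            onset = None
    if onset is not None:
        # event is still active when the signal ends
        events.append((onset * hop_seconds, len(activity) * hop_seconds))
    return events


# Hypothetical frame-wise probabilities for one event class
frame_probs = [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.6, 0.7]
activity = [p >= 0.5 for p in frame_probs]

events = frames_to_events(activity, hop_seconds=0.5)
print(events)
```

Practical systems often smooth the frame-wise activity before this step, for example with a median filter, to avoid very short spurious events; that step is omitted here.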

Other helpful toolboxes

Toolbox	Language	Description
sed_eval	Python	Evaluation toolbox for Sound Event Detection.
dcase_util	Python	A collection of utilities for Detection and Classification of Acoustic Scenes and Events.
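
To give an idea of what a segment-based evaluation of sound event detection measures, the sketch below computes precision, recall, and F-score from reference and estimated event activity on a fixed segment grid. It is a plain-Python illustration and does not use the sed_eval API; the function name, event labels, and segment data are made up.

```python
def segment_f_score(reference, estimated):
    """Compute precision, recall, and F-score over segment-wise activity.

    Both arguments are lists with one entry per time segment; each entry
    is the set of event labels active in that segment.
    """
    tp = fp = fn = 0
    for ref, est in zip(reference, estimated):
        tp += len(ref & est)  # active in both reference and estimate
        fp += len(est - ref)  # estimated but not in the reference
        fn += len(ref - est)  # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical activity for four consecutive segments
reference = [{"car"}, {"car", "speech"}, set(), {"speech"}]
estimated = [{"car"}, {"car"}, {"speech"}, {"speech"}]

precision, recall, f_score = segment_f_score(reference, estimated)
print(precision, recall, f_score)
```

Here three label activations are correct, one is a false alarm, and one is missed, so precision, recall, and F-score all come out at 0.75.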
