2.4.2 Feature Extraction
Toolboxes for acoustic feature extraction:
Toolbox | Language | Description |
---|---|---|
Essentia | C++ / Python | Developed mainly for the music information retrieval (MIR) community, but provides many tools and feature extractors applicable to environmental audio research. |
librosa | Python | Developed mainly for the MIR community, but provides many tools and feature extractors applicable to environmental audio research. |
rastamat | Matlab | Versatile MFCC implementation |
VOICEBOX | Matlab | Widely used MFCC implementation |
2.5 Supervised Learning and Recognition
Toolboxes and libraries for machine learning
Toolbox | Language | Description |
---|---|---|
Keras | Python | Neural network front-end library for fast experimentation with deep neural networks. Can use Theano, TensorFlow, or Microsoft Cognitive Toolkit as a backend. |
TensorFlow | Python | A symbolic math library, which is widely used for machine learning applications such as deep neural networks. |
Theano | Python | Numerical computation library. Active development ceased on 15th November 2017. |
Microsoft Cognitive Toolkit | C++ | Deep learning framework. |
Torch | Lua | Library for deep machine learning. |
PyTorch | Python | Library for deep machine learning. |
Caffe | C++ / Python / Matlab | Deep learning framework. Originally developed at UC Berkeley. |
scikit-learn | Python | Machine learning library. |
2.6 An Example Approach Based on Neural Networks
The example systems presented in the chapter are released in a separate code repository.
2.6.1 Sound Classification
Single-label classification
single_label_classification.py
This example application demonstrates single-label classification. Acoustic scene classification is used as the task, and the TUT Sound Scenes 2017 development dataset is used as test data. The dataset contains 10-second audio excerpts from 15 different acoustic scene classes.
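The decision step of single-label classification can be sketched in a few lines. This is not the code from the example repository. It is a minimal illustration that assumes the network outputs one raw score per class for each analysis frame, and that frame-level log-probabilities are summed over the excerpt to pick exactly one label. The label names and score values below are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_excerpt(frame_scores, labels):
    """Pick a single label for an excerpt from per-frame class scores.

    Assumption: frame-level log-probabilities are summed over the whole
    excerpt, and the class with the highest total is selected.
    """
    totals = [0.0] * len(labels)
    for scores in frame_scores:
        for i, p in enumerate(softmax(scores)):
            totals[i] += math.log(p + 1e-12)  # small offset avoids log(0)
    best = max(range(len(labels)), key=lambda i: totals[i])
    return labels[best]


# Hypothetical scene labels and network outputs, for illustration only.
labels = ["beach", "bus", "office"]
frame_scores = [
    [0.2, 2.1, 0.1],
    [0.0, 1.7, 0.9],
    [1.5, 1.4, 0.3],
]
predicted = classify_excerpt(frame_scores, labels)
print(predicted)
```

Because the softmax forces the class probabilities to compete, the output is always exactly one class per excerpt, which is what distinguishes this setup from the multi-label case.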
Multi-label classification
This example application demonstrates multi-label classification. Audio tagging is used as the task, and the CHiME-Home development and evaluation datasets are used as test data. The dataset contains 4-second audio excerpts, each with a varying number of tags assigned. A total of seven tag classes are used in the dataset.
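In the multi-label case each tag is decided independently, so an excerpt can receive zero, one, or several tags. The sketch below assumes one raw network score per tag, a sigmoid output, and a fixed threshold of 0.5. The tag names, scores, and threshold are placeholders and are not taken from the example repository or from the dataset.

```python
import math


def sigmoid(x):
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tag_excerpt(scores, tags, threshold=0.5):
    """Return every tag whose probability reaches the threshold."""
    return [tag for tag, s in zip(tags, scores) if sigmoid(s) >= threshold]


# Seven placeholder tag names and hypothetical scores for one excerpt.
tags = ["tag_1", "tag_2", "tag_3", "tag_4", "tag_5", "tag_6", "tag_7"]
scores = [2.3, -1.2, 0.4, -3.0, -0.1, 1.1, -2.2]

active = tag_excerpt(scores, tags)
print(active)
```

In practice the threshold could be tuned per tag on held-out data; a single shared value is used here only to keep the sketch short.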
2.6.2 Sound Event Detection
This example application demonstrates detection. Sound event detection is used as the task, and the TUT Sound Events 2017 development dataset is used as test data.
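Detection differs from classification in that the output carries timing information: each detected event has an onset and an offset. One common way to obtain these, sketched below, is to threshold frame-wise event activity probabilities and merge consecutive active frames into events. The hop length of 0.02 s, the threshold of 0.5, the event label, and the probabilities are all assumptions made for illustration and are not taken from the example repository.

```python
def activity_to_events(probabilities, label, hop_seconds=0.02, threshold=0.5):
    """Turn frame-wise activity probabilities into (label, onset, offset) tuples.

    Consecutive frames at or above the threshold are merged into one event.
    Onsets and offsets are expressed in seconds.
    """
    events = []
    onset = None  # index of the first frame of the event currently open
    for index, p in enumerate(probabilities):
        active = p >= threshold
        if active and onset is None:
            onset = index
        elif not active and onset is not None:
            events.append((label, onset * hop_seconds, index * hop_seconds))
            onset = None
    if onset is not None:
        # Close an event that is still active at the end of the signal.
        events.append((label, onset * hop_seconds, len(probabilities) * hop_seconds))
    return events


# Hypothetical activity curve for one event class.
probabilities = [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.6, 0.7]
events = activity_to_events(probabilities, "car")
for event_label, onset, offset in events:
    print(f"{event_label}: {onset:.2f} s to {offset:.2f} s")
```

Running the same function once per event class gives a polyphonic result, since overlapping events from different classes are kept independently.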
Other helpful toolboxes
Toolbox | Language | Description |
---|---|---|
sed_eval | Python | Evaluation toolbox for Sound Event Detection. |
dcase_util | Python | A collection of utilities for Detection and Classification of Acoustic Scenes and Events. |