6.3 Obtaining Reference Annotations

Tools for annotating environmental audio:

Audacity  Audio editor with basic annotation capabilities. Annotations are made on label tracks.
ELAN  A linguistic annotation tool for creating textual annotations of audio and video files.
I-SED  An interactive sound event detector, see Kim2017.
Soundscape annotation tool  A tool for soundscape annotation.
BAT  BMAT Annotation Tool, see Melendez-Catalan2017.
audio-annotator  Audio annotation tool, see Cartwright2017.
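
Label tracks exported from Audacity are plain text files with one annotation per line: start time, end time, and label, separated by tabs, with times in seconds. The following is a minimal sketch of reading such an export into a list of events. The function name and the dictionary keys are chosen for this illustration only, and the example annotations are made up.

```python
import io


def parse_audacity_labels(lines):
    """Parse lines of an Audacity label-track export into event dicts."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional frequency-range lines,
        # which Audacity prefixes with a backslash.
        if not line.strip() or line.startswith("\\"):
            continue
        start, end, label = line.split("\t", 2)
        events.append({
            "event_onset": float(start),    # seconds
            "event_offset": float(end),     # seconds
            "event_label": label.strip(),
        })
    return events


# Made-up annotations standing in for an exported label file.
example = "0.500000\t2.250000\tcar\n3.000000\t3.400000\tdoor slam\n"
events = parse_audacity_labels(io.StringIO(example))
for event in events:
    print(event)
```

For a real export, pass an open file object instead of the `io.StringIO` wrapper.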

6.4 Datasets for Environmental Sound Classification and Detection

6.4.2 Available Datasets

Freely available datasets for sound classification, audio tagging, and sound event detection:

Dataset name                 Type       Classes  Examples     Size (min)  Usage, publications
Sound Scenes
Dares G1                     recorded   28       123          123         Grootel2009, Mesaros2013
DCASE 2013 Scenes            recorded   10       100          50          Stowell2015
LITIS Rouen                  recorded   19       3026         1513        Bisot2015, Rakotomamonjy2015
TUT Sound Scenes 2016        recorded   15       1170         585         DCASE2016, Mesaros2016
Environmental Sounds
ESC-10                       collected  10       400          33          Piczak2015a, Hertel2016
ESC-50                       collected  50       2000         166         Piczak2015a, Piczak2015b
NYU Urban Sound8K            collected  10       8732         525         Salamon2014
CHiME-Home                   recorded   7        6137         409         DCASE2016, Foster2015
Freefield 1010               collected  7        400          33          Stowell2014a
CICESE Sound Events          collected  20       1367         92          Beltran2015
AudioSet                     collected  632      > 2 million  > 340,000   Gemmeke2017
Sound Events
Dares G1                     recorded   761      3214         123         Grootel2009, Mesaros2013
DCASE 2013 Office Live       recorded   16       320          19          DCASE2013, Stowell2015
DCASE 2013 Office Synthetic  recorded   16       320          19          DCASE2013, Stowell2015
TUT Sound Events 2016        recorded   18       954          78          DCASE2016, Mesaros2016b
TUT Sound Events 2017        recorded   6        729          92          DCASE2017
NYU Urban Sound              collected  10       3075         1620        Salamon2014, Salamon2015a, Salamon2015b
TU Dortmund Multichannel     recorded   15       1170         585         Kuerby2016

6.4.3 Data Augmentation

Data augmentation refers to methods for increasing the amount of development data available without additional recordings.

Here are a few tools for modifying existing audio material:

Toolbox      Language  Description
muda         Python    Annotation-aware musical data augmentation, partly applicable to environmental audio (pitch shifting, time stretching).
librosa      Python    See the time stretching and pitch shifting effects.
TSM toolbox  MATLAB    MATLAB implementations of various classical time-scale modification (TSM) algorithms.

6.5 Evaluation Metrics

Toolbox          Language  Description
sklearn.metrics  Python    Score functions, performance metrics, pairwise metrics, and distance computations for machine learning development.
sed_eval         Python    Evaluation toolbox for sound event detection.

Usage examples for different tasks, using basic Python and the sed_eval and sklearn toolboxes.

Acoustic Scene Classification

Implementation   Example code
sklearn.metrics  ac_evaluation_sklearn.py (download)
sed_eval         ac_evaluation_sedeval.py (download)
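
To give an idea of what scene classification scoring with sklearn.metrics can look like, here is a minimal sketch. It is not the content of ac_evaluation_sklearn.py. The scene labels and the system output are made up for this illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical scene labels, reference annotations, and system output.
labels = ["bus", "office", "park"]
y_true = ["bus", "bus", "office", "office", "park", "park"]
y_pred = ["bus", "office", "office", "office", "park", "bus"]

# Overall accuracy: fraction of correctly classified items.
overall = accuracy_score(y_true, y_pred)

# Class-wise accuracy: per-class recall, in the order given by `labels`.
class_wise = recall_score(y_true, y_pred, labels=labels, average=None)

# Confusion matrix: rows are reference classes, columns are predictions.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print("overall accuracy:", overall)
for label, acc in zip(labels, class_wise):
    print(f"{label}: {acc:.2f}")
print(cm)
```

Averaging the class-wise values gives a score that is not dominated by the most frequent scene class, which matters when the classes are unbalanced.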

Sound Event Detection

Implementation  Example code
basic Python    sed_evaluation.py (download)
sed_eval        sed_evaluation_sedeval.py (download)
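
The sketch below shows one way to compute segment-based F-score and error rate in basic Python. It is not the content of sed_evaluation.py. It assumes one-second segments, events given as (onset, offset, label) tuples, and the common definitions in which substitutions, deletions, and insertions are counted per segment. All function names and the example events are hypothetical.

```python
import math


def active_labels(events, n_segments, seg_len):
    """Return, for each segment, the set of labels active in it."""
    segments = [set() for _ in range(n_segments)]
    for onset, offset, label in events:
        first = int(math.floor(onset / seg_len))
        last = int(math.ceil(offset / seg_len))  # exclusive end index
        for k in range(first, min(last, n_segments)):
            segments[k].add(label)
    return segments


def segment_based_metrics(reference, estimated, duration, seg_len=1.0):
    """Segment-based F-score and error rate for one recording."""
    n_segments = int(math.ceil(duration / seg_len))
    ref = active_labels(reference, n_segments, seg_len)
    est = active_labels(estimated, n_segments, seg_len)
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for r, e in zip(ref, est):
        seg_fp = len(e - r)
        seg_fn = len(r - e)
        tp += len(r & e)
        fp += seg_fp
        fn += seg_fn
        # A miss paired with a false alarm in the same segment
        # counts as one substitution.
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += len(r)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    return {"f1": f1, "error_rate": error_rate}


# Made-up reference and system output for a 4-second recording.
reference = [(0.0, 2.0, "car"), (2.0, 4.0, "speech")]
estimated = [(0.0, 1.0, "car"), (1.0, 2.0, "dog"),
             (2.0, 4.0, "speech"), (3.0, 4.0, "car")]
result = segment_based_metrics(reference, estimated, duration=4.0)
print(result)
```

In this example the second segment contributes one substitution (car missed, dog reported) and the last segment one insertion (car reported without a reference event).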

Audio Tagging

Implementation  Example code
sed_eval        tag_evaluation_sedeval.py (download)
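
For audio tagging, a simple score is the micro-averaged precision, recall, and F-score over all clip and tag pairs. The sketch below computes these in basic Python. It is not the sed_eval implementation and not the content of tag_evaluation_sedeval.py. The function name, file names, and tags are made up for this illustration.

```python
def tagging_scores(reference, estimated):
    """Micro-averaged precision, recall, and F-score over all clips.

    Both arguments map a clip name to the set of tags assigned to it.
    """
    tp = fp = fn = 0
    for clip, ref_tags in reference.items():
        est_tags = estimated.get(clip, set())
        tp += len(ref_tags & est_tags)
        fp += len(est_tags - ref_tags)
        fn += len(ref_tags - est_tags)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical reference tags and system output for three clips.
reference = {
    "clip1.wav": {"speech", "tv"},
    "clip2.wav": {"child"},
    "clip3.wav": {"speech"},
}
estimated = {
    "clip1.wav": {"speech"},
    "clip2.wav": {"child", "tv"},
    "clip3.wav": {"speech"},
}
precision, recall, f1 = tagging_scores(reference, estimated)
print(precision, recall, f1)
```

Systems that output tag probabilities rather than binary decisions need a threshold before this kind of scoring can be applied.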