Datasets and evaluation

6.3 Obtaining Reference Annotations

Tools for annotating environmental audio:

	Description
Software
Audacity	Audio editor with basic annotation capabilities. Annotations are made on label tracks.
ELAN	A linguistic annotation tool for creating textual annotations for audio and video files.
Prototypes
I-SED	An interactive sound event detector, see Kim2017
Soundscape annotation tool	A tool for soundscape annotation
BAT	BMAT Annotation Tool, see Melendez-Catalan2017
audio-annotator	Audio-annotator, see Cartwright2017
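
Audacity exports a label track as a plain text file with one label per line: start time, end time, and label text, separated by tabs, with times in seconds. The following is a minimal sketch for reading such a file into a list of events. The function name `parse_audacity_labels`, the dictionary keys, and the example labels are illustrative assumptions, not part of any of the tools listed above.

```python
def parse_audacity_labels(text):
    """Parse the text of an exported Audacity label track into event dicts."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the optional frequency-range lines,
        # which start with a backslash in newer Audacity exports.
        if not line or line.startswith("\\"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("expected 'start<TAB>end<TAB>label', got: %r" % line)
        onset, offset = float(fields[0]), float(fields[1])
        label = fields[2] if len(fields) > 2 else ""
        events.append({"onset": onset, "offset": offset, "event_label": label})
    return events


# Hypothetical export with two labelled sound events.
example = "0.500000\t2.250000\tcar passing\n3.000000\t3.400000\tdoor slam\n"
events = parse_audacity_labels(example)
for event in events:
    print(event)
```

To read a real export, pass the contents of the exported text file to the function instead of the example string.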

Freely available datasets for sound classification, tagging, and sound event detection:

Dataset name	Type	Classes	Examples	Size (min)	Usage, publications
Sound Scenes
Dares G1	recorded	28	123	123	Grootel2009, Mesaros2013
DCASE 2013 Scenes	recorded	10	100	50	Stowell2015
LITIS Rouen	recorded	19	3026	1513	Bisot2015, Rakotomamonjy2015
TUT Sound Scenes 2016	recorded	15	1170	585	DCASE2016, Mesaros2016
Environmental Sounds
ESC-10	collected	10	400	33	Piczak2015a, Hertel2016
ESC-50	collected	50	2000	166	Piczak2015a, Piczak2015b
NYU Urban Sound8K	collected	10	8732	525	Salamon2014
CHIME-Home	recorded	7	6137	409	DCASE2016, Foster2015
Freefield 1010	collected	7	400	33	Stowell2014a
CICESE Sound Events	collected	20	1367	92	Beltran2015
AudioSet	collected	632	>2 million	>340k	Gemmeke2017
Sound Events
Dares G1	recorded	761	3214	123	Grootel2009, Mesaros2013
DCASE 2013 Office Live	recorded	16	320	19	DCASE2013, Stowell2015
DCASE 2013 Office Synthetic	recorded	16	320	19	DCASE2013, Stowell2015
TUT Sound Events 2016	recorded	18	954	78	DCASE2016, Mesaros2016b
TUT Sound Events 2017	recorded	6	729	92	DCASE2017
NYU Urban Sound	collected	10	3075	1620	Salamon2014, Salamon2015a, Salamon2015b
TU Dortmund Multichannel	recorded	15	1170	585	Kuerby2016

Data augmentation refers to methods for increasing the amount of development data available without additional recordings.

Here are a few tools for modifying existing audio material:

Toolbox	Language	Description
muda	Python	Annotation-aware musical data augmentation, partly applicable to environmental audio (pitch shifting, time stretching).
librosa	Python	Provides time stretching and pitch shifting effects.
TSM toolbox	Matlab	MATLAB implementations of various classical time-scale modification (TSM) algorithms.
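
To illustrate the idea of creating new development material from existing recordings, here is a minimal sketch of speed perturbation by linear-interpolation resampling, in plain Python. This is not the method used by the toolboxes above: proper time stretching and pitch shifting change duration or pitch independently, whereas this naive approach changes both together. The function names `change_speed` and `scale_annotation` are hypothetical helpers introduced for this example.

```python
import math


def change_speed(samples, rate):
    """Resample by linear interpolation so playback is `rate` times faster.

    Duration and pitch change together (no true time-scale modification).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_out = int(len(samples) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate                           # fractional read position
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


def scale_annotation(onset, offset, rate):
    """Map event times (seconds) to the sped-up signal's time axis."""
    return onset / rate, offset / rate


sr = 8000  # assumed sampling rate for the synthetic example
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s tone

faster = change_speed(tone, 1.25)  # shorter and higher in pitch
slower = change_speed(tone, 0.5)   # longer and lower in pitch
print(len(tone), len(faster), len(slower))
print(scale_annotation(0.2, 0.6, 0.5))
```

If the original recording has event annotations, the onsets and offsets must be rescaled together with the audio, which is what `scale_annotation` does and what "annotation-aware" augmentation refers to.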

Toolboxes for evaluation:

Toolbox	Language	Description
sklearn.metrics	Python	Score functions, performance metrics, pairwise metrics, and distance computations for machine learning development.
sed_eval	Python	Evaluation toolbox for sound event detection.

Usage examples for different tasks, using basic Python and the sed_eval and sklearn.metrics toolboxes:

Acoustic Scene Classification

	example code
sklearn.metrics	ac_evaluation_sklearn.py (download)
sed_eval	ac_evaluation_sedeval.py (download)
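
As a rough illustration of what an acoustic scene classification evaluation computes, the sketch below derives overall accuracy, class-wise accuracy, and the class-wise average in plain Python. It does not reproduce the scripts listed above or the API of either toolbox. The function name `scene_accuracy` and the scene labels are assumptions made for the example.

```python
from collections import Counter


def scene_accuracy(reference, estimated):
    """Return (overall accuracy, class-wise accuracy dict, class-wise average)."""
    if not reference or len(reference) != len(estimated):
        raise ValueError("reference and estimated must be non-empty and equal in length")
    total = Counter(reference)
    correct = Counter(r for r, e in zip(reference, estimated) if r == e)
    overall = sum(correct.values()) / len(reference)
    class_wise = {label: correct[label] / total[label] for label in sorted(total)}
    average = sum(class_wise.values()) / len(class_wise)
    return overall, class_wise, average


# Hypothetical reference and system output, one scene label per audio file.
reference = ["park", "park", "park", "home", "office", "office"]
estimated = ["park", "park", "home", "home", "office", "park"]

overall, class_wise, average = scene_accuracy(reference, estimated)
print("overall accuracy:", overall)
print("class-wise accuracy:", class_wise)
print("class-wise average:", average)
```

The overall figure weights every file equally, while the class-wise average weights every scene class equally, so the two differ when the classes are unbalanced.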

Sound Event Detection

	example code
basic Python	sed_evaluation.py (download)
sed_eval	sed_evaluation_sedeval.py (download)
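
For orientation, here is a minimal sketch of segment-based F-score and error rate for sound event detection in plain Python. It assumes events are given as (onset, offset, label) tuples in seconds and that an event counts as active in every fixed-length segment it overlaps. The function names are hypothetical, and details such as rounding and the evaluated length may differ from what sed_eval or the listed scripts do.

```python
import math


def active_labels(events, start, end):
    """Labels of events that overlap the segment [start, end)."""
    return {label for onset, offset, label in events if onset < end and offset > start}


def segment_based_metrics(reference, estimated, segment_length=1.0):
    """Segment-based F-score and error rate, accumulated over all segments."""
    all_events = reference + estimated
    if not all_events:
        raise ValueError("no events to evaluate")
    duration = max(offset for _, offset, _ in all_events)
    n_segments = int(math.ceil(duration / segment_length))
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for k in range(n_segments):
        start, end = k * segment_length, (k + 1) * segment_length
        ref = active_labels(reference, start, end)
        est = active_labels(estimated, start, end)
        seg_fp, seg_fn = len(est - ref), len(ref - est)
        tp += len(ref & est)
        fp += seg_fp
        fn += seg_fn
        subs += min(seg_fn, seg_fp)     # a miss paired with a false alarm
        dels += max(0, seg_fn - seg_fp) # remaining misses
        ins += max(0, seg_fp - seg_fn)  # remaining false alarms
        n_ref += len(ref)
    f_score = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    return {"f_score": f_score, "error_rate": error_rate}


# Hypothetical annotations: (onset, offset, label) in seconds.
reference = [(0.0, 2.0, "speech"), (1.0, 3.0, "car")]
estimated = [(0.0, 1.0, "speech"), (1.0, 2.0, "dog"), (2.0, 4.0, "car")]

metrics = segment_based_metrics(reference, estimated, segment_length=1.0)
print(metrics)
```

Because substitutions, deletions, and insertions are summed before dividing by the number of active reference events, the error rate can exceed 1 when a system produces many false alarms.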

Audio Tagging

	example code
sed_eval	tag_evaluation_sedeval.py (download)
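
As a simple illustration of tag-level evaluation, the sketch below computes micro-averaged precision, recall, and F-score over a set of files in plain Python. It assumes binary tag decisions per file, represented as sets of tag labels. It is not the sed_eval API, and other tagging metrics (for example those based on tag probabilities) are not covered. The function name `tagging_scores`, the file names, and the tags are made up for the example.

```python
def tagging_scores(reference, estimated):
    """Micro-averaged precision, recall, and F-score.

    Both arguments map a file name to the set of tags assigned to that file.
    """
    tp = fp = fn = 0
    for name, ref_tags in reference.items():
        est_tags = estimated.get(name, set())
        tp += len(ref_tags & est_tags)  # tags present in both
        fp += len(est_tags - ref_tags)  # tags only in the system output
        fn += len(ref_tags - est_tags)  # tags only in the reference
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Hypothetical reference tags and system output for three files.
reference = {
    "a.wav": {"child speech", "tv"},
    "b.wav": {"adult speech"},
    "c.wav": set(),
}
estimated = {
    "a.wav": {"tv"},
    "b.wav": {"adult speech", "tv"},
    "c.wav": {"tv"},
}

precision, recall, f_score = tagging_scores(reference, estimated)
print(precision, recall, f_score)
```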