• Description:

The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation.

This is the official sound separation data for the DCASE2020 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments.

Overview: FUSS audio data is sourced from a pre-release of Freesound dataset known as (FSD50k), a sound event dataset composed of Freesound content annotated with labels from the AudioSet Ontology. Using the FSD50K labels, these source files have been screened such that they likely only contain a single type of sound. Labels are not provided for these source files, and are not considered part of the challenge. For the purpose of the DCASE Task4 Sound Separation and Event Detection challenge, systems should not use FSD50K labels, even though they may become available upon FSD50K release.

To create mixtures, 10 second clips of sources are convolved with simulated room impulse responses and added together. Each 10 second mixture contains between 1 and 4 sources. Source files longer than 10 seconds are considered "background" sources. Every mixture contains one background source, which is active for the entire duration. We provide: a software recipe to create the dataset, the room impulse responses, and the original source audio.

Split Examples
'test' 1,000
'train' 20,000
'validation' 1,000
  • Feature structure:
    'id': string,
    'jams': string,
    'mixture_audio': Audio(shape=(160000,), dtype=int16),
    'segments': Sequence({
        'end_time_seconds': float32,
        'label': string,
        'start_time_seconds': float32,
    'sources': Sequence({
        'audio': Audio(shape=(160000,), dtype=int16),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=4),
  • Feature documentation:
Feature Class Shape Dtype Description
id Tensor string
jams Tensor string
mixture_audio Audio (160000,) int16
segments Sequence
segments/end_time_seconds Tensor float32
segments/label Tensor string
segments/start_time_seconds Tensor float32
sources Sequence
sources/audio Audio (160000,) int16
sources/label ClassLabel int64
  title = {What's All the {FUSS} About Free Universal Sound Separation Data?},
  author = {Scott Wisdom and Hakan Erdogan and Daniel P. W. Ellis and Romain Serizel and Nicolas Turpault and Eduardo Fonseca and Justin Salamon and Prem Seetharaman and John R. Hershey},
  year = {2020},
  url = {https://arxiv.org/abs/2011.00803},

  author = {Eduardo Fonseca and Xavier Favory and Jordi Pons and Frederic Font Corbera and Xavier Serra},
  title = { {FSD}50k: an open dataset of human-labeled sound events},
  year = {2020},
  url = {https://arxiv.org/abs/2010.00475},

fuss/reverberant (default config)

  • Config description: Default reverberated audio.

  • Download size: 7.35 GiB

  • Dataset size: 43.20 GiB

  • Examples (tfds.as_dataframe):


  • Config description: Unprocessed audio without additional reverberation.

  • Download size: 8.28 GiB

  • Dataset size: 45.58 GiB

  • Examples (tfds.as_dataframe):