fuss

Description:

The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation.

This is the official sound separation data for the DCASE2020 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments.

Overview: FUSS audio data is sourced from a pre-release of the Freesound Dataset known as FSD50K, a sound event dataset composed of Freesound content annotated with labels from the AudioSet Ontology. Using the FSD50K labels, these source files have been screened so that each likely contains only a single type of sound. Labels are not provided for these source files and are not considered part of the challenge. For the purpose of the DCASE Task 4 Sound Separation and Event Detection challenge, systems should not use FSD50K labels, even though they may become available upon the FSD50K release.

To create mixtures, 10-second clips of sources are convolved with simulated room impulse responses and added together. Each 10-second mixture contains between 1 and 4 sources. Source files longer than 10 seconds are considered "background" sources. Every mixture contains one background source, which is active for the entire duration. We provide a software recipe to create the dataset, the room impulse responses, and the original source audio.
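The mixing step described above (convolve each source clip with a room impulse response, then add the results) can be sketched in a few lines. This is a toy illustration in plain Python, not the official FUSS recipe: the function names `convolve` and `make_mixture` are hypothetical, the signals are only four samples long, and truncating the reverberated clip to the original clip length is an assumption made here for simplicity.

```python
def convolve(signal, impulse_response):
    """Linear convolution, truncated to len(signal) (truncation is an assumption)."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out


def make_mixture(sources, impulse_responses):
    """Reverberate each source with its own impulse response and sum them."""
    if not 1 <= len(sources) <= 4:
        raise ValueError("a FUSS mixture contains between 1 and 4 sources")
    if len(sources) != len(impulse_responses):
        raise ValueError("need exactly one impulse response per source")
    reverberant = [convolve(s, h) for s, h in zip(sources, impulse_responses)]
    # The mixture is the sample-by-sample sum of the reverberated sources.
    mixture = [sum(samples) for samples in zip(*reverberant)]
    return mixture, reverberant


# Toy data: two 4-sample "clips" and two short made-up impulse responses.
toy_sources = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
toy_rirs = [[1.0, 0.5], [1.0]]
toy_mixture, toy_reverberant = make_mixture(toy_sources, toy_rirs)
```

In this sketch the returned reverberated sources play the role of the source-level references: they sum exactly to the mixture.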

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/google-research/sound-separation/blob/master/datasets/fuss/FUSS_license_doc/README.md
Source code: tfds.audio.Fuss
Versions:
- 1.2.0 (default): No release notes.
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	1,000
`'train'`	20,000
`'validation'`	1,000

Feature structure:

FeaturesDict({
    'id': string,
    'jams': string,
    'mixture_audio': Audio(shape=(160000,), dtype=int16),
    'segments': Sequence({
        'end_time_seconds': float32,
        'label': string,
        'start_time_seconds': float32,
    }),
    'sources': Sequence({
        'audio': Audio(shape=(160000,), dtype=int16),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=4),
    }),
})
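The audio shape and the 10-second clip length together imply the sample rate, which the feature structure does not state directly. The short calculation below works it out and also shows one common way to rescale int16 samples to floats. The constant names and the `int16_to_float` helper are illustrative, and dividing by 32768 is a widespread convention rather than something this page prescribes.

```python
CLIP_SECONDS = 10       # clip duration given in the dataset description
CLIP_SAMPLES = 160000   # from Audio(shape=(160000,), dtype=int16)
BYTES_PER_SAMPLE = 2    # int16

# Samples per second implied by the two figures above.
SAMPLE_RATE_HZ = CLIP_SAMPLES // CLIP_SECONDS

# Raw size of a single decoded clip (mixture or one source).
CLIP_BYTES = CLIP_SAMPLES * BYTES_PER_SAMPLE


def int16_to_float(samples):
    """Rescale int16 sample values to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


print(SAMPLE_RATE_HZ)  # 16000
print(CLIP_BYTES)      # 320000
```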

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
id	Tensor		string
jams	Tensor		string
mixture_audio	Audio	(160000,)	int16
segments	Sequence
segments/end_time_seconds	Tensor		float32
segments/label	Tensor		string
segments/start_time_seconds	Tensor		float32
sources	Sequence
sources/audio	Audio	(160000,)	int16
sources/label	ClassLabel		int64
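The `segments` feature gives start and end times in seconds, while the audio features are sample arrays, so a conversion is usually needed before slicing. The sketch below assumes the 16 kHz rate implied by 160000 samples per 10-second clip, and assumes the sequence is delivered as a dictionary of parallel lists. The helper name `segment_sample_ranges` and the label values in the example are made up for illustration.

```python
def segment_sample_ranges(segments, sample_rate_hz=16000, clip_samples=160000):
    """Convert segment times in seconds to (label, first, last) sample ranges.

    `last` is exclusive, so audio[first:last] selects the segment.
    """
    ranges = []
    for start, end, label in zip(
        segments["start_time_seconds"],
        segments["end_time_seconds"],
        segments["label"],
    ):
        # Clamp to the clip boundaries in case of rounding at the edges.
        first = max(0, int(round(start * sample_rate_hz)))
        last = min(clip_samples, int(round(end * sample_rate_hz)))
        ranges.append((label, first, last))
    return ranges


# Hypothetical example: one full-length background and one shorter event.
example_segments = {
    "start_time_seconds": [0.0, 2.5],
    "end_time_seconds": [10.0, 4.0],
    "label": ["background0", "foreground0"],
}
example_ranges = segment_sample_ranges(example_segments)
```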

Supervised keys (See as_supervised doc): ('mixture_audio', 'sources')
Figure (tfds.show_examples): Not supported.
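A minimal loading sketch, assuming `tensorflow_datasets` is installed and that the two config names listed on this page (`reverberant` and `unprocessed`) are the only ones available. The helpers `fuss_builder_name` and `load_fuss` are hypothetical wrappers around `tfds.load`. The import is deferred to call time because the first call downloads and prepares several GiB of data.

```python
FUSS_CONFIGS = ("reverberant", "unprocessed")


def fuss_builder_name(config="reverberant"):
    """Return the TFDS name for a FUSS config, e.g. 'fuss/reverberant'."""
    if config not in FUSS_CONFIGS:
        raise ValueError(f"unknown FUSS config {config!r}; expected one of {FUSS_CONFIGS}")
    return f"fuss/{config}"


def load_fuss(config="reverberant", split="train", as_supervised=False):
    """Load one FUSS split through tfds.load.

    With as_supervised=True, each example is the tuple given by the
    supervised keys: (mixture_audio, sources).
    """
    # Deferred import: calling this triggers a multi-GiB download on first use.
    import tensorflow_datasets as tfds

    return tfds.load(
        fuss_builder_name(config), split=split, as_supervised=as_supervised
    )
```

For example, `load_fuss("unprocessed", split="validation", as_supervised=True)` would be expected to yield `(mixture_audio, sources)` pairs from the 1,000-example validation split.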
Citation:

@inproceedings{wisdom2020fuss,
  title = {What's All the {FUSS} About Free Universal Sound Separation Data?},
  author = {Scott Wisdom and Hakan Erdogan and Daniel P. W. Ellis and Romain Serizel and Nicolas Turpault and Eduardo Fonseca and Justin Salamon and Prem Seetharaman and John R. Hershey},
  year = {2020},
  url = {https://arxiv.org/abs/2011.00803},
}

@inproceedings{fonseca2020fsd50k,
  author = {Eduardo Fonseca and Xavier Favory and Jordi Pons and Frederic Font Corbera and Xavier Serra},
  title = {{FSD}50k: an open dataset of human-labeled sound events},
  year = {2020},
  url = {https://arxiv.org/abs/2010.00475},
}

fuss/reverberant (default config)

Config description: Default reverberated audio.
Download size: 7.35 GiB
Dataset size: 43.20 GiB

fuss/unprocessed

Config description: Unprocessed audio without additional reverberation.
Download size: 8.28 GiB
Dataset size: 45.58 GiB
