fuss
The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary
sound mixtures and source-level references, for use in experiments on arbitrary
sound separation.
This is the official sound separation data for the DCASE2020 Challenge Task 4:
Sound Event Detection and Separation in Domestic Environments.
Overview: FUSS audio data is sourced from a pre-release of the Freesound Dataset
known as FSD50K, a sound event dataset composed of Freesound content annotated
with labels from the AudioSet ontology. Using the FSD50K labels, these source
files have been screened so that they likely contain only a single type of
sound. Labels are not provided for these source files and are not considered
part of the challenge. For the purposes of the DCASE Task 4 Sound Event
Detection and Separation challenge, systems should not use FSD50K labels, even
though they may become available upon FSD50K release.
To create mixtures, 10-second clips of sources are convolved with simulated room
impulse responses and added together. Each 10-second mixture contains between 1
and 4 sources. Source files longer than 10 seconds are considered "background"
sources. Every mixture contains one background source, which is active for the
entire duration. We provide a software recipe to create the dataset, the room
impulse responses, and the original source audio.
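The mixing procedure above can be sketched in NumPy. This is a minimal illustration, not the official recipe: the real pipeline also convolves each source with a simulated room impulse response before summing, which is omitted here.

```python
import numpy as np

SAMPLE_RATE = 16_000              # implied by 160,000 samples per 10-second clip
CLIP_SAMPLES = 10 * SAMPLE_RATE   # = 160,000, matching the Audio feature shape

def make_mixture(background: np.ndarray, foregrounds: list) -> np.ndarray:
    """Sum one background source with 0-3 foreground sources.

    All inputs are int16 arrays of CLIP_SAMPLES samples. The accumulation is
    done in int32 and clipped back to int16 to avoid integer wrap-around.
    """
    assert background.shape == (CLIP_SAMPLES,)
    assert 0 <= len(foregrounds) <= 3  # 1-4 sources total per mixture
    acc = background.astype(np.int32)
    for src in foregrounds:
        assert src.shape == (CLIP_SAMPLES,)
        acc += src.astype(np.int32)
    return np.clip(acc, -32768, 32767).astype(np.int16)
```

Note that the background source always spans the full 10 seconds, while foreground sources may be active for only part of the clip (zero-padded elsewhere).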
- **Homepage**: https://github.com/google-research/sound-separation/blob/master/datasets/fuss/FUSS_license_doc/README.md
- **Source code**: `tfds.audio.Fuss`
- **Versions**: `1.2.0` (default): No release notes.
- **Auto-cached**: No
- **Splits**:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 1,000    |
| `'train'`      | 20,000   |
| `'validation'` | 1,000    |
FeaturesDict({
'id': string,
'jams': string,
'mixture_audio': Audio(shape=(160000,), dtype=int16),
'segments': Sequence({
'end_time_seconds': float32,
'label': string,
'start_time_seconds': float32,
}),
'sources': Sequence({
'audio': Audio(shape=(160000,), dtype=int16),
'label': ClassLabel(shape=(), dtype=int64, num_classes=4),
}),
})
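Loaded through TFDS, each example follows the structure above, with audio stored as int16 PCM. A minimal sketch of converting it to floating point for signal processing — the `audio_to_float` helper is our own, not part of the dataset API:

```python
import numpy as np

def audio_to_float(audio: np.ndarray) -> np.ndarray:
    """Convert int16 PCM samples to float32 in [-1.0, 1.0)."""
    return audio.astype(np.float32) / 32768.0

# Typical usage (downloads several GiB of data on first run):
#   import tensorflow_datasets as tfds
#   ds = tfds.load('fuss/reverberant', split='train')
#   for example in ds.take(1):
#       mix = audio_to_float(example['mixture_audio'].numpy())
```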
| Feature                     | Class        | Shape     | Dtype   | Description |
|-----------------------------|--------------|-----------|---------|-------------|
|                             | FeaturesDict |           |         |             |
| id                          | Tensor       |           | string  |             |
| jams                        | Tensor       |           | string  |             |
| mixture_audio               | Audio        | (160000,) | int16   |             |
| segments                    | Sequence     |           |         |             |
| segments/end_time_seconds   | Tensor       |           | float32 |             |
| segments/label              | Tensor       |           | string  |             |
| segments/start_time_seconds | Tensor       |           | float32 |             |
| sources                     | Sequence     |           |         |             |
| sources/audio               | Audio        | (160000,) | int16   |             |
| sources/label               | ClassLabel   |           | int64   |             |
- **Supervised keys**: `('mixture_audio', 'sources')`
- **Figure**: Not supported.
- **Citation**:
@inproceedings{wisdom2020fuss,
title = {What's All the {FUSS} About Free Universal Sound Separation Data?},
author = {Scott Wisdom and Hakan Erdogan and Daniel P. W. Ellis and Romain Serizel and Nicolas Turpault and Eduardo Fonseca and Justin Salamon and Prem Seetharaman and John R. Hershey},
year = {2020},
url = {https://arxiv.org/abs/2011.00803},
}
@inproceedings{fonseca2020fsd50k,
author = {Eduardo Fonseca and Xavier Favory and Jordi Pons and Frederic Font Corbera and Xavier Serra},
title = { {FSD}50k: an open dataset of human-labeled sound events},
year = {2020},
url = {https://arxiv.org/abs/2010.00475},
}
fuss/reverberant (default config)

- **Config description**: Default reverberated audio.
- **Download size**: `7.35 GiB`
- **Dataset size**: `43.20 GiB`

fuss/unprocessed

- **Config description**: Unprocessed audio without additional reverberation.
- **Download size**: `8.28 GiB`
- **Dataset size**: `45.58 GiB`
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-12-06 UTC.