- Description:
WebVid is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.
WebVid-10M contains:
10.7M video-caption pairs. 52K total video hours.
Homepage: https://m-bain.github.io/webvid-dataset/
Source code:
tfds.datasets.webvid.Builder
Versions:
1.0.0 (default): Initial release.
Download size:
Unknown size
Dataset size:
Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Follow the download instructions at https://m-bain.github.io/webvid-dataset/ to get the data. Place the csv files and the video directories in manual_dir/webvid, such that the mp4 files are located at manual_dir/webvid/*/*_*/*.mp4.
The first directory is typically an arbitrary part directory (used for sharded downloading), the second is the page directory (named with two numbers separated by an underscore), and each page directory contains one or more mp4 files.
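Before building the dataset, it can help to confirm that the manually downloaded files follow the expected layout. The following is a minimal sketch that uses only the Python standard library; the helper names find_webvid_videos and summarize_layout are hypothetical and are not part of TFDS.

```python
from pathlib import Path


def find_webvid_videos(manual_dir):
    """Return the mp4 files matching manual_dir/webvid/*/*_*/*.mp4, sorted.

    Hypothetical helper for illustration; it is not a TFDS function.
    """
    root = Path(manual_dir).expanduser() / "webvid"
    # Pattern components: part directory / page directory (contains an
    # underscore) / mp4 file.
    return sorted(root.glob("*/*_*/*.mp4"))


def summarize_layout(manual_dir):
    """Count the part directories, page directories, and mp4 files found."""
    videos = find_webvid_videos(manual_dir)
    pages = {video.parent for video in videos}
    parts = {video.parent.parent for video in videos}
    return {"parts": len(parts), "pages": len(pages), "videos": len(videos)}
```

Calling summarize_layout("~/tensorflow_datasets/downloads/manual") on the default manual directory reports how many files match the pattern. A count of zero videos suggests that the directories are nested differently from what the builder expects.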
Auto-cached (documentation): Unknown
Splits:
Split | Examples |
---|---|
- Feature structure:
FeaturesDict({
'caption': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
'url': Text(shape=(), dtype=string),
'video': Video(Image(shape=(360, 640, 3), dtype=uint8)),
})
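The feature structure above can be read back as follows. This is a sketch rather than an official recipe: it assumes tensorflow_datasets is installed and the manual download is already in place, and the helper names load_webvid and describe_example are hypothetical. No split name is assumed, since the Splits table lists none; tfds.load without a split argument returns a dict keyed by split name.

```python
def load_webvid(manual_dir, data_dir=None):
    """Build (if needed) and load every available split of the dataset.

    Hypothetical wrapper around tfds.load. tensorflow_datasets is imported
    lazily so that this module can be imported without it.
    """
    import tensorflow_datasets as tfds

    # Point the builder at the manually downloaded csv and mp4 files.
    download_config = tfds.download.DownloadConfig(manual_dir=manual_dir)
    return tfds.load(
        "webvid",
        data_dir=data_dir,
        download_and_prepare_kwargs={"download_config": download_config},
    )


def _as_text(value):
    """Decode bytes (as returned by tfds.as_numpy) to str."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def describe_example(example):
    """Summarize one example whose 'video' has shape (frames, 360, 640, 3)."""
    num_frames, height, width, channels = example["video"].shape
    if (height, width, channels) != (360, 640, 3):
        raise ValueError(f"unexpected frame shape: {(height, width, channels)}")
    return {
        "id": _as_text(example["id"]),
        "caption": _as_text(example["caption"]),
        "num_frames": num_frames,
    }
```

One way to use these helpers is to iterate over tfds.as_numpy(splits[name]) for a split name taken from the dict returned by load_webvid, passing each example to describe_example. The number of frames varies per video, which is why the leading video dimension is None in the feature documentation.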
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
caption | Text | | string | |
id | Text | | string | |
url | Text | | string | |
video | Video(Image) | (None, 360, 640, 3) | uint8 | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@misc{bain2021frozen,
title={Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval},
author={Max Bain and Arsha Nagrani and Gül Varol and Andrew Zisserman},
year={2021},
eprint={2104.00650},
archivePrefix={arXiv},
primaryClass={cs.CV}
}