- Description:
WebVid is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.
WebVid-10M contains:
10.7M video-caption pairs. 52K total video hours.
Homepage: https://m-bain.github.io/webvid-dataset/
Source code:
tfds.datasets.webvid.Builder
Versions:
1.0.0 (default): Initial release.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into `download_config.manual_dir` (defaults to `~/tensorflow_datasets/downloads/manual/`):
Follow the download instructions at https://m-bain.github.io/webvid-dataset/ to get the data. Place the csv files and the video directories in `manual_dir/webvid`, such that the mp4 files are located at `manual_dir/webvid/*/*_*/*.mp4`.
The first directory is typically an arbitrary part directory (for sharded downloading), and the second is the page directory (two numbers separated by an underscore), which contains one or more mp4 files.
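Before building the dataset, it can help to confirm that the manually downloaded files match the layout described above. The following is a minimal sketch using only the Python standard library; it is not part of TFDS, and the helper names `find_webvid_videos` and `summarize_layout` are hypothetical. It assumes the csv files sit directly under `manual_dir/webvid` and that page directories are named with two numbers separated by an underscore.

```python
import re
from pathlib import Path

# Assumption: a page directory is "two numbers around an underscore".
PAGE_DIR_PATTERN = re.compile(r"^\d+_\d+$")


def find_webvid_videos(manual_dir):
    """Return mp4 paths matching manual_dir/webvid/*/*_*/*.mp4."""
    root = Path(manual_dir).expanduser() / "webvid"
    videos = []
    for path in sorted(root.glob("*/*_*/*.mp4")):
        # Keep only files whose parent looks like a page directory.
        if PAGE_DIR_PATTERN.match(path.parent.name):
            videos.append(path)
    return videos


def summarize_layout(manual_dir):
    """Summarize the csv files and videos found under manual_dir/webvid."""
    root = Path(manual_dir).expanduser() / "webvid"
    return {
        "csv_files": sorted(p.name for p in root.glob("*.csv")),
        "num_videos": len(find_webvid_videos(manual_dir)),
    }
```

Calling `summarize_layout("~/tensorflow_datasets/downloads/manual/")` on the default manual directory returns the csv file names and the number of videos found. A count of zero usually means the part or page directories are nested differently from what the builder expects.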
Auto-cached (documentation): Unknown
Splits:
| Split | Examples |
|---|---|
- Feature structure:
FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
    'video': Video(Image(shape=(360, 640, 3), dtype=uint8)),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| caption | Text | | string | |
| id | Text | | string | |
| url | Text | | string | |
| video | Video(Image) | (None, 360, 640, 3) | uint8 | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@misc{bain2021frozen,
title={Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval},
author={Max Bain and Arsha Nagrani and Gül Varol and Andrew Zisserman},
year={2021},
eprint={2104.00650},
archivePrefix={arXiv},
primaryClass={cs.CV}
}