deep1b

  • Description:

Pre-trained embeddings for approximate nearest neighbor search using the cosine distance. This dataset consists of two splits:

  1. 'database': consists of 9,990,000 data points, each with the features 'embedding' (96 floats), 'index' (int64), and 'neighbors' (an empty list).
  2. 'test': consists of 10,000 data points, each with the features 'embedding' (96 floats), 'index' (int64), and 'neighbors' (a list of the 'index' and 'distance' of each nearest neighbor in the database).
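This page says the neighbors are defined by cosine distance, but it does not spell out how the ground-truth 'neighbors' of a test point were computed. Below is a minimal brute-force sketch of what an exact 'neighbors' list for one query could look like. It assumes cosine distance means 1 minus cosine similarity, and that a database point's position in the list equals its 'index' value. `cosine_distance` and `exact_neighbors` are hypothetical helpers for illustration, not part of TFDS or any other library.

```python
import math
from typing import Dict, List, Sequence, Union

# One neighbor record, mirroring the 'index' / 'distance' pair described above.
Neighbor = Dict[str, Union[int, float]]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed definition: 1 - cosine similarity (0 = same direction, 2 = opposite).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_neighbors(
    query: Sequence[float], database: List[Sequence[float]], k: int
) -> List[Neighbor]:
    # Hypothetical helper: score every database point (exhaustive search),
    # assuming list position == the 'index' feature, then keep the k closest.
    scored: List[Neighbor] = [
        {"index": i, "distance": cosine_distance(query, emb)}
        for i, emb in enumerate(database)
    ]
    # Sort by ascending distance; break ties by index for a deterministic order.
    scored.sort(key=lambda n: (n["distance"], n["index"]))
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D database standing in for the real 96-dimensional embeddings.
    database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    result = exact_neighbors([1.0, 0.1], database, k=2)
    print([n["index"] for n in result])  # [0, 2]
```

Pure Python keeps the idea readable. At the scale of the 'database' split (9,990,000 × 96), a vectorized library or an approximate index would be needed in practice.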
  • Splits:
Split         Examples
'database'    9,990,000
'test'        10,000
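The split sizes above can be turned into a rough estimate of the raw embedding payload. The sketch below counts only the 96 float32 values per point (4 bytes each). It ignores the 'index' and 'neighbors' features and any on-disk encoding overhead, so treat the result as a lower bound rather than the actual download or storage size. It also counts the distance evaluations an exhaustive test-versus-database search would require.

```python
NUM_DATABASE = 9_990_000  # 'database' split size
NUM_TEST = 10_000         # 'test' split size
DIM = 96                  # floats per 'embedding'
BYTES_PER_FLOAT32 = 4


def embedding_bytes(num_points: int, dim: int = DIM) -> int:
    # Raw bytes for num_points float32 vectors of length dim (no overhead).
    return num_points * dim * BYTES_PER_FLOAT32


db_bytes = embedding_bytes(NUM_DATABASE)
test_bytes = embedding_bytes(NUM_TEST)
# Exhaustive search compares every test point with every database point.
pairs = NUM_TEST * NUM_DATABASE

print(f"database: {db_bytes:,} bytes (~{db_bytes / 1e9:.2f} GB)")  # database: 3,836,160,000 bytes (~3.84 GB)
print(f"test: {test_bytes:,} bytes (~{test_bytes / 1e6:.2f} MB)")  # test: 3,840,000 bytes (~3.84 MB)
print(f"brute-force distance evaluations: {pairs:,}")  # brute-force distance evaluations: 99,900,000,000
```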
  • Feature structure:
FeaturesDict({
    'embedding': Tensor(shape=(96,), dtype=float32),
    'index': Scalar(shape=(), dtype=int64),
    'neighbors': Sequence({
        'distance': Scalar(shape=(), dtype=float32),
        'index': Scalar(shape=(), dtype=int64),
    }),
})
  • Feature documentation:
Feature               Class          Shape    Dtype      Description
                      FeaturesDict
embedding             Tensor         (96,)    float32
index                 Scalar         ()       int64      Index within the split.
neighbors             Sequence                           The computed neighbors, which are only available for the test split.
neighbors/distance    Scalar         ()       float32    Neighbor distance.
neighbors/index       Scalar         ()       int64      Neighbor index.
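The 'neighbors' feature of the test split is what an approximate search is scored against. Below is a minimal recall@k sketch. Recall@k is a common ANN metric, but this page does not prescribe it. The sketch assumes the neighbors of one example arrive as a dict of parallel lists, {'index': [...], 'distance': [...]}. TFDS typically converts a Sequence of a features dict into that shape, but check your own pipeline. `recall_at_k` is a hypothetical helper, not a TFDS function.

```python
from typing import Dict, Sequence


def recall_at_k(
    predicted: Sequence[int], neighbors: Dict[str, Sequence], k: int
) -> float:
    # Hypothetical helper: the fraction of the true top-k neighbor indices
    # that appear among the first k predicted indices.
    indices = list(neighbors["index"])
    distances = list(neighbors["distance"])
    if not indices:
        # 'database' examples carry an empty 'neighbors' list.
        raise ValueError("no ground-truth neighbors (is this a database example?)")
    # Sort ground truth by distance instead of assuming a stored order.
    order = sorted(range(len(indices)), key=lambda j: distances[j])
    true_top_k = {indices[j] for j in order[:k]}
    hits = len(true_top_k.intersection(predicted[:k]))
    return hits / len(true_top_k)


if __name__ == "__main__":
    # Made-up values, only to show the expected input layout.
    neighbors = {"index": [42, 7, 13, 99], "distance": [0.10, 0.05, 0.20, 0.30]}
    print(recall_at_k([7, 13, 5], neighbors, k=2))  # 0.5
```

Re-sorting by 'distance' costs little and keeps the metric correct even if the stored list is not already ordered nearest-first.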
  • Citation:
@inproceedings{babenko2016efficient,
  title={Efficient indexing of billion-scale datasets of deep descriptors},
  author={Babenko, Artem and Lempitsky, Victor},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={2055--2063},
  year={2016}
}