hellaswag

  • Description:

The HellaSwag dataset is a benchmark for commonsense NLI (natural language inference). Each example pairs a context with a set of candidate endings that could complete it; the 'label' field gives the index of the correct ending.

Split                          Examples
'test'                           10,003
'test_ind_activitynet'            1,870
'test_ind_wikihow'                3,132
'test_ood_activitynet'            1,651
'test_ood_wikihow'                3,350
'train'                          39,905
'train_activitynet'              14,740
'train_wikihow'                  25,165
'validation'                     10,042
'validation_ind_activitynet'      1,809
'validation_ind_wikihow'          3,192
'validation_ood_activitynet'      1,434
'validation_ood_wikihow'          3,607
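
Any of the split names in the table above can be passed to the loader by name. The following is a minimal sketch, assuming the standard tensorflow_datasets API; the dataset name 'hellaswag' and the split strings are taken from the table, and the printed count corresponds to the 'validation' row:

import tensorflow_datasets as tfds

# Load one of the splits listed above; other names from the table
# (e.g. 'train', 'test_ind_wikihow') can be passed the same way.
ds, info = tfds.load('hellaswag', split='validation', with_info=True)

# Split sizes are recorded in the dataset metadata.
print(info.splits['validation'].num_examples)  # 10,042 per the table above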
  • Feature structure:
FeaturesDict({
    'activity_label': Text(shape=(), dtype=string),
    'context': Text(shape=(), dtype=string),
    'endings': Sequence(Text(shape=(), dtype=string)),
    'label': int32,
    'source_id': Text(shape=(), dtype=string),
    'split_type': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature          Class           Shape    Dtype   Description
                 FeaturesDict
activity_label   Text                     string
context          Text                     string
endings          Sequence(Text)  (None,)  string
label            Tensor                   int32
source_id        Text                     string
split_type       Text                     string
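
The feature fields above can be read directly from loaded examples. This is a minimal sketch, again assuming the standard tensorflow_datasets API; the field names come from the feature structure above, and the 'validation' split is used because its labels are populated:

import tensorflow_datasets as tfds

ds = tfds.load('hellaswag', split='validation')
for example in ds.take(1):
    # Decode the string features; 'endings' is a sequence of candidate endings.
    context = example['context'].numpy().decode('utf-8')
    endings = [e.decode('utf-8') for e in example['endings'].numpy()]
    label = int(example['label'].numpy())  # index of the correct ending
    print(context)
    print(endings[label])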
  • Citation:
@inproceedings{zellers2019hellaswag,
    title={HellaSwag: Can a Machine Really Finish Your Sentence?},
    author={Zellers, Rowan and Holtzman, Ari and Bisk, Yonatan and Farhadi, Ali and Choi, Yejin},
    booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
    year={2019}
}