gooaq

References:

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:gooaq')
  • Description:
GooAQ is a large-scale dataset with a variety of answer types. This dataset contains over
5 million questions and 3 million answers collected from Google. GooAQ questions are collected
semi-automatically from the Google search engine using its autocomplete feature. This results in
naturalistic questions of practical interest that are nonetheless short and expressed using simple
language. GooAQ answers are mined from Google's responses to our collected questions, specifically from
the answer boxes in the search results. This yields a rich space of answer types, containing both
textual answers (short and long) as well as more structured ones such as collections.
  • License: Licensed under the Apache License, Version 2.0
  • Version: 1.2.0
  • Splits:
Split         Examples
'test'        2500
'train'       3112679
'validation'  2500
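
The split sizes listed here show that nearly all examples sit in 'train'. A minimal sketch of that arithmetic, using only the counts given above (the variable names are illustrative and not part of TFDS):

```python
# Split sizes copied from the listing above.
split_sizes = {"test": 2500, "train": 3112679, "validation": 2500}

total = sum(split_sizes.values())

# Share of the full dataset held by each split.
split_fractions = {name: count / total for name, count in split_sizes.items()}

for name, fraction in split_fractions.items():
    print(f"{name}: {split_sizes[name]} examples ({fraction:.4%})")
```
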
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "short_answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer_type": {
        "num_classes": 6,
        "names": [
            "feat_snip",
            "collection",
            "knowledge",
            "unit_conv",
            "time_conv",
            "curr_conv"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}
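
The answer_type feature is a ClassLabel, so it is normally stored as an integer index into the names list above. A minimal sketch of decoding that index in plain Python, assuming the index order matches the listing. The helper `decode_answer_type` and the sample record are hypothetical and only illustrate the shape of the feature spec:

```python
# Class names in the order listed under "answer_type" above.
ANSWER_TYPE_NAMES = [
    "feat_snip", "collection", "knowledge",
    "unit_conv", "time_conv", "curr_conv",
]


def decode_answer_type(label: int) -> str:
    """Map an integer class index to its answer_type name."""
    if not 0 <= label < len(ANSWER_TYPE_NAMES):
        raise ValueError(f"answer_type index out of range: {label}")
    return ANSWER_TYPE_NAMES[label]


# Hypothetical record shaped like the feature spec; not real dataset content.
sample = {
    "id": 0,
    "question": "example question",
    "short_answer": "",
    "answer": "example answer",
    "answer_type": 1,
}
print(decode_answer_type(sample["answer_type"]))
```

Raising on an out-of-range index, rather than returning a default, is a design choice in this sketch so that a mismatch between the data and the names list surfaces early.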