break_data

References:

QDMR-high-level

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR-high-level')
  • Description
Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.
  • License: No known license
  • Version: 1.0.0
  • Splits
Split Examples
'test' 3195
'train' 17503
'validation' 3130
  • Features
{
    "question_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "decomposition": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "operators": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "split": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
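
The schema above stores `decomposition` as a single string rather than a list of steps. The following is a minimal sketch of splitting it into steps, under two assumptions that this page does not state: that steps are separated by `;`, and that a step refers to earlier steps as `#1`, `#2`, and so on. The helper `split_decomposition` and the sample record are hypothetical and only mirror the field names listed above.

```python
import re

# Assumption: steps are ';'-separated and refer to earlier steps as '#N'.
REF_PATTERN = re.compile(r"#(\d+)")


def split_decomposition(decomposition):
    """Return a list of (step_text, referenced_step_numbers) tuples."""
    steps = []
    for raw_step in decomposition.split(";"):
        text = raw_step.strip()
        if not text:
            continue  # skip empty fragments, e.g. from a trailing ';'
        refs = [int(n) for n in REF_PATTERN.findall(text)]
        steps.append((text, refs))
    return steps


# Hypothetical record shaped like the feature schema above (not real data).
example = {
    "question_id": "EXAMPLE_0",
    "question_text": "What flights go from Denver to Boston?",
    "decomposition": "return flights ;return #1 from Denver ;return #2 to Boston",
    "operators": "['select', 'filter', 'filter']",
    "split": "train",
}

for number, (text, refs) in enumerate(
    split_decomposition(example["decomposition"]), start=1
):
    print(number, text, refs)
```

If the actual records use a different separator or reference syntax, only `split(";")` and `REF_PATTERN` need to change.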

QDMR-high-level-lexicon

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR-high-level-lexicon')
  • Description
Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.
  • License: No known license
  • Version: 1.0.0
  • Splits
Split Examples
'test' 3195
'train' 17503
'validation' 3130
  • Features
{
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "allowed_tokens": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
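
In the lexicon schema above, `allowed_tokens` is typed as a plain string. The sketch below shows one way to turn it into a Python list, assuming (this page does not confirm it) that the string holds a Python-style list literal such as `"['a', 'b']"`. The helper `parse_allowed_tokens` and the sample record are hypothetical.

```python
import ast


def parse_allowed_tokens(allowed_tokens):
    """Parse the string field into a list of token strings.

    Assumption: the field is a Python-style list literal.
    """
    # ast.literal_eval only evaluates literals, so it does not run code.
    value = ast.literal_eval(allowed_tokens)
    if not isinstance(value, (list, tuple)):
        raise ValueError("expected a list literal in allowed_tokens")
    return [str(token) for token in value]


# Hypothetical record shaped like the lexicon schema above (not real data).
lexicon_example = {
    "source": "What flights go from Denver to Boston?",
    "allowed_tokens": "['flights', 'from', 'Denver', 'higher than']",
}

tokens = parse_allowed_tokens(lexicon_example["allowed_tokens"])
print(len(tokens), tokens)
```

`ast.literal_eval` is used instead of `eval` so that a malformed or unexpected string raises an error rather than executing.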

QDMR

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR')
  • Description
Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.
  • License: No known license
  • Version: 1.0.0
  • Splits
Split Examples
'test' 8069
'train' 44321
'validation' 7760
  • Features
{
    "question_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "decomposition": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "operators": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "split": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

QDMR-lexicon

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR-lexicon')
  • Description
Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.
  • License: No known license
  • Version: 1.0.0
  • Splits
Split Examples
'test' 8069
'train' 44321
'validation' 7760
  • Features
{
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "allowed_tokens": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

logical-forms

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/logical-forms')
  • Description
Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.
  • License: No known license
  • Version: 1.0.0
  • Splits
Split Examples
'test' 8006
'train' 44098
'validation' 7719
  • Features
{
    "question_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "decomposition": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "operators": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "split": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "program": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
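
The split sizes listed for each configuration can be cross-checked against the 83,978 figure in the description. The sketch below copies the counts from this page and sums them per configuration. That the QDMR and QDMR-high-level totals together make up the 83,978 examples is an inference from the arithmetic, not something the page states.

```python
# Split sizes copied from the tables on this page.
SPLIT_COUNTS = {
    "QDMR-high-level": {"test": 3195, "train": 17503, "validation": 3130},
    "QDMR-high-level-lexicon": {"test": 3195, "train": 17503, "validation": 3130},
    "QDMR": {"test": 8069, "train": 44321, "validation": 7760},
    "QDMR-lexicon": {"test": 8069, "train": 44321, "validation": 7760},
    "logical-forms": {"test": 8006, "train": 44098, "validation": 7719},
}

# Total number of examples per configuration.
totals = {name: sum(splits.values()) for name, splits in SPLIT_COUNTS.items()}

# Inference: the two annotation configurations together match the
# 83,978 examples quoted in the description.
combined = totals["QDMR"] + totals["QDMR-high-level"]

for name, total in totals.items():
    print(f"{name}: {total}")
print("QDMR + QDMR-high-level:", combined)
```

The lexicon configurations have the same split sizes as their parent configurations, and logical-forms is slightly smaller than QDMR in every split.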