break_data

References:

QDMR-high-level

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR-high-level')
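The same call pattern applies to every configuration listed on this page. The following is a minimal sketch, assuming `tensorflow_datasets` is installed together with whatever it needs to read Hugging Face datasets. The helper names `break_data_name` and `load_break_config` are hypothetical and are not part of TFDS.

```python
# Configuration names as listed on this page.
BREAK_CONFIGS = (
    "QDMR-high-level",
    "QDMR-high-level-lexicon",
    "QDMR",
    "QDMR-lexicon",
    "logical-forms",
)


def break_data_name(config: str) -> str:
    """Build the TFDS name for one break_data configuration (hypothetical helper)."""
    if config not in BREAK_CONFIGS:
        raise ValueError(f"unknown break_data config: {config!r}")
    return f"huggingface:break_data/{config}"


def load_break_config(config: str, split: str = "train"):
    """Load one split of a configuration; data is downloaded on first use."""
    # Imported lazily so the name helper works without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(break_data_name(config), split=split)
```

For example, `load_break_config("QDMR-high-level", split="validation")` would request only the validation split instead of a dictionary of all splits.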

Description:

Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	3195
`'train'`	17503
`'validation'`	3130

Features:

{
    "question_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "decomposition": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "operators": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "split": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
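The feature listing above is plain JSON, so the column names and dtypes can be read from it mechanically. A small sketch, with the spec copied from this page in compact form; the function name `feature_dtypes` is an illustrative choice, not a TFDS or Hugging Face API.

```python
import json

# Feature spec for QDMR-high-level, copied from the listing above.
FEATURES_JSON = """
{
    "question_id":   {"dtype": "string", "id": null, "_type": "Value"},
    "question_text": {"dtype": "string", "id": null, "_type": "Value"},
    "decomposition": {"dtype": "string", "id": null, "_type": "Value"},
    "operators":     {"dtype": "string", "id": null, "_type": "Value"},
    "split":         {"dtype": "string", "id": null, "_type": "Value"}
}
"""


def feature_dtypes(spec_json: str) -> dict:
    """Map each feature name to its declared dtype."""
    spec = json.loads(spec_json)
    return {name: field["dtype"] for name, field in spec.items()}


dtypes = feature_dtypes(FEATURES_JSON)
for name, dtype in dtypes.items():
    print(f"{name}: {dtype}")
```

Every field in this configuration is declared as a string, so any structure inside `decomposition` or `operators` has to be parsed by the consumer.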

QDMR-high-level-lexicon

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR-high-level-lexicon')

Description:

Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	3195
`'train'`	17503
`'validation'`	3130

Features:

{
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "allowed_tokens": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

QDMR

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR')

Description:

Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	8069
`'train'`	44321
`'validation'`	7760
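
As a sanity check, the split sizes listed on this page for QDMR-high-level and QDMR add up to the 83,978 examples mentioned in the description. The sketch below only reproduces that arithmetic with counts copied from the split tables; it is an observation about the numbers, not a statement about how the dataset authors partitioned the data.

```python
# Split sizes copied from the tables on this page.
SPLITS = {
    "QDMR-high-level": {"test": 3195, "train": 17503, "validation": 3130},
    "QDMR": {"test": 8069, "train": 44321, "validation": 7760},
}

# Total number of examples per configuration.
totals = {name: sum(counts.values()) for name, counts in SPLITS.items()}
grand_total = sum(totals.values())

print(totals)       # {'QDMR-high-level': 23828, 'QDMR': 60150}
print(grand_total)  # 83978
```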

Features:

{
    "question_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "decomposition": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "operators": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "split": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

QDMR-lexicon

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/QDMR-lexicon')

Description:

Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	8069
`'train'`	44321
`'validation'`	7760

Features:

{
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "allowed_tokens": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

logical-forms

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:break_data/logical-forms')

Description:

Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations
(QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
This repository contains the Break dataset along with information on the exact data format.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	8006
`'train'`	44098
`'validation'`	7719

Features:

{
    "question_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "decomposition": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "operators": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "split": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "program": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
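
Compared with the QDMR configuration, the logical-forms listing differs by one field. A short sketch that makes the difference explicit, with the field names copied from the feature listings on this page:

```python
# Field names copied from the QDMR and logical-forms feature listings.
QDMR_FEATURES = [
    "question_id",
    "question_text",
    "decomposition",
    "operators",
    "split",
]
LOGICAL_FORMS_FEATURES = [
    "question_id",
    "question_text",
    "decomposition",
    "operators",
    "split",
    "program",
]

# Fields present in logical-forms but absent from QDMR, in listing order.
extra_fields = [f for f in LOGICAL_FORMS_FEATURES if f not in QDMR_FEATURES]
# Fields present in QDMR but absent from logical-forms.
missing_fields = [f for f in QDMR_FEATURES if f not in LOGICAL_FORMS_FEATURES]

print(extra_fields)    # ['program']
print(missing_fields)  # []
```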
