newsqa

References:

combined-csv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:newsqa/combined-csv')

Description:

NewsQA is a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles.

License: NewsQA Code. Copyright (c) Microsoft Corporation. All rights reserved. MIT License.
Version: 1.0.0
Splits:

Split	Examples
`'train'`	119633

Features:

{
    "story_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "story_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer_char_ranges": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
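
The `answer_char_ranges` feature is stored as a plain string, so it has to be parsed before the answer spans can be used. The sketch below assumes the encoding described in the upstream NewsQA documentation: answers from different crowdworkers are separated by `|`, multiple selections from one crowdworker are separated by `,`, each selection is `start:end` with an inclusive start and exclusive end, and `None` marks a crowdworker who found no answer. The function names `parse_char_ranges` and `span_texts` are hypothetical helpers, not part of TFDS or the dataset.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def parse_char_ranges(raw: str) -> List[Optional[List[Span]]]:
    """Parse a string such as '196:228|196:202,217:228|None'.

    Returns one entry per crowdworker: a list of (start, end) spans,
    or None where the crowdworker marked no answer.
    Assumed format; verify against the actual data before relying on it.
    """
    per_worker: List[Optional[List[Span]]] = []
    for worker_field in raw.split("|"):
        worker_field = worker_field.strip()
        if worker_field in ("", "None"):
            per_worker.append(None)
            continue
        spans: List[Span] = []
        for piece in worker_field.split(","):
            start, end = piece.split(":")
            spans.append((int(start), int(end)))
        per_worker.append(spans)
    return per_worker


def span_texts(story_text: str, spans: List[Span]) -> List[str]:
    """Slice the story text, assuming inclusive start and exclusive end."""
    return [story_text[start:end] for start, end in spans]
```

Each parsed span can then be applied to `story_text` from the same example to recover the answer text that a crowdworker selected.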

combined-json

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:newsqa/combined-json')

Description:

NewsQA is a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles.

License: NewsQA Code. Copyright (c) Microsoft Corporation. All rights reserved. MIT License.
Version: 1.0.0
Splits:

Split	Examples
`'train'`	12744

Features:

{
    "storyId": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "type": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "questions": {
        "feature": {
            "q": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "isAnswerAbsent": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            },
            "isQuestionBad": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            },
            "consensus": {
                "s": {
                    "dtype": "int32",
                    "id": null,
                    "_type": "Value"
                },
                "e": {
                    "dtype": "int32",
                    "id": null,
                    "_type": "Value"
                },
                "badQuestion": {
                    "dtype": "bool",
                    "id": null,
                    "_type": "Value"
                },
                "noAnswer": {
                    "dtype": "bool",
                    "id": null,
                    "_type": "Value"
                }
            },
            "answers": {
                "feature": {
                    "sourcerAnswers": {
                        "feature": {
                            "s": {
                                "dtype": "int32",
                                "id": null,
                                "_type": "Value"
                            },
                            "e": {
                                "dtype": "int32",
                                "id": null,
                                "_type": "Value"
                            },
                            "badQuestion": {
                                "dtype": "bool",
                                "id": null,
                                "_type": "Value"
                            },
                            "noAnswer": {
                                "dtype": "bool",
                                "id": null,
                                "_type": "Value"
                            }
                        },
                        "length": -1,
                        "id": null,
                        "_type": "Sequence"
                    }
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "validated_answers": {
                "feature": {
                    "s": {
                        "dtype": "int32",
                        "id": null,
                        "_type": "Value"
                    },
                    "e": {
                        "dtype": "int32",
                        "id": null,
                        "_type": "Value"
                    },
                    "badQuestion": {
                        "dtype": "bool",
                        "id": null,
                        "_type": "Value"
                    },
                    "noAnswer": {
                        "dtype": "bool",
                        "id": null,
                        "_type": "Value"
                    },
                    "count": {
                        "dtype": "int32",
                        "id": null,
                        "_type": "Value"
                    }
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
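
Because this configuration nests questions inside each story, a common first step is to flatten it into (question, answer) pairs. The sketch below is a minimal illustration that assumes each example has already been converted to a plain Python dict shaped like the schema above, with `questions` as a list of dicts, and that `consensus.s` and `consensus.e` are character offsets into `text` (inclusive start, exclusive end). Some loaders return sequence features as a dict of lists instead, in which case the iteration would need adapting. `consensus_answers` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def consensus_answers(story: Dict[str, Any]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (question, consensus answer text) pairs for one story.

    The answer is None when the consensus marks the question as bad
    or as having no answer in the story.
    """
    text = story["text"]
    for question in story["questions"]:
        consensus = question["consensus"]
        if consensus.get("badQuestion") or consensus.get("noAnswer"):
            yield question["q"], None
            continue
        # Assumed: s/e are character offsets, end exclusive.
        yield question["q"], text[consensus["s"]:consensus["e"]].strip()
```

A toy record shows the intended use:

```python
story = {
    "text": "The cat sat on the mat.",
    "questions": [
        {"q": "Who sat?", "consensus": {"s": 0, "e": 7, "badQuestion": False, "noAnswer": False}},
        {"q": "Why?", "consensus": {"s": 0, "e": 0, "badQuestion": False, "noAnswer": True}},
    ],
}
pairs = list(consensus_answers(story))
```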

split

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:newsqa/split')

Description:

NewsQA is a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles.

License: NewsQA Code. Copyright (c) Microsoft Corporation. All rights reserved. MIT License.
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5126
`'train'`	92549
`'validation'`	5166
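
As a quick sanity check on the table above, the three splits sum to 102,841 examples, with the train split holding roughly 90% of them. A small sketch of that arithmetic, using only the counts listed here:

```python
# Example counts copied from the split table for the 'split' configuration.
split_sizes = {"test": 5126, "train": 92549, "validation": 5166}

total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    # Print each split's size and its share of all examples.
    print(f"{name:<10} {count:>6} {fractions[name]:.1%}")
```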

Features:

{
    "story_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "story_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer_token_ranges": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
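
In this configuration the answer is given as `answer_token_ranges`, again stored as a string. The sketch below assumes the encoding from the upstream NewsQA documentation: comma-separated `start:end` pairs of token indices into `story_text`, with an inclusive start and exclusive end. It also assumes that splitting `story_text` on whitespace reproduces the tokenization the indices refer to, which should be checked against real examples. `parse_token_ranges` and `answer_from_tokens` are hypothetical helper names.

```python
from typing import List, Tuple


def parse_token_ranges(raw: str) -> List[Tuple[int, int]]:
    """Parse a string such as '196:202,217:228' into (start, end) pairs."""
    spans: List[Tuple[int, int]] = []
    for piece in raw.split(","):
        piece = piece.strip()
        if not piece or piece == "None":
            continue  # skip empty or missing selections
        start, end = piece.split(":")
        spans.append((int(start), int(end)))
    return spans


def answer_from_tokens(story_text: str, raw: str) -> List[str]:
    """Recover answer strings, assuming whitespace tokenization."""
    tokens = story_text.split()
    return [" ".join(tokens[start:end]) for start, end in parse_token_ranges(raw)]
```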