newsqa

References:

combined-csv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:newsqa/combined-csv')

Description:

NewsQA is a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles.

License: NewsQA Code Copyright (c) Microsoft Corporation. All rights reserved. MIT License
Version: 1.0.0
Splits:

Split	Examples
`'train'`	119633

Features:

{
    "story_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "story_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer_char_ranges": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
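
Because `answer_char_ranges` is stored as a plain string, it has to be parsed before it can be used to index into `story_text`. The sketch below assumes the encoding described in the upstream NewsQA documentation: annotators separated by `|`, multiple spans from one annotator separated by `,`, each span written as `start:end` with an exclusive end, and the literal `None` when an annotator found no answer. The helper names `parse_char_ranges` and `extract_answers` are hypothetical and not part of TFDS or the dataset.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def parse_char_ranges(raw: str) -> List[Optional[List[Span]]]:
    """Parse an answer_char_ranges string into one span list per annotator.

    Assumed format (not confirmed by this page): 'a:b,c:d|None|e:f'.
    An annotator who marked no answer is represented as None.
    """
    annotators: List[Optional[List[Span]]] = []
    for part in raw.split("|"):
        part = part.strip()
        if not part or part == "None":
            annotators.append(None)
            continue
        spans: List[Span] = []
        for item in part.split(","):
            start, end = item.split(":")
            spans.append((int(start), int(end)))
        annotators.append(spans)
    return annotators


def extract_answers(story_text: str, raw: str) -> List[str]:
    """Return every answer string selected by annotators, skipping None."""
    answers: List[str] = []
    for spans in parse_char_ranges(raw):
        if spans is None:
            continue
        # Assumes the end offset is exclusive, as in Python slicing.
        answers.extend(story_text[s:e] for s, e in spans)
    return answers
```

For a record loaded from this config, `extract_answers(record["story_text"], record["answer_char_ranges"])` would list the raw annotator selections, provided the assumptions above hold for the data.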

combined-json

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:newsqa/combined-json')

Description:

NewsQA is a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles.

License: NewsQA Code Copyright (c) Microsoft Corporation. All rights reserved. MIT License
Version: 1.0.0
Splits:

Split	Examples
`'train'`	12744

Features:

{
    "storyId": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "type": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "questions": {
        "feature": {
            "q": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "isAnswerAbsent": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            },
            "isQuestionBad": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            },
            "consensus": {
                "s": {
                    "dtype": "int32",
                    "id": null,
                    "_type": "Value"
                },
                "e": {
                    "dtype": "int32",
                    "id": null,
                    "_type": "Value"
                },
                "badQuestion": {
                    "dtype": "bool",
                    "id": null,
                    "_type": "Value"
                },
                "noAnswer": {
                    "dtype": "bool",
                    "id": null,
                    "_type": "Value"
                }
            },
            "answers": {
                "feature": {
                    "sourcerAnswers": {
                        "feature": {
                            "s": {
                                "dtype": "int32",
                                "id": null,
                                "_type": "Value"
                            },
                            "e": {
                                "dtype": "int32",
                                "id": null,
                                "_type": "Value"
                            },
                            "badQuestion": {
                                "dtype": "bool",
                                "id": null,
                                "_type": "Value"
                            },
                            "noAnswer": {
                                "dtype": "bool",
                                "id": null,
                                "_type": "Value"
                            }
                        },
                        "length": -1,
                        "id": null,
                        "_type": "Sequence"
                    }
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "validated_answers": {
                "feature": {
                    "s": {
                        "dtype": "int32",
                        "id": null,
                        "_type": "Value"
                    },
                    "e": {
                        "dtype": "int32",
                        "id": null,
                        "_type": "Value"
                    },
                    "badQuestion": {
                        "dtype": "bool",
                        "id": null,
                        "_type": "Value"
                    },
                    "noAnswer": {
                        "dtype": "bool",
                        "id": null,
                        "_type": "Value"
                    },
                    "count": {
                        "dtype": "int32",
                        "id": null,
                        "_type": "Value"
                    }
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
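
The nested `consensus` feature carries integer offsets `s` and `e` plus two boolean flags. A minimal sketch of turning one such entry into answer text follows. It assumes that `s` and `e` are character offsets into the story's `text` with an exclusive end, and that `badQuestion` or `noAnswer` being true means no span should be read. The function name `consensus_answer` is hypothetical, and the in-memory layout of loaded records (list of dicts versus dict of lists) may differ from the plain dict used here.

```python
from typing import Any, Dict, Optional


def consensus_answer(text: str, consensus: Dict[str, Any]) -> Optional[str]:
    """Return the consensus answer text, or None if there is no usable span.

    Assumes 's'/'e' are character offsets into `text` (end exclusive) and
    that the 'badQuestion' and 'noAnswer' flags mark entries without a span.
    """
    if consensus.get("badQuestion") or consensus.get("noAnswer"):
        return None
    start, end = consensus.get("s"), consensus.get("e")
    # Reject missing or out-of-range offsets rather than slicing blindly.
    if start is None or end is None or not (0 <= start < end <= len(text)):
        return None
    return text[start:end]
```

Checking the flags before the offsets avoids returning a meaningless slice when placeholder offsets are present alongside a `noAnswer` flag.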

split

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:newsqa/split')

Description:

NewsQA is a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles.

License: NewsQA Code Copyright (c) Microsoft Corporation. All rights reserved. MIT License
Version: 1.0.0
Splits:

Split	Examples
`'test'`	5126
`'train'`	92549
`'validation'`	5166

Features:

{
    "story_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "story_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer_token_ranges": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
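
In this config the answer is given as `answer_token_ranges` rather than character offsets. The sketch below assumes the convention described in the upstream NewsQA documentation: `story_text` is already tokenized with single spaces between tokens, ranges are written as `start:end` with an exclusive end, and multiple selections are separated by `,`. The function name `answers_from_token_ranges` is hypothetical, and the assumptions should be checked against a few real records before relying on it.

```python
from typing import List


def answers_from_token_ranges(story_text: str, raw: str) -> List[str]:
    """Recover answer strings from a token-range string such as '1:3,4:5'.

    Assumes whitespace-tokenized story_text and exclusive end indices.
    """
    tokens = story_text.split(" ")
    answers: List[str] = []
    for item in raw.split(","):
        item = item.strip()
        if not item or item == "None":
            continue  # no selection recorded
        start, end = (int(x) for x in item.split(":"))
        answers.append(" ".join(tokens[start:end]))
    return answers
```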