References:
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:s2orc')
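To inspect a few records without iterating over the whole corpus, the load command can be wrapped in a small helper. This is a minimal sketch, not part of the dataset card: the helper name `peek_s2orc` and its parameters are hypothetical, and it assumes `tensorflow_datasets` is installed and that the dataset can be prepared in your environment.

```python
def peek_s2orc(num_examples=3, split="train"):
    """Return the first few examples of a split as dicts of NumPy values.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that defining the helper does not require TFDS.
    import tensorflow_datasets as tfds

    # 'train' is the only split listed for this dataset.
    ds = tfds.load("huggingface:s2orc", split=split)

    # take() limits iteration to a handful of records;
    # as_numpy() converts tensors to NumPy values.
    return list(tfds.as_numpy(ds.take(num_examples)))
```

Calling `peek_s2orc(1)` triggers dataset preparation on first use, which may be slow for a corpus of this size.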
- Description:
S2ORC is a large corpus of 81.1M English-language academic papers spanning many academic disciplines.
It provides rich metadata, paper abstracts, and resolved bibliographic references, as well as structured full
text for 8.1M open access papers. The full text is annotated with automatically detected inline mentions of
citations, figures, and tables, each linked to its corresponding paper object. The corpus aggregates papers
from hundreds of academic publishers and digital archives into a unified source, creating the largest
publicly available collection of machine-readable academic text to date.
- License: The Semantic Scholar Open Research Corpus is licensed under ODC-BY.
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'train' | 189674763 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paperAbstract": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"entities": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"s2Url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pdfUrls": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"s2PdfUrl": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"authors": [
{
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"ids": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
],
"inCitations": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"outCitations": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"fieldsOfStudy": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"year": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"venue": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"journalName": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"journalVolume": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"journalPages": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sources": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"doi": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"doiUrl": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pmid": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"magId": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
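
The schema above can be read as a plain dictionary per paper, with `inCitations`, `outCitations`, and `fieldsOfStudy` as variable-length lists of strings. The following sketch shows one way to work with such records. The sample records, their IDs, and the helper names `citation_edges` and `field_counts` are made up for illustration and are not taken from the dataset; string values loaded through TFDS typically arrive as bytes and may need decoding first.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def citation_edges(record: Dict) -> List[Tuple[str, str]]:
    """Return (citing_id, cited_id) pairs from one record's outCitations."""
    return [(record["id"], cited) for cited in record.get("outCitations", [])]


def field_counts(records: Iterable[Dict]) -> Counter:
    """Count how many records list each entry of fieldsOfStudy."""
    counts = Counter()
    for record in records:
        counts.update(record.get("fieldsOfStudy", []))
    return counts


# Hypothetical records that follow the feature schema (only some fields shown).
SAMPLE_RECORDS = [
    {
        "id": "paper-a",
        "title": "An example title",
        "year": 2019,
        "inCitations": [],
        "outCitations": ["paper-b", "paper-c"],
        "fieldsOfStudy": ["Computer Science"],
    },
    {
        "id": "paper-b",
        "title": "Another example title",
        "year": 2017,
        "inCitations": ["paper-a"],
        "outCitations": [],
        "fieldsOfStudy": ["Computer Science", "Mathematics"],
    },
]

# Flatten the per-record edges into one citation edge list.
edges = [edge for record in SAMPLE_RECORDS for edge in citation_edges(record)]
fields = field_counts(SAMPLE_RECORDS)
```

Here `edges` holds one pair per outgoing citation, which is a convenient input for building a citation graph, and `fields` maps each field of study to the number of sample records that list it.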