scicite
This is a dataset for classifying citation intents in academic papers. The main
citation intent label for each JSON object is given under the label key, while
the citation context is given under the string key. Example:
{
'string': 'In chacma baboons, male-infant relationships can be linked to both
formation of friendships and paternity success [30,31].',
'sectionName': 'Introduction',
'label': 'background',
'citingPaperId': '7a6b2d4b405439',
'citedPaperId': '9d1abadc55b5e0',
...
}
You may obtain the full information about the paper using the provided paper ids
with the Semantic Scholar API (https://api.semanticscholar.org/).
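As a rough illustration of the lookup described above, the sketch below builds a request URL for one paper ID and fetches the record. The `graph/v1/paper/` path and the `fields` query parameter are assumptions about the Semantic Scholar API that this page does not state, and `paper_url` and `fetch_paper` are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed endpoint root; the page only gives https://api.semanticscholar.org/.
API_ROOT = "https://api.semanticscholar.org/graph/v1/paper/"


def paper_url(paper_id, fields=("title", "year")):
    """Build the lookup URL for a citingPaperId or citedPaperId value."""
    query = urlencode({"fields": ",".join(fields)})
    return API_ROOT + quote(paper_id, safe="") + "?" + query


def fetch_paper(paper_id, fields=("title", "year")):
    """Fetch paper metadata as a dict (performs a network request)."""
    with urlopen(paper_url(paper_id, fields), timeout=10) as response:
        return json.load(response)
```

Calling `fetch_paper(example["citedPaperId"])` on a record from the dataset would then return the requested fields, provided the ID is known to the API and the request is not rate limited.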
The labels are: Method, Background, Result
| Split        | Examples |
|--------------|----------|
| 'test'       | 1,859    |
| 'train'      | 8,194    |
| 'validation' | 916      |
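To put the split sizes in proportion, here is a small sketch that sums the counts from the table and prints each split's share. The dictionary simply restates the numbers above; nothing is read from the dataset itself.

```python
# Split sizes as listed in the table above.
SPLIT_SIZES = {"train": 8194, "validation": 916, "test": 1859}

total = sum(SPLIT_SIZES.values())
fractions = {name: count / total for name, count in SPLIT_SIZES.items()}

# Print one row per split: name, example count, share of the whole dataset.
for name, frac in fractions.items():
    print(f"{name:<10} {SPLIT_SIZES[name]:>6,} {frac:6.1%}")
print(f"{'total':<10} {total:>6,}")
```

With these counts the three splits add up to 10,969 examples, of which the training split is roughly three quarters.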
FeaturesDict({
'citeEnd': int64,
'citeStart': int64,
'citedPaperId': Text(shape=(), dtype=string),
'citingPaperId': Text(shape=(), dtype=string),
'excerpt_index': int32,
'id': Text(shape=(), dtype=string),
'isKeyCitation': bool,
'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
'label2': ClassLabel(shape=(), dtype=int64, num_classes=4),
'label2_confidence': float32,
'label_confidence': float32,
'sectionName': Text(shape=(), dtype=string),
'source': ClassLabel(shape=(), dtype=int64, num_classes=7),
'string': Text(shape=(), dtype=string),
})
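The `citeStart` and `citeEnd` features are not documented on this page. The sketch below assumes they are character offsets into `string`, with `citeEnd` exclusive, and shows how the citation marker could be sliced out under that assumption. The offsets in the sample record are computed from the example sentence in the description and are not taken from the dataset; `citation_span` is a hypothetical helper.

```python
def citation_span(example):
    """Return example['string'][citeStart:citeEnd].

    Assumes both features are character offsets and citeEnd is exclusive;
    verify this against real records before relying on it.
    """
    start, end = int(example["citeStart"]), int(example["citeEnd"])
    return example["string"][start:end]


# Illustrative record built from the example sentence in the description.
text = ("In chacma baboons, male-infant relationships can be linked to both "
        "formation of friendships and paternity success [30,31].")
marker = "[30,31]"
start = text.find(marker)
example = {"string": text, "citeStart": start, "citeEnd": start + len(marker)}

print(citation_span(example))
```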
| Feature           | Class        | Shape | Dtype   | Description |
|-------------------|--------------|-------|---------|-------------|
|                   | FeaturesDict |       |         |             |
| citeEnd           | Tensor       |       | int64   |             |
| citeStart         | Tensor       |       | int64   |             |
| citedPaperId      | Text         |       | string  |             |
| citingPaperId     | Text         |       | string  |             |
| excerpt_index     | Tensor       |       | int32   |             |
| id                | Text         |       | string  |             |
| isKeyCitation     | Tensor       |       | bool    |             |
| label             | ClassLabel   |       | int64   |             |
| label2            | ClassLabel   |       | int64   |             |
| label2_confidence | Tensor       |       | float32 |             |
| label_confidence  | Tensor       |       | float32 |             |
| sectionName       | Text         |       | string  |             |
| source            | ClassLabel   |       | int64   |             |
| string            | Text         |       | string  |             |
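A minimal sketch of loading the dataset with TensorFlow Datasets and turning the integer `label` back into its class name. It assumes `tensorflow-datasets` is installed and that the data can be downloaded. `load_scicite`, `decode_example`, and `preview` are hypothetical helper names, and the class names are read from the dataset metadata instead of being hard-coded.

```python
def load_scicite(split="train"):
    """Load one split as (string, label) pairs together with the dataset info."""
    # Imported lazily so decode_example can be used without TensorFlow installed.
    import tensorflow_datasets as tfds
    return tfds.load("scicite", split=split, as_supervised=True, with_info=True)


def decode_example(text, label_id, label_names):
    """Convert a raw (bytes, int) pair into (str, class name)."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return text, label_names[int(label_id)]


def preview(n=3, split="train"):
    """Return the first n examples of a split as (text, class name) tuples."""
    import tensorflow_datasets as tfds
    ds, info = load_scicite(split)
    names = info.features["label"].names  # class names stored with the dataset
    return [decode_example(t, y, names) for t, y in tfds.as_numpy(ds.take(n))]
```

Calling `preview()` triggers the download on first use. Dropping `as_supervised=True` yields the full feature dictionary shown above in place of `(string, label)` pairs.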
@InProceedings{Cohan2019Structural,
author={Arman Cohan and Waleed Ammar and Madeleine Van Zuylen and Field Cady},
title={Structural Scaffolds for Citation Intent Classification in Scientific Publications},
booktitle="NAACL",
year="2019"
}
Last updated 2022-12-23 UTC.
- **Additional Documentation**: [Explore on Papers With Code](https://paperswithcode.com/dataset/scicite)
- **Homepage**: <https://github.com/allenai/scicite>
- **Source code**: [`tfds.datasets.scicite.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/scicite/scicite_dataset_builder.py)
- **Versions**:
  - **`1.0.0`** (default): No release notes.
- **Download size**: `22.12 MiB`
- **Dataset size**: `7.26 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `('string', 'label')`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.