scicite

  • Description:

This is a dataset for classifying citation intents in academic papers. In each JSON object, the main citation intent label is specified with the label key, while the citation context is specified with a context key ('string' in the example below). Example:

{
  'string': 'In chacma baboons, male-infant relationships can be linked to both formation of friendships and paternity success [30,31].',
  'sectionName': 'Introduction',
  'label': 'background',
  'citingPaperId': '7a6b2d4b405439',
  'citedPaperId': '9d1abadc55b5e0',
  ...
}

You may obtain the full information about a paper by passing the provided paper IDs to the Semantic Scholar API (https://api.semanticscholar.org/). The labels are: Method, Background, Result.
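As a minimal sketch of how one such record might be read, the snippet below serializes and re-parses the example object with Python's standard json module, then pulls out the intent label and the citation context. The record holds only the fields shown in the example; the helper name summarize is hypothetical and not part of the dataset or any library.

```python
import json

# Only the fields shown in the description's example are included here;
# the real objects carry more keys (elided as "..." in the source).
record = {
    "string": "In chacma baboons, male-infant relationships can be linked to "
              "both formation of friendships and paternity success [30,31].",
    "sectionName": "Introduction",
    "label": "background",
    "citingPaperId": "7a6b2d4b405439",
    "citedPaperId": "9d1abadc55b5e0",
}


def summarize(rec):
    """Hypothetical helper: return (intent label, citation context)."""
    return rec["label"], rec["string"]


# Round-trip through JSON text to mimic reading one serialized object.
line = json.dumps(record)
parsed = json.loads(line)

label, context = summarize(parsed)
print(label)
print(context)
```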

Split          Examples
'test'         1,859
'train'        8,194
'validation'   916
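For a quick sense of how the examples are divided, the split sizes listed above can be summed and turned into fractions. This is plain arithmetic on the published counts; the variable names are illustrative only.

```python
# Example counts per split, copied from the table above.
SPLIT_SIZES = {"train": 8194, "validation": 916, "test": 1859}

total = sum(SPLIT_SIZES.values())
fractions = {name: count / total for name, count in SPLIT_SIZES.items()}

print(f"total examples: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```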
  • Feature structure:
FeaturesDict({
    'citeEnd': tf.int64,
    'citeStart': tf.int64,
    'citedPaperId': Text(shape=(), dtype=tf.string),
    'citingPaperId': Text(shape=(), dtype=tf.string),
    'excerpt_index': tf.int32,
    'id': Text(shape=(), dtype=tf.string),
    'isKeyCitation': tf.bool,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'label2': ClassLabel(shape=(), dtype=tf.int64, num_classes=4),
    'label2_confidence': tf.float32,
    'label_confidence': tf.float32,
    'sectionName': Text(shape=(), dtype=tf.string),
    'source': ClassLabel(shape=(), dtype=tf.int64, num_classes=7),
    'string': Text(shape=(), dtype=tf.string),
})
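The feature structure above can be explored with a short loading sketch. It assumes the tensorflow_datasets package is installed and that the dataset is registered under the name scicite, as this page's title suggests. The helper preview_scicite is a hypothetical name introduced here for illustration, and calling it downloads and prepares the dataset on first use.

```python
def preview_scicite(n=1, split="train"):
    """Hypothetical helper: return up to n (label_name, citation_text) pairs."""
    # Imported lazily so that defining this helper does not need the package.
    import tensorflow_datasets as tfds

    # with_info=True also returns the metadata that holds the feature spec.
    ds, info = tfds.load("scicite", split=split, with_info=True)
    label_feature = info.features["label"]  # ClassLabel with 3 classes

    rows = []
    for ex in tfds.as_numpy(ds.take(n)):
        # Integer class id -> human-readable label name.
        name = label_feature.int2str(int(ex["label"]))
        # tf.string features arrive as bytes when converted to NumPy.
        text = ex["string"].decode("utf-8")
        rows.append((name, text))
    return rows
```

Calling preview_scicite(3, split="validation") would return three pairs from the validation split. The same pattern applies to the other ClassLabel features, label2 and source, through info.features.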
  • Feature documentation:
Feature            Class         Shape  Dtype       Description
                   FeaturesDict
citeEnd            Tensor               tf.int64
citeStart          Tensor               tf.int64
citedPaperId       Text                 tf.string
citingPaperId      Text                 tf.string
excerpt_index      Tensor               tf.int32
id                 Text                 tf.string
isKeyCitation      Tensor               tf.bool
label              ClassLabel           tf.int64
label2             ClassLabel           tf.int64
label2_confidence  Tensor               tf.float32
label_confidence   Tensor               tf.float32
sectionName        Text                 tf.string
source             ClassLabel           tf.int64
string             Text                 tf.string
  • Citation:
@InProceedings{Cohan2019Structural,
  author={Arman Cohan and Waleed Ammar and Madeleine Van Zuylen and Field Cady},
  title={Structural Scaffolds for Citation Intent Classification in Scientific Publications},
  booktitle="NAACL",
  year="2019"
}