送信済み

参考文献:

次のコマンドを使用して、このデータセットを TFDS にロードします。

ds = tfds.load('huggingface:per_sent')
  • 説明
Person SenTiment (PerSenT) is a crowd-sourced dataset that captures the sentiment of an author towards the main entity in a news article. This dataset contains annotation for 5.3k documents and 38k paragraphs covering 3.2k unique entities.

The dataset consists of sentiment annotations on news articles about people. For each article, annotators judge what the authors sentiment is towards the main (target) entity of the article. The annotations also include similar judgments on paragraphs within the article.

To split the dataset, entities into 4 mutually exclusive sets. Due to the nature of news collections, some entities tend to dominate the collection. In the collection, there were four entities which were the main entity in nearly 800 articles. To avoid these entities from dominating the train or test splits, we moved them to a separate test collection. We split the remaining into a training, dev, and test sets at random. Thus our collection includes one standard test set consisting of articles drawn at random (Test Standard -- `test_random`), while the other is a test set which contains multiple articles about a small number of popular entities (Test Frequent -- `test_fixed`).
  • ライセンス: クリエイティブ コモンズ 表示 4.0 国際ライセンス
  • バージョン: 1.1.0
  • 分割:
スプリット
'test_fixed' 827
'test_random' 579
'train' 3355
'validation' 578
  • 特徴
{
    "DOCUMENT_INDEX": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "TITLE": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "TARGET_ENTITY": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "DOCUMENT": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "MASKED_DOCUMENT": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "TRUE_SENTIMENT": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph0": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph1": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph2": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph3": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph4": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph5": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph6": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph7": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph8": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph9": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph10": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph11": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph12": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph13": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph14": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "Paragraph15": {
        "num_classes": 3,
        "names": [
            "Negative",
            "Neutral",
            "Positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}