References:
cross_topic_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 112 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
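The warning in the description about train+validation+test[:60%] can be checked with simple arithmetic on the split sizes listed above for cross_topic_1. The sketch below is a rough illustration, not the library's slicing code: it relies on a percent slice binding only to the split it is attached to, and it rounds to the nearest integer, which may differ from the library's own rounding by one example. The helper name `pct` is made up for this example.

```python
# Split sizes of cross_topic_1, copied from the table above.
sizes = {"train": 112, "validation": 62, "test": 207}
total = sum(sizes.values())


def pct(n, p):
    """Approximate number of examples in the first p percent of n examples."""
    return round(n * p / 100)


# Intended 60% portion: every split is sliced separately.
per_split = sum(pct(n, 60) for n in sizes.values())

# 'train+validation+test[:60%]': the slice applies to 'test' only,
# so 'train' and 'validation' are taken in full.
test_only = sizes["train"] + sizes["validation"] + pct(sizes["test"], 60)

print(total, per_split, test_only)
print(f"{per_split / total:.0%} vs {test_only / total:.0%}")
```

Under these assumptions, slicing each split separately keeps 228 of the 381 examples (about 60%), while the single trailing slice keeps 298 (about 78%).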
cross_genre_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 13.0.0
- Splits:
Split | Examples |
---|---|
'test' | 269 |
'train' | 63 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
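Both "author" and "topic" are ClassLabel features, so each example stores them as integer ids that index into the "names" lists above. A minimal sketch of turning those ids back into readable names follows; the `decode` helper and the sample record are hypothetical and only mirror the feature dictionary shown here.

```python
# Label names copied from the "author" and "topic" features above.
AUTHORS = [
    "catherinebennett", "georgemonbiot", "hugoyoung", "jonathanfreedland",
    "martinkettle", "maryriddell", "nickcohen", "peterpreston",
    "pollytoynbee", "royhattersley", "simonhoggart", "willhutton",
    "zoewilliams",
]
TOPICS = ["Politics", "Society", "UK", "World", "Books"]


def decode(example):
    """Return a copy of the example with class ids replaced by their names."""
    return {
        "author": AUTHORS[example["author"]],
        "topic": TOPICS[example["topic"]],
        "article": example["article"],
    }


# Hypothetical record shaped like the feature dictionary above.
sample = {"author": 1, "topic": 4, "article": "..."}
print(decode(sample))
```

When the data is loaded through the Hugging Face datasets library instead, the `int2str` method of `ClassLabel` serves the same purpose.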
cross_topic_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 112 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
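The two split expressions in the description are easy to mistype, since the slice has to be repeated for every split and the two percentages must add up to 100. A small sketch that builds both strings from one integer percentage is shown below; `split_exprs` is a hypothetical helper, not part of any library.

```python
SPLITS = ("train", "validation", "test")


def split_exprs(train_pct):
    """Build complementary split expressions that slice every split separately.

    train_pct is expected to be an integer percentage strictly between 0 and 100.
    """
    if not 0 < train_pct < 100:
        raise ValueError("train_pct must be strictly between 0 and 100")
    rest = 100 - train_pct
    head = "+".join(f"{s}[:{train_pct}%]" for s in SPLITS)
    tail = "+".join(f"{s}[-{rest}%:]" for s in SPLITS)
    return head, tail


train_split, test_split = split_exprs(60)
print(train_split)  # train[:60%]+validation[:60%]+test[:60%]
print(test_split)   # train[-40%:]+validation[-40%:]+test[-40%:]
```

The resulting strings can be passed as the `split` argument of the `load_dataset` calls quoted in the description.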
cross_topic_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 112 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 4.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 62 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_5
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_5')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 5.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 62 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_6
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_6')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 6.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 62 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_7
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_7')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 7.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 90 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_8
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_8')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 8.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 90 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_9
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_9')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 9.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 90 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_10
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_10')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 10.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 117 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_11
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_11')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 117 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_12
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_12')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 12.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 117 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 14.0.0
- Splits:
Split | Examples |
---|---|
'test' | 319 |
'train' | 63 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 15.0.0
- Splits:
Split | Examples |
---|---|
'test' | 291 |
'train' | 63 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
    split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 16.0.0
- Splits:
Split | Examples |
---|---|
'test' | 264 |
'train' | 63 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
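If each scenario is a different arrangement of the same articles, as the split tables suggest, then the three split sizes of every configuration in a family should add up to the same total. The sketch below checks this using the counts copied from the tables above; the dictionary layout and the `totals` helper are illustrative choices, not part of the dataset.

```python
# (test, train, validation) sizes copied from the split tables above,
# keyed by configuration number.
CROSS_TOPIC = {
    1: (207, 112, 62), 2: (179, 112, 90), 3: (152, 112, 117),
    4: (207, 62, 112), 5: (229, 62, 90), 6: (202, 62, 117),
    7: (179, 90, 112), 8: (229, 90, 62), 9: (174, 90, 117),
    10: (152, 117, 112), 11: (202, 117, 62), 12: (174, 117, 90),
}
CROSS_GENRE = {
    1: (269, 63, 112), 2: (319, 63, 62), 3: (291, 63, 90), 4: (264, 63, 117),
}


def totals(family):
    """Return the set of distinct per-configuration totals in a family."""
    return {sum(sizes) for sizes in family.values()}


print(totals(CROSS_TOPIC))
print(totals(CROSS_GENRE))
```

With these counts each family yields a single total: 381 examples for the cross_topic configurations and 444 for the cross_genre ones. The difference of 63 equals the train split of every cross_genre configuration.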