Aprenda o que há de mais recente em aprendizado de máquina, IA generativa e muito mais no WiML Symposium 2023 Registre-se

Esta página foi traduzida pela API Cloud Translation.

cola_indic

Referências:

wnli.pt

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.en')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, they will predict the wrong label on corresponding development set
example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	146
`'train'`	635
`'validation'`	71

Características :

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

wnli.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, they will predict the wrong label on corresponding development set
example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	146
`'train'`	635
`'validation'`	71

Características :

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

wnli.gu

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.gu')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, they will predict the wrong label on corresponding development set
example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	146
`'train'`	635
`'validation'`	71

Características :

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

wnli.mr

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wnli.mr')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task
in which a system must read a sentence with a pronoun and select the referent of that pronoun from
a list of choices. The examples are manually constructed to foil simple statistical methods: Each
one is contingent on contextual information provided by a single word or phrase in the sentence.
To convert the problem into sentence pair classification, we construct sentence pairs by replacing
the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the
pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of
new examples derived from fiction books that was shared privately by the authors of the original
corpus. While the included training set is balanced between two classes, the test set is imbalanced
between them (65% not entailment). Also, due to a data quirk, the development set is adversarial:
hypotheses are sometimes shared between training and development examples, so if a model memorizes the
training examples, they will predict the wrong label on corresponding development set
example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence
between a model's score on this task and its score on the unconverted original task. We
call converted dataset WNLI (Winograd NLI). This dataset is translated and publicly released for 3
Indian languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	146
`'train'`	635
`'validation'`	71

Características :

{
    "hypothesis": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "not_entailment",
            "entailment",
            "None"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

copa.pt

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/copa.en')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	500
`'train'`	400
`'validation'`	100

Características :

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

copa.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/copa.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	449
`'train'`	362
`'validation'`	88

Características :

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

copa.gu

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/copa.gu')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	448
`'train'`	362
`'validation'`	88

Características :

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

copa.mr

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/copa.mr')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing
progress in open-domain commonsense causal reasoning. COPA consists of 1000 questions, split equally
into development and test sets of 500 questions each. Each question is composed of a premise and two
alternatives, where the task is to select the alternative that more plausibly has a causal relation
with the premise. The correct alternative is randomized so that the expected performance of randomly
guessing is 50%. This dataset is translated and publicly released for 3 languages by AI4Bharat.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	449
`'train'`	362
`'validation'`	88

Características :

{
    "premise": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

sna.bn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/sna.bn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


This dataset is a collection of Bengali News articles. The dataset is used for classifying articles into
5 different classes namely international, state, kolkata, entertainment and sports.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1411
`'train'`	11284
`'validation'`	1411

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 6,
        "names": [
            "kolkata",
            "state",
            "national",
            "sports",
            "entertainment",
            "international"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

csqa.as

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.as')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	2942

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.bn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.bn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	38845

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.gu

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.gu')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	22861

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	35140

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.kn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.kn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	13666

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.ml

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.ml')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	26537

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.mr

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.mr')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	11370

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.or

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.or')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1975

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.pa

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.pa')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5667

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.ta

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.ta')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	38590

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

csqa.te

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/csqa.te')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Given a text with an entity randomly masked, the task is to predict that masked entity from a list of 4
candidate entities. The dataset contains around 239k examples across 11 languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	41338

Características :

{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "out_of_context_options": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wstp.as

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.as')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	626
`'train'`	5000
`'validation'`	625

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.bn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.bn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5948
`'train'`	47580
`'validation'`	5947

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.gu

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.gu')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1251
`'train'`	10004
`'validation'`	1251

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5509
`'train'`	44069
`'validation'`	5509

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.kn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.kn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	4423
`'train'`	35379
`'validation'`	4422

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.ml

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.ml')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	3441
`'train'`	27527
`'validation'`	3441

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.mr

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.mr')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1306
`'train'`	10446
`'validation'`	1306

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.or

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.or')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	502
`'train'`	4015
`'validation'`	502

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.pa

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.pa')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1097
`'train'`	8772
`'validation'`	1097

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.ta

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.ta')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	6118
`'train'`	48940
`'validation'`	6117

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wstp.te

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wstp.te')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Predict the correct title for a Wikipedia section from a given list of four candidate titles.
The dataset has 400k examples across 11 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	10.000
`'train'`	80.000
`'validation'`	10.000

Características :

{
    "sectionText": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correctTitle": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleA": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleB": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleC": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "titleD": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

inltkh.gu

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.gu')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from inltk project. The corpus is a collection of headlines tagged with their news category.
Available for langauges: gu, ml, mr and ta.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	659
`'train'`	5269
`'validation'`	659

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

inltkh.ml

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.ml')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from inltk project. The corpus is a collection of headlines tagged with their news category.
Available for langauges: gu, ml, mr and ta.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	630
`'train'`	5036
`'validation'`	630

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

inltkh.mr

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.mr')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from inltk project. The corpus is a collection of headlines tagged with their news category.
Available for langauges: gu, ml, mr and ta.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1210
`'train'`	9672
`'validation'`	1210

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

inltkh.ta

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.ta')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from inltk project. The corpus is a collection of headlines tagged with their news category.
Available for langauges: gu, ml, mr and ta.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	669
`'train'`	5346
`'validation'`	669

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

inltkh.te

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/inltkh.te')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


Obtained from inltk project. The corpus is a collection of headlines tagged with their news category.
Available for langauges: gu, ml, mr and ta.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	541
`'train'`	4328
`'validation'`	541

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 10,
        "names": [
            "entertainment",
            "business",
            "tech",
            "sports",
            "state",
            "spirituality",
            "tamil-cinema",
            "positive",
            "negative",
            "neutral"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

bbca.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/bbca.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


This release consists of 4335 Hindi documents with tags from the BBC Hindi News website.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	866
`'train'`	3467

Características :

{
    "label": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-bn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-bn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5522

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-gu

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-gu')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	6463

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5169

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-ml

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-ml')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	4886

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-mr

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-mr')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5760

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-or

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-or')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	752

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-ta

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-ta')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5637

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-te

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-te')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	5049

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cvit-mkb-clsr.en-ur

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/cvit-mkb-clsr.en-ur')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


CVIT Maan ki Baat Dataset - Given a sentence in language $L_1$ the task is to retrieve its translation
from a set of candidate sentences in language $L_2$.
The dataset contains around 39k parallel sentence pairs across 8 Indian languages.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1006

Características :

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

iitp-mr.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/iitp-mr.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


IIT Patna Product Reviews: Sentiment analysis corpus for product reviews posted in Hindi.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	310
`'train'`	2480
`'validation'`	310

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "negative",
            "neutral",
            "positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

iitp-pr.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/iitp-pr.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


IIT Patna Product Reviews: Sentiment analysis corpus for product reviews posted in Hindi.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	523
`'train'`	4182
`'validation'`	523

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "negative",
            "neutral",
            "positive"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

acta-sc.te

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/actsa-sc.te')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


ACTSA Corpus: Sentiment analysis corpus for Telugu sentences.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	541
`'train'`	4328
`'validation'`	541

Características :

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 2,
        "names": [
            "positive",
            "negative"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}

md.oi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/md.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The Hindi Discourse Analysis dataset is a corpus for analyzing discourse modes present in its sentences.
It contains sentences from stories written by 11 famous authors from the 20th Century. 4-5 stories by
each author have been selected which were available in the public domain resulting in a collection of 53 stories.
Most of these short stories were originally written in Hindi but some of them were written in other Indian languages
and later translated to Hindi.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	997
`'train'`	7974
`'validation'`	997

Características :

{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "discourse_mode": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "story_number": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

wiki-ner.as

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.as')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	160
`'train'`	1021
`'validation'`	157

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.bn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.bn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	2690
`'train'`	20223
`'validation'`	2985

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.gu

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.gu')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	255
`'train'`	2343
`'validation'`	297

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.hi

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.hi')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1256
`'train'`	9463
`'validation'`	1114

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.kn

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.kn')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	476
`'train'`	2679
`'validation'`	412

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.ml

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.ml')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	2042
`'train'`	15620
`'validation'`	2067

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.mr

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.mr')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1329
`'train'`	12151
`'validation'`	1498

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.or

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.or')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	153
`'train'`	1077
`'validation'`	132

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.pa

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.pa')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	179
`'train'`	1408
`'validation'`	186

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.ta

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.ta')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	2611
`'train'`	20466
`'validation'`	2586

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

wiki-ner.te

Use o seguinte comando para carregar esse conjunto de dados no TFDS:

ds = tfds.load('huggingface:indic_glue/wiki-ner.te')

Descrição :

IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
    variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.


The WikiANN dataset (Pan et al. 2017) is a dataset with NER annotations for PER, ORG and LOC. It has been constructed using
the linked entities in Wikipedia pages for 282 different languages including Danish.

Licença : Nenhuma licença conhecida
Versão : 1.0.0
Divisões :

Dividir	Exemplos
`'test'`	1110
`'train'`	7978
`'validation'`	841

Características :

{
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-LOC",
                "B-ORG",
                "B-PER",
                "I-LOC",
                "I-ORG",
                "I-PER",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "additional_info": {
        "feature": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}