- Description:
Contextualization
ASSIN 2 is the second edition of the Avaliação de Similaridade Semântica e Inferência Textual (Evaluating Semantic Similarity and Textual Entailment), and was a workshop collocated with STIL 2019. It follows the first edition of ASSIN, proposing a new shared task with new data.
The workshop evaluated systems that assess two types of relations between two sentences: Semantic Textual Similarity and Textual Entailment.
Semantic Textual Similarity consists of quantifying the level of semantic equivalence between sentences, while Textual Entailment Recognition consists of classifying whether the first sentence entails the second.
Data
The corpus used in ASSIN 2 is composed of rather simple sentences. Following the procedures of SemEval 2014 Task 1, we tried to remove named entities and indirect speech from the corpus, and to keep all verbs in the present tense. The annotation instructions given to annotators are available (in Portuguese).
The training and validation data are composed, respectively, of 6,500 and 500 sentence pairs in Brazilian Portuguese, annotated for entailment and semantic similarity. Semantic similarity values range from 1 to 5, and text entailment classes are either entailment or none. The test data are composed of approximately 3,000 sentence pairs with the same annotation. All data were manually annotated.
Evaluation
Submissions to ASSIN 2 were evaluated with the same metrics as the first ASSIN: the F1 of precision and recall as the main metric for text entailment, and Pearson correlation for semantic similarity. The evaluation scripts are the same as in the last edition.
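The two metrics named above can be sketched in plain Python. This is only an illustration and not the official ASSIN evaluation script: the function names `f1_score` and `pearson` are hypothetical, and the choice of computing F1 for a single positive class is an assumption (the official script may, for example, average over classes).

```python
import math


def f1_score(gold, pred, positive=1):
    """F1 (harmonic mean of precision and recall) for one class.

    Assumption: `positive` is the label id treated as the target class.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)
```

Here `gold` and `pred` would hold the entailment label ids, while `x` and `y` would hold the gold and predicted similarity scores in the 1 to 5 range.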
PS: The description is extracted from the official homepage.
- Additional Documentation: Explore on Papers With Code 
- Source code: tfds.datasets.assin2.Builder
- Versions: 1.0.0 (default): Initial release.
- Download size: 2.02 MiB
- Dataset size: 1.82 MiB
- Auto-cached (documentation): Yes 
- Splits: 
| Split | Examples | 
|---|---|
| 'test' | 2,448 | 
| 'train' | 6,500 | 
| 'validation' | 500 | 
- Feature structure:
FeaturesDict({
    'entailment': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'hypothesis': Text(shape=(), dtype=string),
    'id': int32,
    'similarity': float32,
    'text': Text(shape=(), dtype=string),
})
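A minimal sketch of loading one split and turning a record with the feature structure above into plain Python values. The helper names `load_assin2` and `to_python` are hypothetical. The sketch assumes `tensorflow_datasets` is installed and that the data can be downloaded on first use. It leaves `entailment` as an integer class id, since the mapping from ids to class names is not stated here.

```python
def load_assin2(split="validation"):
    """Return an iterable of NumPy-backed example dicts for one split."""
    # Imported lazily so that to_python() can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.as_numpy(tfds.load("assin2", split=split))


def to_python(example):
    """Convert one example dict into plain Python types."""

    def as_str(value):
        # Text features arrive as UTF-8 encoded bytes when read via NumPy.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return {
        "id": int(example["id"]),
        "text": as_str(example["text"]),
        "hypothesis": as_str(example["hypothesis"]),
        "similarity": float(example["similarity"]),
        "entailment": int(example["entailment"]),  # class id, 0 or 1
    }
```

A typical use would be `rows = [to_python(ex) for ex in load_assin2("validation")]`, which triggers the download and preparation step the first time it runs.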
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| entailment | ClassLabel | | int64 | |
| hypothesis | Text | | string | |
| id | Tensor | | int32 | |
| similarity | Tensor | | float32 | |
| text | Text | | string | |
- Supervised keys (See as_supervised doc): None
- Figure (tfds.show_examples): Not supported. 
- Citation:
@inproceedings{DBLP:conf/propor/RealFO20,
  author    = {Livy Real and
               Erick Fonseca and
               Hugo Gon{\c{c}}alo Oliveira},
  editor    = {Paulo Quaresma and
               Renata Vieira and
               Sandra M. Alu{\'{\i}}sio and
               Helena Moniz and
               Fernando Batista and
               Teresa Gon{\c{c}}alves},
  title     = {The {ASSIN} 2 Shared Task: {A} Quick Overview},
  booktitle = {Computational Processing of the Portuguese Language - 14th International
               Conference, {PROPOR} 2020, Evora, Portugal, March 2-4, 2020, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12037},
  pages     = {406--412},
  publisher = {Springer},
  year      = {2020},
  url       = {https://doi.org/10.1007/978-3-030-41505-1_39},
  doi       = {10.1007/978-3-030-41505-1_39},
  timestamp = {Tue, 03 Mar 2020 09:40:18 +0100},
  biburl    = {https://dblp.org/rec/conf/propor/RealFO20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}