dart
DART (DAta Record to Text generation) contains RDF entity-relation triples
annotated with sentence descriptions that cover all facts in each triple set.
DART was built from existing datasets, including WikiTableQuestions, WikiSQL,
WebNLG, and Cleaned E2E. The tables from WikiTableQuestions and WikiSQL were
transformed into subject-predicate-object triples, and their text annotations
were mainly collected through Amazon Mechanical Turk (MTurk). The meaning
representations (MRs) in E2E were likewise transformed into triples and paired
with their existing descriptions; MRs that could not be transformed were dropped.
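The page does not spell out how a table row becomes triples. As a naive illustration only (not DART's actual conversion procedure), one can treat one column as the subject and turn every other column header into a predicate whose object is that row's cell. The `row_to_triples` helper and the example table below are hypothetical.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def row_to_triples(header: Sequence[str], row: Sequence[str], subject_col: int = 0) -> List[Triple]:
    """Naively turn one table row into subject-predicate-object triples.

    Hypothetical simplification: the cell in `subject_col` is the subject,
    every other column header becomes a predicate, and its cell the object.
    Blank cells are skipped.
    """
    if len(header) != len(row):
        raise ValueError("header and row must have the same length")
    subject = row[subject_col]
    return [
        (subject, column, cell)
        for i, (column, cell) in enumerate(zip(header, row))
        if i != subject_col and cell.strip()
    ]


# Hypothetical table row, not taken from DART.
header = ["Player", "Team", "Position"]
row = ["Alice", "Tigers", "Forward"]
triples = row_to_triples(header, row)
print(triples)
```

Real tables often need a richer structure than "one subject column" (for example, when a cell depends on two other columns), which is why this flat mapping should be read as a sketch rather than a faithful reproduction.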
The original E2E and WebNLG splits are kept. For WikiTableQuestions and
WikiSQL, Jaccard similarity is used to keep similar tables in the same split
(train/dev/test).
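The exact grouping procedure is not described here. The sketch below is a minimal illustration under several assumptions: each table is represented by its set of column headers, two tables are linked when the Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B| of their header sets reaches a hypothetical threshold of 0.5, links are followed transitively (union-find), and each resulting group is then assigned as a whole to one split. The function names, threshold, and greedy assignment rule are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar_tables(headers: Dict[str, Set[str]], threshold: float = 0.5) -> List[Set[str]]:
    """Link tables whose header sets reach `threshold` similarity, transitively (union-find)."""
    parent = {table: table for table in headers}

    def find(table: str) -> str:
        while parent[table] != table:
            parent[table] = parent[parent[table]]  # path halving
            table = parent[table]
        return table

    for a, b in combinations(headers, 2):
        if jaccard(headers[a], headers[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for table in headers:
        groups.setdefault(find(table), set()).add(table)
    return list(groups.values())


def assign_splits(groups: List[Set[str]], targets: Dict[str, float]) -> Dict[str, str]:
    """Greedily place whole groups (largest first) into the split furthest below its target share."""
    total = sum(len(g) for g in groups)
    counts = {split: 0 for split in targets}
    assignment: Dict[str, str] = {}
    for group in sorted(groups, key=len, reverse=True):
        split = max(targets, key=lambda s: targets[s] * total - counts[s])
        counts[split] += len(group)
        for table in group:
            assignment[table] = split
    return assignment


# Hypothetical tables identified by name, each represented by its column headers.
example_headers = {
    "t1": {"player", "team", "goals"},
    "t2": {"player", "team", "assists"},
    "t3": {"city", "country", "population"},
}
groups = group_similar_tables(example_headers)
assignment = assign_splits(groups, {"train": 0.8, "dev": 0.1, "test": 0.1})
print(groups, assignment)
```

Here `t1` and `t2` share two of four distinct headers (J = 0.5), so they always land in the same split, while `t3` forms its own group. Assigning whole groups is what prevents near-duplicate tables from leaking across train and test.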
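This dataset is constructed following a standardized table format.

- Additional documentation: [Explore on Papers With Code](https://paperswithcode.com/dataset/dart)
- Homepage: https://github.com/Yale-LILY/dart
- Source code: [`tfds.structured.dart.Dart`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/structured/dart/dart.py)
- Versions: `0.1.0` (default): No release notes.
- Download size: `249.71 MiB`
- Dataset size: `38.83 MiB`
- Auto-cached ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes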
Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 12,552   |
| `'train'`      | 62,659   |
| `'validation'` | 6,980    |
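As a quick sanity check on how the examples are distributed, the split sizes from the table can be turned into proportions. This is plain arithmetic on the listed counts, not a call into TFDS.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 62_659, "validation": 6_980, "test": 12_552}

total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

for name, frac in fractions.items():
    print(f"{name:>10}: {split_sizes[name]:>6,} examples ({frac:.1%})")
print(f"{'total':>10}: {total:>6,} examples")
```

The three splits sum to 82,191 examples, roughly 76.2% train, 8.5% validation, and 15.3% test.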
Feature structure:

    FeaturesDict({
        'input_text': FeaturesDict({
            'table': Sequence({
                'column_header': string,
                'content': string,
                'row_number': int16,
            }),
        }),
        'target_text': string,
    })
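To make the nesting concrete, the sketch below builds a plain-Python dict with this shape from a list of triples. It assumes, hypothetically, that each triple occupies one `row_number` and that its three parts are stored as cells whose `column_header` values are `subject`, `predicate`, and `object`; check the builder's `dart.py` before relying on those names. `triples_to_table`, `make_example`, and the sample data are illustrative only.

```python
from typing import Dict, List, Tuple, Union

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Cell = Dict[str, Union[str, int]]

# Assumed header names for the three triple positions (hypothetical).
TRIPLE_HEADERS = ("subject", "predicate", "object")


def triples_to_table(triples: List[Triple]) -> List[Cell]:
    """Flatten triples into cells for input_text/table, one row_number per triple."""
    cells: List[Cell] = []
    for row_number, triple in enumerate(triples):
        for column_header, content in zip(TRIPLE_HEADERS, triple):
            cells.append(
                {"column_header": column_header, "content": content, "row_number": row_number}
            )
    return cells


def make_example(triples: List[Triple], target_text: str) -> dict:
    """Build a dict nested the same way as the FeaturesDict above."""
    return {"input_text": {"table": triples_to_table(triples)}, "target_text": target_text}


# Hypothetical example, not taken from DART.
example = make_example(
    [("Alice", "team", "Tigers"), ("Alice", "position", "Forward")],
    "Alice plays forward for the Tigers.",
)
print(example)
```

Keeping the triple index in `row_number` is what lets the three cells of a triple be regrouped later, since the `Sequence` itself only stores a flat list of cells.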
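Feature documentation:

| Feature                        | Class        | Shape | Dtype  | Description |
|--------------------------------|--------------|-------|--------|-------------|
|                                | FeaturesDict |       |        |             |
| input_text                     | FeaturesDict |       |        |             |
| input_text/table               | Sequence     |       |        |             |
| input_text/table/column_header | Tensor       |       | string |             |
| input_text/table/content       | Tensor       |       | string |             |
| input_text/table/row_number    | Tensor       |       | int16  |             |
| target_text                    | Tensor       |       | string |             |

Supervised keys (see the [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `('input_text', 'target_text')`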
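When examples are read back, TFDS returns a `Sequence` of dicts as a dict of parallel arrays (`column_header`, `content`, `row_number`), and `tfds.as_numpy` yields `string` features as Python `bytes`. A common preprocessing step for seq2seq models is to linearize the table into a single input string. The sketch below groups cells by `row_number`; the `linearize_table` name and the ` | ` and ` ; ` separators are hypothetical choices, not a format prescribed by DART or TFDS.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Union

Text = Union[str, bytes]


def _to_str(value: Text) -> str:
    # tfds.as_numpy returns tf.string values as Python bytes; decode them to str.
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def linearize_table(table: Dict[str, Sequence], cell_sep: str = " | ", row_sep: str = " ; ") -> str:
    """Join cells within each row, then join rows in ascending row_number order."""
    rows: Dict[int, List[str]] = defaultdict(list)
    for content, row_number in zip(table["content"], table["row_number"]):
        # Cells keep their stored order within each row; int() also accepts numpy int16.
        rows[int(row_number)].append(_to_str(content))
    return row_sep.join(cell_sep.join(rows[r]) for r in sorted(rows))


# Hypothetical decoded example in the columnar layout described above.
sample_table = {
    "column_header": [b"subject", b"predicate", b"object"] * 2,
    "content": [b"Alice", b"team", b"Tigers", b"Alice", b"position", b"Forward"],
    "row_number": [0, 0, 0, 1, 1, 1],
}
linearized = linearize_table(sample_table)
print(linearized)
```

The resulting string can serve as the model input, with `target_text` as the reference output.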
Citation:

    @article{radev2020dart,
      title={DART: Open-Domain Structured Data Record to Text Generation},
      author={Dragomir Radev and Rui Zhang and Amrit Rau and Abhinand Sivaprasad and Chiachun Hsieh and Nazneen Fatema Rajani and Xiangru Tang and Aadit Vyas and Neha Verma and Pranav Krishna and Yangxiaokang Liu and Nadia Irwanto and Jessica Pan and Faiaz Rahman and Ahmad Zaidi and Murori Mutuma and Yasin Tarabar and Ankit Gupta and Tao Yu and Yi Chern Tan and Xi Victoria Lin and Caiming Xiong and Richard Socher},
      journal={arXiv preprint arXiv:2007.02871},
      year={2020}
    }
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2022-12-06 UTC.