wiki_table_text
Wikipedia tables with at least 3 rows and 2 columns were collected, and 3 random
rows from each table were selected for annotation. Each row was annotated by a
different person, so the dataset consists of (one-row table, text description)
pairs. Each annotation covers at least 2 cells of the row but is not required to
cover all of them. The dataset follows a standardized table format.
| Split          | Examples |
|----------------|----------|
| 'test'         | 2,000    |
| 'train'        | 10,000   |
| 'validation'   | 1,318    |
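The three splits sum to 13,318 examples. As a minimal sketch, the snippet below records those counts, computes each split's share of the total, and wraps `tfds.load` in a small helper. The names `SPLIT_SIZES`, `split_fractions`, and `load_split` are illustrative and not part of TFDS. The helper assumes `tensorflow_datasets` is installed and that the dataset can be downloaded, so it imports the package lazily.

```python
# Split sizes as listed in the table above.
SPLIT_SIZES = {"train": 10_000, "validation": 1_318, "test": 2_000}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_split(split="train"):
    """Load one split with TFDS.

    Assumption: tensorflow_datasets is installed. The first call downloads
    and prepares the dataset, so the import is kept inside the function.
    """
    import tensorflow_datasets as tfds

    return tfds.load("wiki_table_text", split=split)


if __name__ == "__main__":
    for name, fraction in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {SPLIT_SIZES[name]} examples ({fraction:.1%})")
```

Calling `load_split("validation")` should return a `tf.data.Dataset` whose elements follow the feature structure documented on this page.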
    FeaturesDict({
        'input_text': FeaturesDict({
            'table': Sequence({
                'column_header': string,
                'content': string,
                'row_number': int16,
            }),
        }),
        'target_text': string,
    })
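To make the nesting concrete, here is a hedged sketch of how one decoded example might be flattened into a single input string. It assumes the usual TFDS behaviour in which a `Sequence` of a dict decodes to a dict of equal-length arrays, and that strings arrive as `bytes` when iterating with `tfds.as_numpy`. The helper names and the sample values are made up for illustration and are not taken from the dataset.

```python
def to_text(value):
    """Decode bytes (as yielded by tfds.as_numpy) to str; pass other values through str()."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def linearize_row(example):
    """Flatten one example's table into 'header: content' pairs joined by ' | '."""
    table = example["input_text"]["table"]
    pairs = zip(table["column_header"], table["content"])
    return " | ".join(f"{to_text(h)}: {to_text(c)}" for h, c in pairs)


# Hypothetical example that only mirrors the feature structure.
example = {
    "input_text": {
        "table": {
            "column_header": [b"name", b"year", b"city"],
            "content": [b"Example Cup", b"1999", b"Springfield"],
            "row_number": [1, 1, 1],  # assumed values; the real numbering may differ
        }
    },
    "target_text": b"Example Cup was held in Springfield in 1999.",
}

print(linearize_row(example))
# name: Example Cup | year: 1999 | city: Springfield
```

The ` | ` separator is an arbitrary choice; any delimiter that does not occur in the cell text would work equally well.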
| Feature                        | Class        | Shape | Dtype  | Description |
|--------------------------------|--------------|-------|--------|-------------|
|                                | FeaturesDict |       |        |             |
| input_text                     | FeaturesDict |       |        |             |
| input_text/table               | Sequence     |       |        |             |
| input_text/table/column_header | Tensor       |       | string |             |
| input_text/table/content       | Tensor       |       | string |             |
| input_text/table/row_number    | Tensor       |       | int16  |             |
| target_text                    | Tensor       |       | string |             |
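The dtypes in the table can be turned into a lightweight sanity check for hand-built or preprocessed examples. The following is only a sketch: `validate_example` is a hypothetical helper, not a TFDS function, and it assumes an example is a plain nested dict whose `table` fields are equal-length lists.

```python
# Inclusive bounds of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def validate_example(example):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    table = example["input_text"]["table"]

    # All three per-cell fields should describe the same number of cells.
    lengths = {key: len(table[key]) for key in ("column_header", "content", "row_number")}
    if len(set(lengths.values())) != 1:
        problems.append(f"table fields differ in length: {lengths}")

    # row_number is documented as int16.
    for value in table["row_number"]:
        if not INT16_MIN <= int(value) <= INT16_MAX:
            problems.append(f"row_number {value} does not fit in int16")

    # column_header, content, and target_text are documented as strings.
    for key in ("column_header", "content"):
        if not all(isinstance(v, (str, bytes)) for v in table[key]):
            problems.append(f"{key} contains non-string values")
    if not isinstance(example["target_text"], (str, bytes)):
        problems.append("target_text is not a string")

    return problems
```

An example that matches the documented structure yields an empty list, so `assert not validate_example(example)` can serve as a quick guard in a preprocessing pipeline.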
@inproceedings{bao2018table,
title={Table-to-Text: Describing Table Region with Natural Language},
author={Junwei Bao and Duyu Tang and Nan Duan and Zhao Yan and Yuanhua Lv and Ming Zhou and Tiejun Zhao},
booktitle={AAAI},
url={https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/16138/16782},
year={2018}
}
Last updated 2022-12-06 UTC.
- **Homepage**: https://github.com/msra-nlc/Table2Text
- **Source code**: [`tfds.structured.wiki_table_text.WikiTableText`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/structured/wiki_table_text/wiki_table_text.py)
- **Versions**: **`1.0.0`** (default): Initial release.
- **Download size**: `3.70 MiB`
- **Dataset size**: `4.64 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `('input_text', 'target_text')`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.