wiki_table_text

Description:

Wikipedia tables with at least 3 rows and 2 columns were collected, and 3 random rows from each table were selected for annotation. Each row was annotated by a different person, so the dataset consists of (one-row table, text description) pairs. Each annotation covers at least 2 cells of the row but is not required to cover all of them. The dataset follows a standardized table format.

Homepage: https://github.com/msra-nlc/Table2Text
Source code: tfds.structured.wiki_table_text.WikiTableText
Versions:
- 1.0.0 (default): Initial release.
Download size: 3.70 MiB
Dataset size: 4.64 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	2,000
`'train'`	10,000
`'validation'`	1,318

Feature structure:

FeaturesDict({
    'input_text': FeaturesDict({
        'table': Sequence({
            'column_header': string,
            'content': string,
            'row_number': int16,
        }),
    }),
    'target_text': string,
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
input_text	FeaturesDict
input_text/table	Sequence
input_text/table/column_header	Tensor	string
input_text/table/content	Tensor	string
input_text/table/row_number	Tensor	int16
target_text	Tensor	string

Supervised keys (See as_supervised doc): ('input_text', 'target_text')
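To show what the supervised keys mean in practice, the sketch below pairs the two fields the way `as_supervised=True` does when passed to `tfds.load`. The helpers `to_supervised` and `load_supervised` are hypothetical names introduced here. `tensorflow_datasets` is imported lazily, so the pure-Python part runs without it, while calling `load_supervised` assumes the package is installed and the dataset can be downloaded.

```python
SUPERVISED_KEYS = ("input_text", "target_text")


def to_supervised(example, keys=SUPERVISED_KEYS):
    """Return an (input, target) tuple from a full example dict.

    Hypothetical helper that mimics the effect of as_supervised=True.
    """
    input_key, target_key = keys
    return example[input_key], example[target_key]


def load_supervised(split="train"):
    """Load a split as (input_text, target_text) tuples via TFDS.

    Assumes tensorflow_datasets is installed; imported here so the
    rest of this sketch works without it.
    """
    import tensorflow_datasets as tfds

    return tfds.load("wiki_table_text", split=split, as_supervised=True)


# Made-up example with the same top-level keys as the dataset.
sample = {
    "input_text": {
        "table": {
            "column_header": ["name"],
            "content": ["Example Person"],
            "row_number": [1],
        }
    },
    "target_text": "This row is about Example Person.",
}

features, target = to_supervised(sample)
print(target)
```

With `as_supervised=False` (the default), each element is the full feature dict instead of a tuple.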
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{bao2018table,
  title={Table-to-Text: Describing Table Region with Natural Language},
  author={Junwei Bao and Duyu Tang and Nan Duan and Zhao Yan and Yuanhua Lv and Ming Zhou and Tiejun Zhao},
  booktitle={AAAI},
  url={https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/16138/16782},
  year={2018}
}