reddit_tifu
Tetap teratur dengan koleksi
Simpan dan kategorikan konten berdasarkan preferensi Anda.
Dataset Reddit, di mana TIFU menunjukkan nama subbreddit /r/tifu. Sebagaimana didefinisikan dalam publikasi, gaya "pendek" menggunakan judul sebagai ringkasan dan "panjang" menggunakan tldr sebagai ringkasan.
Fitur meliputi:
- dokumen: kirim teks tanpa tldr.
- tldr: baris tldr.
- judul: judul terpangkas tanpa tldr.
- up: suara positif.
- skor: skor.
- num_comments: jumlah komentar.
upvote_ratio: rasio suara positif.
Dokumentasi Tambahan : Jelajahi di Makalah Dengan Kode north_east
Beranda : https://github.com/ctr4si/MMN
Kode sumber : tfds.datasets.reddit_tifu.Builder
Versi :
-
1.1.0
: Hapus dokumen kosong dan string ringkasan. -
1.1.1
: Tambahkan pemisahan train, dev, dan test (80/10/10) yang digunakan di PEGASUS ( https://arxiv.org/abs/1912.08777 ) dalam konfigurasi terpisah. Ini dibuat secara acak menggunakan fungsi pemisahan tfds dan dirilis untuk memastikan bahwa hasil di Reddit Tifu Long dapat direproduksi dan dibandingkan. Tambahkan juga id
ke titik data. -
1.1.2
(default): Pemisahan yang diperbaiki diunggah.
Struktur fitur :
FeaturesDict({
'documents': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
'num_comments': float32,
'score': float32,
'title': Text(shape=(), dtype=string),
'tldr': Text(shape=(), dtype=string),
'ups': float32,
'upvote_ratio': float32,
})
Fitur | Kelas | Membentuk | Dtype | Keterangan |
---|
| fiturDict | | | |
dokumen | Teks | | rangkaian | |
Indo | Teks | | rangkaian | |
num_comments | Tensor | | float32 | |
skor | Tensor | | float32 | |
judul | Teks | | rangkaian | |
tldr | Teks | | rangkaian | |
UPS | Tensor | | float32 | |
rasio_upvote | Tensor | | float32 | |
@misc{kim2018abstractive,
title={Abstractive Summarization of Reddit Posts with Multi-level Memory Networks},
author={Byeongchang Kim and Hyunwoo Kim and Gunhee Kim},
year={2018},
eprint={1811.00783},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
reddit_tifu/pendek (konfigurasi default)
Deskripsi konfigurasi : Menggunakan judul sebagai ringkasan.
Ukuran unduhan : 639.54 MiB
Ukuran dataset : 141.46 MiB
Auto-cached ( dokumentasi ): Hanya ketika shuffle_files=False
(train)
Perpecahan :
Membelah | Contoh |
---|
'train' | 79.740 |
reddit_tifu/long
Deskripsi konfigurasi : Menggunakan TLDR sebagai ringkasan.
Ukuran unduhan : 639.54 MiB
Ukuran dataset : 93.10 MiB
Di-cache otomatis ( dokumentasi ): Ya
Perpecahan :
Membelah | Contoh |
---|
'train' | 42.139 |
reddit_tifu/long_split
Deskripsi konfigurasi : Menggunakan TLDR sebagai ringkasan dan kembalikan pemisahan kereta/tes/dev.
Ukuran unduhan : 639.94 MiB
Ukuran dataset : 93.10 MiB
Di-cache otomatis ( dokumentasi ): Ya
Perpecahan :
Membelah | Contoh |
---|
'test' | 4.214 |
'train' | 33.711 |
'validation' | 4.214 |
Kecuali dinyatakan lain, konten di halaman ini dilisensikan berdasarkan Lisensi Creative Commons Attribution 4.0, sedangkan contoh kode dilisensikan berdasarkan Lisensi Apache 2.0. Untuk mengetahui informasi selengkapnya, lihat Kebijakan Situs Google Developers. Java adalah merek dagang terdaftar dari Oracle dan/atau afiliasinya.
Terakhir diperbarui pada 2022-12-23 UTC.
[null,null,["Terakhir diperbarui pada 2022-12-23 UTC."],[],[],null,["# reddit_tifu\n\n\u003cbr /\u003e\n\n- **Description**:\n\nReddit dataset, where TIFU denotes the name of subbreddit /r/tifu. As defined in\nthe publication, style \"short\" uses title as summary and \"long\" uses tldr as\nsummary.\n\nFeatures includes:\n\n- document: post text without tldr.\n- tldr: tldr line.\n- title: trimmed title without tldr.\n- ups: upvotes.\n- score: score.\n- num_comments: number of comments.\n- upvote_ratio: upvote ratio.\n\n- **Additional Documentation** :\n [Explore on Papers With Code\n north_east](https://paperswithcode.com/dataset/reddit-tifu)\n\n- **Homepage** : \u003chttps://github.com/ctr4si/MMN\u003e\n\n- **Source code** :\n [`tfds.datasets.reddit_tifu.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/reddit_tifu/reddit_tifu_dataset_builder.py)\n\n- **Versions**:\n\n - `1.1.0`: Remove empty document and summary strings.\n - `1.1.1`: Add train, dev and test (80/10/10) splits which are used in PEGASUS (\u003chttps://arxiv.org/abs/1912.08777\u003e) in a separate config. These were created randomly using the tfds split function and are being released to ensure that results on Reddit Tifu Long are reproducible and comparable.Also add `id` to the datapoints.\n - **`1.1.2`** (default): Corrected splits uploaded.\n- **Feature structure**:\n\n FeaturesDict({\n 'documents': Text(shape=(), dtype=string),\n 'id': Text(shape=(), dtype=string),\n 'num_comments': float32,\n 'score': float32,\n 'title': Text(shape=(), dtype=string),\n 'tldr': Text(shape=(), dtype=string),\n 'ups': float32,\n 'upvote_ratio': float32,\n })\n\n- **Feature documentation**:\n\n| Feature | Class | Shape | Dtype | Description |\n|--------------|--------------|-------|---------|-------------|\n| | FeaturesDict | | | |\n| documents | Text | | string | |\n| id | Text | | string | |\n| num_comments | Tensor | | float32 | |\n| score | Tensor | | float32 | |\n| title | Text | | string | |\n| tldr | Text | | string | |\n| ups | Tensor | | float32 | |\n| upvote_ratio | Tensor | | float32 | |\n\n- **Figure**\n ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):\n Not supported.\n\n- **Citation**:\n\n @misc{kim2018abstractive,\n title={Abstractive Summarization of Reddit Posts with Multi-level Memory Networks},\n author={Byeongchang Kim and Hyunwoo Kim and Gunhee Kim},\n year={2018},\n eprint={1811.00783},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n }\n\nreddit_tifu/short (default config)\n----------------------------------\n\n- **Config description**: Using title as summary.\n\n- **Download size** : `639.54 MiB`\n\n- **Dataset size** : `141.46 MiB`\n\n- **Auto-cached**\n ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):\n Only when `shuffle_files=False` (train)\n\n- **Splits**:\n\n| Split | Examples |\n|-----------|----------|\n| `'train'` | 79,740 |\n\n- **Supervised keys** (See\n [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):\n `('documents', 'title')`\n\n- **Examples**\n ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nreddit_tifu/long\n----------------\n\n- **Config description**: Using TLDR as summary.\n\n- **Download size** : `639.54 MiB`\n\n- **Dataset size** : `93.10 MiB`\n\n- **Auto-cached**\n ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):\n Yes\n\n- **Splits**:\n\n| Split | Examples |\n|-----------|----------|\n| `'train'` | 42,139 |\n\n- **Supervised keys** (See\n [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):\n `('documents', 'tldr')`\n\n- **Examples**\n ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nreddit_tifu/long_split\n----------------------\n\n- **Config description**: Using TLDR as summary and return train/test/dev\n splits.\n\n- **Download size** : `639.94 MiB`\n\n- **Dataset size** : `93.10 MiB`\n\n- **Auto-cached**\n ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):\n Yes\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 4,214 |\n| `'train'` | 33,711 |\n| `'validation'` | 4,214 |\n\n- **Supervised keys** (See\n [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):\n `('documents', 'tldr')`\n\n- **Examples**\n ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples..."]]