irc_disentanglement
The IRC Disentanglement dataset contains 77,563 messages from the Ubuntu IRC
channel.
Features include the message id, message text, and timestamp. The target is the
list of messages that the current message replies to. Each record contains the
list of messages from one day of IRC chat.
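The reply targets described above can be used to recover separate conversations from one day of chat. The following is a minimal sketch, assuming each message has already been converted to a plain dict with `id`, `text`, and `parents` keys; the helper `group_conversations` and the sample messages are hypothetical and not part of the dataset or of TFDS. It treats two messages as belonging to the same conversation when a chain of reply links connects them.

```python
from collections import defaultdict


def group_conversations(messages):
    """Group message ids into conversations by following reply links.

    Hypothetical helper: uses a small union-find over message ids.
    """
    root = {m["id"]: m["id"] for m in messages}

    def find(x):
        # Walk up to the representative id, shortening the path as we go.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for m in messages:
        for parent in m["parents"]:
            if parent in root:  # ignore links to messages outside this day
                root[find(m["id"])] = find(parent)

    groups = defaultdict(list)
    for m in messages:
        groups[find(m["id"])].append(m["id"])
    return sorted(groups.values())


# Made-up sample data, shaped like one day of messages.
day = [
    {"id": "m1", "text": "how do I mount a usb drive?", "parents": []},
    {"id": "m2", "text": "anyone using wifi on 18.04?", "parents": []},
    {"id": "m3", "text": "try lsblk first", "parents": ["m1"]},
    {"id": "m4", "text": "yes, works here", "parents": ["m2"]},
    {"id": "m5", "text": "thanks, that worked", "parents": ["m3"]},
]

conversations = group_conversations(day)
print(conversations)
```

With this sample, the two interleaved threads come out as two separate groups of message ids.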
| Split | Examples |
|---|---|
| 'test' | 10 |
| 'train' | 153 |
| 'validation' | 10 |
FeaturesDict({
    'day': Sequence({
        'id': Text(shape=(), dtype=string),
        'parents': Sequence(Text(shape=(), dtype=string)),
        'text': Text(shape=(), dtype=string),
        'timestamp': Text(shape=(), dtype=string),
    }),
})
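Because `day` is a sequence of dicts, a record is typically exposed column-wise: one entry per message under each of `id`, `parents`, `text`, and `timestamp`. The sketch below shows one way to turn such a record into one dict per message. The helpers `flatten_day` and `load_days` are hypothetical names, and the sample values are made up. `load_days` assumes `tensorflow_datasets` is installed and is not called here; the exact container types it yields (in particular for the nested `parents` field) may depend on the TFDS version, so a conversion to plain lists may be needed before calling `flatten_day`.

```python
def _to_str(value):
    """Decode bytes to str; leave other values as their string form."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def flatten_day(day):
    """Convert a column-wise 'day' dict into a list of per-message dicts."""
    rows = []
    for msg_id, parents, text, timestamp in zip(
        day["id"], day["parents"], day["text"], day["timestamp"]
    ):
        rows.append(
            {
                "id": _to_str(msg_id),
                "parents": [_to_str(p) for p in parents],
                "text": _to_str(text),
                "timestamp": _to_str(timestamp),
            }
        )
    return rows


def load_days(split="train"):
    """Yield the 'day' feature of each record in a split.

    Assumption: tensorflow_datasets is installed; calling this triggers
    the dataset download and preparation on first use.
    """
    import tensorflow_datasets as tfds

    ds = tfds.load("irc_disentanglement", split=split)
    for example in tfds.as_numpy(ds):
        yield example["day"]


# Made-up record with the same field names as the feature structure.
sample_day = {
    "id": [b"1", b"2"],
    "parents": [[], [b"1"]],
    "text": [b"hello", b"hi there"],
    "timestamp": [b"12:00", b"12:01"],
}

rows = flatten_day(sample_day)
for row in rows:
    print(row["id"], row["parents"], row["text"])
```

Keeping the TFDS import inside `load_days` lets the flattening logic be tried on hand-written data without downloading anything.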
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| day | Sequence | | | |
| day/id | Text | | string | |
| day/parents | Sequence(Text) | (None,) | string | |
| day/text | Text | | string | |
| day/timestamp | Text | | string | |
@InProceedings{acl19disentangle,
author = {Jonathan K. Kummerfeld and Sai R. Gouravajhala and Joseph Peper and Vignesh Athreya and Chulaka Gunasekara and Jatin Ganhotra and Siva Sankalp Patel and Lazaros Polymenakos and Walter S. Lasecki},
title = {A Large-Scale Corpus for Conversation Disentanglement},
booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
location = {Florence, Italy},
month = {July},
year = {2019},
doi = {10.18653/v1/P19-1374},
pages = {3846--3856},
url = {https://aclweb.org/anthology/papers/P/P19/P19-1374/},
arxiv = {https://arxiv.org/abs/1810.11118},
software = {https://jkk.name/irc-disentanglement},
data = {https://jkk.name/irc-disentanglement},
}
Last updated 2022-12-10 UTC.
Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/irc-disentanglement)
Homepage: https://jkk.name/irc-disentanglement
Source code: tfds.datasets.irc_disentanglement.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/irc_disentanglement/irc_disentanglement_dataset_builder.py)
Versions: 2.0.0 (default): No release notes.
Download size: 113.53 MiB
Dataset size: 26.59 MiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.