unnatural_instructions
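Dataset described in the paper *Unnatural Instructions: Tuning Language Models
with (Almost) No Human Labor* (Honovich et al., 2022). It contains
natural-language instructions with example instances (inputs and target
outputs), plus optional constraints and LLM-generated reformulations.

- **Homepage**: <https://github.com/orhonovich/unnatural-instructions>
- **Source code**: [`tfds.text.unnatural_instructions.UnnaturalInstructions`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/unnatural_instructions/unnatural_instructions.py)
- **Versions**:
  - **`0.0.1`** (default): Initial release. Instructions and inputs are omitted, as they require additional processing before use; `instruction_with_input` and `reformulations` contain the instructions together with their contexts.
- **Download size**: `17.48 MiB`
- **Dataset size**: `154.71 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): only when `shuffle_files=False` (train)
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `None`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.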
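A minimal loading sketch, assuming `tensorflow` and `tensorflow-datasets` are installed and the dataset can be downloaded. It relies only on the public `tfds.load` and `tfds.as_numpy` functions; the wrapper name `iter_unnatural_instructions` is hypothetical. The catalog states that the train split is auto-cached only when `shuffle_files=False`, so the sketch passes that explicitly.

```python
from typing import Any, Dict, Iterator, Optional


def iter_unnatural_instructions(
    split: str = "train", data_dir: Optional[str] = None
) -> Iterator[Dict[str, Any]]:
    """Yield examples of the TFDS `unnatural_instructions` dataset as NumPy values.

    This is a generator: nothing is imported or downloaded until iteration starts.
    """
    # Deferred import so this module loads even without tensorflow_datasets installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(
        "unnatural_instructions",
        split=split,
        data_dir=data_dir,
        shuffle_files=False,  # fixed file order; the catalog ties auto-caching to this
    )
    # tfds.as_numpy converts tensors to NumPy values; text fields arrive as bytes.
    yield from tfds.as_numpy(ds)
```

For example, `first = next(iter_unnatural_instructions())` fetches one record, whose `first["id"]` is a byte string. Making the wrapper a generator keeps the heavy import and download lazy.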
- **Splits**:

| Split     | Examples |
|-----------|----------|
| `'train'` | 66,010   |
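The dataset ships only a `train` split, so any held-out set has to be carved out of it. TFDS accepts percent-slicing strings such as `train[:90%]`; the helpers below (hypothetical names `percent_split_specs` and `exact_slice_sizes`) build such strings and check the resulting sizes with integer arithmetic. For 66,010 examples, a 10% holdout boundary falls exactly on example 59,409 (66,010 × 0.9), so no rounding is involved; for other percentages the exact sizes depend on the rounding mode TFDS applies, which this sketch does not model.

```python
from typing import Tuple

TRAIN_EXAMPLES = 66_010  # size of the 'train' split listed in the catalog


def percent_split_specs(split: str, holdout_percent: int) -> Tuple[str, str]:
    """Return TFDS percent-slicing strings for a (keep, holdout) partition."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    keep = 100 - holdout_percent
    return f"{split}[:{keep}%]", f"{split}[{keep}%:]"


def exact_slice_sizes(total: int, holdout_percent: int) -> Tuple[int, int]:
    """Return (keep, holdout) sizes when the boundary is a whole example.

    Raises ValueError for fractional boundaries, where the result would
    depend on the rounding mode TFDS applies (not modeled here).
    """
    keep = 100 - holdout_percent
    boundary, remainder = divmod(total * keep, 100)
    if remainder:
        raise ValueError("percent boundary is not a whole number of examples")
    return boundary, total - boundary


specs = percent_split_specs("train", 10)
sizes = exact_slice_sizes(TRAIN_EXAMPLES, 10)
print(specs, sizes)
```

The two strings can be passed together, as in `tfds.load("unnatural_instructions", split=list(specs))`, which returns one dataset per slice.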
- **Feature structure**:

      FeaturesDict({
          'id': Text(shape=(), dtype=string),
          'instances': Sequence({
              'constraints': Text(shape=(), dtype=string),
              'input': Text(shape=(), dtype=string),
              'instruction_with_input': Text(shape=(), dtype=string),
              'output': Text(shape=(), dtype=string),
          }),
          'instruction': Text(shape=(), dtype=string),
          'reformulations': Sequence({
              'input': Text(shape=(), dtype=string),
              'instruction': Text(shape=(), dtype=string),
              'instruction_with_input': Text(shape=(), dtype=string),
              'output': Text(shape=(), dtype=string),
          }),
      })
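As we understand TFDS's `Sequence` feature, a `Sequence` wrapping a dict (like `instances` and `reformulations` above) is returned as a dict of per-field columns rather than a list of dicts, and `tfds.as_numpy` yields text as UTF-8 bytes. The sketch below regroups such columns into one dict per item. The helper name `unbatch_sequence` and the `example` record are hypothetical, hand-written for illustration and not taken from the dataset.

```python
from typing import Dict, List, Mapping, Sequence, Union

TextValue = Union[bytes, str]


def _to_str(value: TextValue) -> str:
    # Text from tfds.as_numpy is UTF-8 bytes (np.bytes_ subclasses bytes).
    return value.decode("utf-8") if isinstance(value, bytes) else value


def unbatch_sequence(columns: Mapping[str, Sequence[TextValue]]) -> List[Dict[str, str]]:
    """Turn a dict of equal-length columns into a list of per-item dicts."""
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"ragged columns: {sorted(lengths)}")
    n = lengths.pop() if lengths else 0
    return [{key: _to_str(col[i]) for key, col in columns.items()} for i in range(n)]


# Hypothetical record in the dict-of-columns layout (not real dataset content).
example = {
    "id": b"example-0",
    "instruction": b"",
    "instances": {
        "constraints": [b"Output a single word."],
        "input": [b""],
        "instruction_with_input": [b"Reverse the word 'cat'."],
        "output": [b"tac"],
    },
    "reformulations": {
        "input": [],
        "instruction": [],
        "instruction_with_input": [],
        "output": [],
    },
}

instances = unbatch_sequence(example["instances"])
print(instances)
```

An empty `reformulations` column set simply yields an empty list, so records without reformulations need no special case.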
- **Feature documentation**:

| Feature                               | Class        | Shape | Dtype  | Description                                               |
|---------------------------------------|--------------|-------|--------|-----------------------------------------------------------|
|                                       | FeaturesDict |       |        |                                                           |
| id                                    | Text         |       | string | Unique identifier for the example.                        |
| instances                             | Sequence     |       |        |                                                           |
| instances/constraints                 | Text         |       | string | Task-specific constraints.                                |
| instances/input                       | Text         |       | string | Input that fills the placeholders of the given instruction. |
| instances/instruction_with_input      | Text         |       | string | Instruction with the input filled into its placeholders.  |
| instances/output                      | Text         |       | string | Target output for the given task.                         |
| instruction                           | Text         |       | string | Instruction with placeholders for inputs.                 |
| reformulations                        | Sequence     |       |        |                                                           |
| reformulations/input                  | Text         |       | string | Input that fills the placeholders of the given instruction. |
| reformulations/instruction            | Text         |       | string | Instruction with placeholders for inputs.                 |
| reformulations/instruction_with_input | Text         |       | string | Instruction with the input filled into its placeholders.  |
| reformulations/output                 | Text         |       | string | Target output for the given task.                         |
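Because `instruction_with_input` already has the input filled into the instruction's placeholders, it can serve directly as a model prompt, with `output` as the target. The sketch below assumes items have already been decoded into per-item dicts of `str`. The function name `to_training_pairs`, appending `constraints` on a new line, and treating reformulations as extra pairs are illustrative choices, not a format defined by the dataset or the paper.

```python
from typing import Dict, List, Optional, Tuple


def to_training_pairs(
    instances: List[Dict[str, str]],
    reformulations: Optional[List[Dict[str, str]]] = None,
    include_constraints: bool = True,
) -> List[Tuple[str, str]]:
    """Build (prompt, target) pairs from decoded per-item dicts."""
    pairs: List[Tuple[str, str]] = []
    for inst in instances:
        # instruction_with_input already contains the substituted input.
        prompt = inst["instruction_with_input"]
        constraints = inst.get("constraints", "").strip()
        if include_constraints and constraints:
            # Illustrative choice: append constraints after a newline.
            prompt = f"{prompt}\n{constraints}"
        pairs.append((prompt, inst["output"]))
    # Reformulations carry no constraints field; use them as extra pairs.
    for ref in reformulations or []:
        pairs.append((ref["instruction_with_input"], ref["output"]))
    return pairs


# Hypothetical decoded items (not real dataset content).
demo_instances = [
    {"constraints": "Answer with one word.", "input": "",
     "instruction_with_input": "Reverse 'cat'.", "output": "tac"},
]
demo_reformulations = [
    {"input": "", "instruction": "",
     "instruction_with_input": "Write 'cat' backwards.", "output": "tac"},
]
print(to_training_pairs(demo_instances, demo_reformulations))
```

Setting `include_constraints=False` keeps prompts identical to `instruction_with_input`, which may be preferable if constraints are to be handled separately.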
- **Citation**:

      @misc{honovich2022unnatural,
        title = {Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor},
        author = {Honovich, Or and Scialom, Thomas and Levy, Omer and Schick, Timo},
        url = {https://arxiv.org/abs/2212.09689},
        publisher = {arXiv},
        year = {2022}
      }
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2023-01-19 UTC.