- Description:
databricks-dolly-15k is an open source dataset of instruction-following
records used in training databricks/dolly-v2-12b. The records were generated
by thousands of Databricks employees in several of the behavioral categories
outlined in the InstructGPT paper, including brainstorming, classification,
closed QA, generation, information extraction, open QA, and summarization.
This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.
Homepage: https://github.com/databrickslabs/dolly
Source code: tfds.datasets.databricks_dolly.Builder
Versions:
1.0.0 (default): Initial release.
Download size: 12.60 MiB
Dataset size: 12.69 MiB
Auto-cached (documentation): Yes
Splits:
| Split | Examples |
|---|---|
| 'train' | 15,014 |
- Feature structure:
FeaturesDict({
'category': Text(shape=(), dtype=string),
'context': Text(shape=(), dtype=string),
'instruction': Text(shape=(), dtype=string),
'response': Text(shape=(), dtype=string),
})
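A minimal sketch of reading these four features, assuming the dataset is registered under the name `databricks_dolly` (inferred from the builder path listed above) and that `tensorflow_datasets` is installed. The helper names `decode_record`, `format_prompt`, and `iter_records` are hypothetical, and the prompt template is illustrative only, not the one used to train Dolly.

```python
"""Sketch: load databricks_dolly with TFDS and render a record as a prompt."""


def decode_record(raw):
    """Convert the bytes values that tfds.as_numpy yields for Text features to str."""
    return {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in raw.items()
    }


def format_prompt(record):
    """Render a record with a hypothetical template (an assumption, not Dolly's)."""
    parts = ["Instruction: " + record["instruction"]]
    # Assumption: 'context' is an empty string when a record has no reference text.
    if record["context"]:
        parts.append("Context: " + record["context"])
    parts.append("Response: " + record["response"])
    return "\n".join(parts)


def iter_records(limit=3):
    """Yield the first `limit` records of the 'train' split as dicts of str."""
    # Imported lazily so the helpers above can be used without TFDS installed.
    import tensorflow_datasets as tfds

    # Assumption: "databricks_dolly" is the registered dataset name.
    dataset = tfds.load("databricks_dolly", split="train")
    for raw in tfds.as_numpy(dataset.take(limit)):
        yield decode_record(raw)
```

Calling `iter_records()` downloads and prepares the dataset on first use; each yielded dict can be passed directly to `format_prompt`. The lazy import keeps the two pure helpers usable, and testable, in environments where TFDS is not available.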
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| category | Text | | string | |
| context | Text | | string | |
| instruction | Text | | string | |
| response | Text | | string | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
- Citation: