
databricks_dolly

Description:

databricks-dolly-15k is an open-source dataset of instruction-following records generated by thousands of Databricks employees and used to train databricks/dolly-v2-12b. The records cover several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.

Homepage: https://github.com/databrickslabs/dolly
Source code: tfds.datasets.databricks_dolly.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 12.60 MiB
Dataset size: 12.69 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	15,014
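
Because only a `'train'` split is provided, a held-out portion has to be carved out by the user. The following is a minimal sketch of one way to do that with the TFDS split-slicing syntax. The helper names `holdout_split_specs` and `load_train_and_holdout` and the 10% default are illustrative assumptions, not part of the dataset or of TFDS.

```python
from typing import Tuple


def holdout_split_specs(holdout_percent: int = 10) -> Tuple[str, str]:
    """Build two TFDS split strings that partition 'train' by percentage.

    Hypothetical helper: the name and the 10% default are assumptions.
    """
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    # e.g. 'train[:90%]' for training and 'train[90%:]' for the held-out part.
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_train_and_holdout(holdout_percent: int = 10):
    """Load both slices as tf.data.Dataset objects (downloads on first use)."""
    # Imported lazily so that the helper above works without TensorFlow installed.
    import tensorflow_datasets as tfds

    train_spec, holdout_spec = holdout_split_specs(holdout_percent)
    # Passing a list of split strings returns one dataset per entry.
    return tfds.load("databricks_dolly", split=[train_spec, holdout_spec])
```

Calling `load_train_and_holdout()` would return two datasets that together cover all 15,014 examples without overlap.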

Feature structure:

FeaturesDict({
    'category': Text(shape=(), dtype=string),
    'context': Text(shape=(), dtype=string),
    'instruction': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
category	Text	string
context	Text	string
instruction	Text	string
response	Text	string
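
As an illustration of how the four text features fit together, the sketch below turns one record into a single prompt string. The prompt layout, the helper name `format_record`, and the sample record are assumptions made for this example; the dataset does not prescribe a template. The sketch also assumes that `context` may be empty for some records and that values may arrive as `bytes`, as they do when iterating with `tfds.as_numpy`.

```python
FEATURE_KEYS = ("category", "context", "instruction", "response")


def _to_str(value) -> str:
    """Decode bytes (as yielded by tfds.as_numpy) and pass other values through str()."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def format_record(example) -> str:
    """Render one record as a prompt string (hypothetical layout)."""
    fields = {key: _to_str(example[key]) for key in FEATURE_KEYS}
    parts = [f"Instruction: {fields['instruction']}"]
    # Only include the context line when the record actually has context.
    if fields["context"]:
        parts.append(f"Context: {fields['context']}")
    parts.append(f"Response: {fields['response']}")
    return "\n".join(parts)


# Made-up sample record with the same keys as the FeaturesDict above.
sample = {
    "category": b"open_qa",
    "context": b"",
    "instruction": b"What is TFDS?",
    "response": b"A collection of ready-to-use datasets.",
}
print(format_record(sample))
```

With real data, the same function could be applied to each element of `tfds.as_numpy(tfds.load("databricks_dolly", split="train"))`.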

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):

Citation:
