
tfds.core.SequentialWriter

Class to write a TFDS dataset sequentially.

tfds.core.SequentialWriter(
    ds_info: dataset_info.DatasetInfo,
    max_examples_per_shard: int,
    overwrite: bool = True,
    file_format: str = 'tfrecord'
)

The SequentialWriter can be used to generate TFDS datasets by directly appending TF Examples to the desired splits.

Once the user creates a SequentialWriter with a given DatasetInfo, they can create splits, append examples to them, and close them whenever they are finished.

Note that:

- Not closing a split may cause data to be lost.
- Examples are written to disk in the same order in which they are given to the writer.
- Since the `SequentialWriter` doesn't know in advance how many examples will be written, it can't estimate the optimal number of shards per split. Use the `max_examples_per_shard` constructor parameter to control the maximum number of examples written to each shard.
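To make the effect of `max_examples_per_shard` concrete, the sketch below computes how many examples each shard of a split would hold. It assumes, without confirmation from this page, that the writer fills the current shard up to `max_examples_per_shard` before opening the next one. `plan_shard_sizes` is a hypothetical helper, not part of TFDS.

```python
from typing import List


def plan_shard_sizes(num_examples: int, max_examples_per_shard: int) -> List[int]:
    """Hypothetical helper, not part of TFDS: per-shard example counts.

    Assumes the writer fills the current shard up to max_examples_per_shard
    before opening the next one, in the order examples are added.
    """
    if max_examples_per_shard <= 0:
        raise ValueError("max_examples_per_shard must be positive")
    full_shards, remainder = divmod(num_examples, max_examples_per_shard)
    sizes = [max_examples_per_shard] * full_shards
    if remainder:
        sizes.append(remainder)  # the last shard holds the leftover examples
    return sizes


print(plan_shard_sizes(2500, 1000))  # [1000, 1000, 500]
```

Under this assumption, the last shard of a split is usually smaller than the others, because the total number of examples is unknown while writing.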

The datasets written with this writer can be read directly with tfds.builder_from_directories.

Example:

import tensorflow_datasets as tfds

writer = tfds.core.SequentialWriter(ds_info=ds_info, max_examples_per_shard=1000)
writer.initialize_splits(['train', 'test'])

while (...):
    # Code that generates the examples
    writer.add_examples({'train': [example1, example2], 'test': [example3]})
    ...

writer.close_splits(['train', 'test'])

Args
`ds_info`	`DatasetInfo` for this dataset.
`max_examples_per_shard`	maximum number of examples to write per shard.
`overwrite`	if `True`, ignores and overwrites any existing data. Otherwise, loads the existing dataset and appends the new data (new data is always written to new shards).
`file_format`	an entry in `file_adapters.FileFormat`. Defaults to `'tfrecord'`.
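The difference between `overwrite=True` and `overwrite=False` can be modeled with shard sizes alone. This sketch encodes only what the `overwrite` description states: existing data is either discarded or kept, and appended data always goes into new shards. The `shard_layout` helper is hypothetical, and the assumption that new shards are filled up to `max_examples_per_shard` is not confirmed by this page.

```python
from typing import List


def shard_layout(existing_sizes: List[int], num_new: int,
                 max_examples_per_shard: int, overwrite: bool) -> List[int]:
    """Hypothetical model of the shard sizes of one split after a write.

    overwrite=True: existing shards are ignored; only the new shards remain.
    overwrite=False: existing shards are kept as-is and new examples go into
    new shards, so a partially filled existing shard is never topped up.
    """
    full, rest = divmod(num_new, max_examples_per_shard)
    new_sizes = [max_examples_per_shard] * full + ([rest] if rest else [])
    if overwrite:
        return new_sizes
    return list(existing_sizes) + new_sizes


print(shard_layout([1000, 400], 1500, 1000, overwrite=False))  # [1000, 400, 1000, 500]
print(shard_layout([1000, 400], 1500, 1000, overwrite=True))   # [1000, 500]
```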

Methods

`add_examples`


add_examples(
    split_examples: Dict[str, List[Any]]
) -> None

Adds examples to the splits.

Args
`split_examples`	dictionary mapping each split name to the list of examples to add to that split. Not every existing split has to appear in the dictionary.

Raises
`KeyError`	if any of the splits doesn't exist.
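The contract of `add_examples` (a partial dictionary is allowed, unknown split names raise `KeyError`, order is preserved) can be illustrated with an in-memory stand-in. `route_examples` is a hypothetical function, not the TFDS implementation; it checks every split name before appending anything, and whether the real writer validates the same way is not stated here.

```python
from typing import Any, Dict, List


def route_examples(split_buffers: Dict[str, List[Any]],
                   split_examples: Dict[str, List[Any]]) -> None:
    """Hypothetical model of add_examples: append each list to its split.

    Raises KeyError if a split was never initialized. Splits missing from
    split_examples are left untouched; order within each split is preserved.
    """
    unknown = [name for name in split_examples if name not in split_buffers]
    if unknown:
        raise KeyError(f"Unknown splits: {unknown}")
    for name, examples in split_examples.items():
        split_buffers[name].extend(examples)


buffers = {"train": [], "test": []}
route_examples(buffers, {"train": [1, 2], "test": [3]})
route_examples(buffers, {"train": [4]})  # "test" may be omitted
print(buffers)  # {'train': [1, 2, 4], 'test': [3]}
```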

`close_all`


close_all() -> None

Closes all the open splits.

`close_splits`


close_splits(
    splits: List[str]
) -> None

Closes the given list of splits.

Args
`splits`	list of split names.

Raises
`KeyError`	if any of the splits doesn't exist.
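The relationship between `close_splits` and `close_all` can be sketched with a hypothetical open/closed flag per split: `close_all` behaves like `close_splits` applied to every split that is still open. The functions below are illustrative stand-ins, not TFDS code; in the real writer, closing is the step that finalizes the written data, which is why leaving a split open may lose data.

```python
from typing import Dict, List


def close_splits(split_is_open: Dict[str, bool], splits: List[str]) -> None:
    """Hypothetical model of close_splits: KeyError if any name is unknown."""
    missing = [name for name in splits if name not in split_is_open]
    if missing:
        raise KeyError(f"Unknown splits: {missing}")
    for name in splits:
        split_is_open[name] = False  # a real writer would finalize shards here


def close_all(split_is_open: Dict[str, bool]) -> None:
    """Hypothetical model of close_all: close every split that is still open."""
    still_open = [name for name, is_open in split_is_open.items() if is_open]
    close_splits(split_is_open, still_open)


state = {"train": True, "test": True}
close_splits(state, ["test"])
close_all(state)
print(state)  # {'train': False, 'test': False}
```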

`initialize_splits`


initialize_splits(
    splits: List[str], fail_if_exists: bool = True
) -> None

Adds new splits to the dataset.

Args
`splits`	list of split names to add.
`fail_if_exists`	if `True`, fail if any of the given splits already contains data.

Raises
`KeyError`	if any of the splits is already present.
