Class to write a TFDS dataset sequentially.

The SequentialWriter can be used to generate TFDS datasets by directly appending TF Examples to the desired splits.

Once the user creates a SequentialWriter with a given DatasetInfo, they can create splits, append examples to them, and close them whenever they are finished.

Note that:

  • Not closing a split may cause data to be lost.
  • The examples are written to disk in the same order that they are given to the writer.
  • Since the SequentialWriter doesn't know how many examples are going to be written, it can't estimate the optimal number of shards per split. Use the max_examples_per_shard constructor parameter to set the maximum number of examples written to each shard.
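Because the total number of examples is unknown in advance, a fixed per-shard cap is what splits the stream. The following is a minimal sketch of that idea in plain Python, assuming a shard is simply closed once it holds max_examples_per_shard examples. The helper name shard_stream is hypothetical and is not part of TFDS.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def shard_stream(examples: Iterable[T], max_examples_per_shard: int) -> List[List[T]]:
    """Hypothetical helper: groups a stream into shards of at most N examples.

    Examples keep their input order, both within and across shards.
    """
    if max_examples_per_shard <= 0:
        raise ValueError("max_examples_per_shard must be positive")
    shards: List[List[T]] = []
    current: List[T] = []
    for example in examples:
        current.append(example)
        # Start a new shard as soon as the current one is full.
        if len(current) == max_examples_per_shard:
            shards.append(current)
            current = []
    # The last shard holds the remainder and may be partially filled.
    if current:
        shards.append(current)
    return shards


print(shard_stream(range(5), max_examples_per_shard=2))  # [[0, 1], [2, 3], [4]]
```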

The datasets written with this writer can be read directly with tfds.builder_from_directories.


Example:

    writer = SequentialWriter(ds_info=ds_info, max_examples_per_shard=1000)
    writer.initialize_splits(['train', 'test'])

    while (...):
        # Code that generates the examples
        writer.add_examples({'train': [example1, example2], 'test': [example3]})
        ...

    # Close the splits when done; unclosed splits may lose data.
    writer.close_all()


Args:
  ds_info: DatasetInfo for this dataset.
  max_examples_per_shard: maximum number of examples to write per shard.
  overwrite: if True, any existing data is ignored and overwritten. Otherwise, the existing dataset is loaded and the new data is appended to it (new data is always written to new shards).
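The following sketch makes the overwrite=False behavior concrete. Existing shards are kept untouched, and the new examples always start a fresh shard, even if the last existing shard is not full. Shards are modeled as plain lists, and plan_shards is a hypothetical helper, not TFDS code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_shards(
    existing_shards: Sequence[List[T]],
    new_examples: Sequence[T],
    max_examples_per_shard: int,
    overwrite: bool,
) -> List[List[T]]:
    """Hypothetical model of the shard layout after one writing session."""
    # overwrite=True discards existing data; otherwise existing shards are kept as-is.
    shards = [] if overwrite else [list(s) for s in existing_shards]
    # New data never tops up an existing shard: it always opens new ones.
    for start in range(0, len(new_examples), max_examples_per_shard):
        shards.append(list(new_examples[start:start + max_examples_per_shard]))
    return shards


existing = [[1, 2], [3]]  # the last existing shard is only half full
print(plan_shards(existing, [4, 5, 6], 2, overwrite=False))  # [[1, 2], [3], [4, 5], [6]]
print(plan_shards(existing, [4, 5, 6], 2, overwrite=True))   # [[4, 5], [6]]
```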



Methods

add_examples(split_examples)

Adds examples to the splits.

Args:
  split_examples: dictionary mapping each split name to the list of examples to add to that split. The dictionary does not need to include every existing split.

Raises:
  KeyError: if any of the splits doesn't exist.
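The contract above can be modeled in a few lines: unknown split names raise KeyError, and splits missing from the dictionary are left alone. This is a hypothetical in-memory sketch, not the TFDS implementation. In particular, checking every name before appending anything is a choice made here so that a bad key leaves all splits unchanged. The real writer may not behave that way.

```python
from typing import Any, Dict, List


def add_examples(open_splits: Dict[str, List[Any]],
                 split_examples: Dict[str, List[Any]]) -> None:
    """Hypothetical model: appends examples to in-memory splits."""
    # Reject unknown split names before touching any split.
    unknown = [name for name in split_examples if name not in open_splits]
    if unknown:
        raise KeyError(f"Splits not initialized: {unknown}")
    for name, examples in split_examples.items():
        # Examples are appended in the order they are given.
        open_splits[name].extend(examples)


splits = {"train": [], "test": []}
add_examples(splits, {"train": ["ex1", "ex2"]})  # "test" is simply not touched
add_examples(splits, {"train": ["ex3"], "test": ["ex4"]})
print(splits)  # {'train': ['ex1', 'ex2', 'ex3'], 'test': ['ex4']}
```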


close_all()

Closes all the open splits.
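Closing matters when a writer batches its output and only commits a shard once the shard is full or the split is closed. The following hypothetical sketch illustrates that pattern. It is an assumption about why unclosed splits can lose data, not a description of TFDS internals. BufferedSplit keeps a pending shard in memory, and close_all flushes every split that is still open.

```python
from typing import Any, Dict, List


class BufferedSplit:
    """Hypothetical split that commits examples one full shard at a time."""

    def __init__(self, max_examples_per_shard: int) -> None:
        self.max_examples_per_shard = max_examples_per_shard
        self.committed_shards: List[List[Any]] = []  # stands in for data on disk
        self._pending: List[Any] = []  # in memory only; lost if never flushed
        self.closed = False

    def add(self, example: Any) -> None:
        if self.closed:
            raise ValueError("split is closed")
        self._pending.append(example)
        # Commit the shard as soon as it reaches the cap.
        if len(self._pending) == self.max_examples_per_shard:
            self._flush()

    def _flush(self) -> None:
        if self._pending:
            self.committed_shards.append(self._pending)
            self._pending = []

    def close(self) -> None:
        # Commit the final, possibly partial, shard.
        self._flush()
        self.closed = True


def close_all(splits: Dict[str, BufferedSplit]) -> None:
    """Hypothetical counterpart of close_all: closes every split still open."""
    for split in splits.values():
        if not split.closed:
            split.close()


splits = {"train": BufferedSplit(2), "test": BufferedSplit(2)}
for x in range(3):
    splits["train"].add(x)
splits["test"].add("a")
print(splits["train"].committed_shards)  # [[0, 1]]  (example 2 is still pending)
close_all(splits)
print(splits["train"].committed_shards)  # [[0, 1], [2]]
print(splits["test"].committed_shards)   # [['a']]
```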


close_splits(splits)

Closes the given list of splits.

Args:
  splits: list of split names to close.

Raises:
  KeyError: if any of the splits doesn't exist.
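The following is a minimal sketch of this contract. It assumes that closing a split means running a finalization step and then no longer tracking the split as open. The open_splits mapping and its close callbacks are hypothetical names used only for illustration. Validating every name first means a typo raises KeyError before any split is closed.

```python
from typing import Callable, Dict, List


def close_splits(open_splits: Dict[str, Callable[[], None]], splits: List[str]) -> None:
    """Hypothetical model: `open_splits` maps each split name to its close callback."""
    missing = [name for name in splits if name not in open_splits]
    if missing:
        raise KeyError(f"Unknown splits: {missing}")
    for name in splits:
        # Stop tracking the split as open, then run its finalization step.
        open_splits.pop(name)()


closed: List[str] = []
open_splits = {
    "train": lambda: closed.append("train"),
    "test": lambda: closed.append("test"),
}
close_splits(open_splits, ["train"])
print(closed, sorted(open_splits))  # ['train'] ['test']
```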


initialize_splits(splits, fail_if_exists)

Adds new splits to the dataset.

Args:
  splits: list of split names to add.
  fail_if_exists: if True, fail if a split already contains data.

Raises:
  KeyError: if the split is already present.
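The two documented failure modes can be read together as follows. This is a hypothetical sketch showing one possible interpretation, not the TFDS source. A split name that is already registered raises KeyError when fail_if_exists is true. Otherwise the existing split is kept as-is, so new examples are appended to it. The registered mapping is an illustrative stand-in for the writer's state.

```python
from typing import Any, Dict, List


def initialize_splits(registered: Dict[str, List[Any]], splits: List[str],
                      fail_if_exists: bool) -> None:
    """Hypothetical model: `registered` maps split names to their examples."""
    # Check every name first so a conflict leaves `registered` unchanged.
    present = [name for name in splits if name in registered]
    if present and fail_if_exists:
        raise KeyError(f"Splits already present: {present}")
    for name in splits:
        # Existing splits (allowed only when fail_if_exists is False) keep their data.
        registered.setdefault(name, [])


registered = {"train": ["existing example"]}
initialize_splits(registered, ["test"], fail_if_exists=True)
initialize_splits(registered, ["train"], fail_if_exists=False)
print(registered)  # {'train': ['existing example'], 'test': []}
```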