Class to write a TFDS dataset sequentially.
tfds.core.SequentialWriter(
    ds_info: dataset_info.DatasetInfo,
    max_examples_per_shard: int,
    overwrite: bool = True,
    file_format: str = 'tfrecord'
)
The SequentialWriter can be used to generate TFDS datasets by directly appending TF Examples to the desired splits.
Once the user creates a SequentialWriter with a given DatasetInfo, they can create splits, append examples to them, and close them whenever they are finished.
Note that:
- Not closing a split may cause data to be lost.
- The examples are written to disk in the same order that they are given to the writer.
- Since the SequentialWriter doesn't know how many examples are going to be
written, it can't estimate the optimal number of shards per split. Use the
max_examples_per_shard parameter in the constructor to control how many examples there should be per shard.
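To get a feel for how max_examples_per_shard affects the layout of a split, the sketch below computes how many shards would be needed if each shard were filled to the limit before the next one is started. That fill-then-roll-over behaviour is an assumption made for illustration, not a description of the TFDS internals, and `shards_needed` and `shard_sizes` are hypothetical helpers that are not part of the library.

```python
import math
from typing import List


def _check_limit(max_examples_per_shard: int) -> None:
    if max_examples_per_shard <= 0:
        raise ValueError("max_examples_per_shard must be positive")


def shards_needed(num_examples: int, max_examples_per_shard: int) -> int:
    """Number of shards if every shard is filled before a new one is opened."""
    _check_limit(max_examples_per_shard)
    return math.ceil(num_examples / max_examples_per_shard)


def shard_sizes(num_examples: int, max_examples_per_shard: int) -> List[int]:
    """Per-shard example counts under the same assumption."""
    _check_limit(max_examples_per_shard)
    full, remainder = divmod(num_examples, max_examples_per_shard)
    return [max_examples_per_shard] * full + ([remainder] if remainder else [])


print(shards_needed(2500, 1000))  # 3
print(shard_sizes(2500, 1000))    # [1000, 1000, 500]
```

Under this assumption, a smaller limit gives more, smaller files, and a larger limit gives fewer, larger ones.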
The datasets written with this writer can be read directly with
tfds.builder_from_directories.
Example:
writer = SequentialWriter(ds_info=ds_info, max_examples_per_shard=1000)
writer.initialize_splits(['train', 'test'])

while (...):
  # Code that generates the examples
  writer.add_examples({'train': [example1, example2], 'test': [example3]})
  ...

# close_splits requires a list of split names; close_all closes every open split.
writer.close_all()
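Because an unclosed split may lose data, it can help to wrap the write loop so that the splits are closed even if example generation raises. The sketch below relies only on the methods documented on this page (initialize_splits, add_examples, close_all). Both `write_batches` and the in-memory `FakeWriter` are hypothetical names introduced so the snippet runs without TFDS installed; in real use, `writer` would be a tfds.core.SequentialWriter.

```python
from typing import Any, Dict, Iterable, List


def write_batches(writer: Any, splits: List[str],
                  batches: Iterable[Dict[str, List[Any]]]) -> None:
    """Initializes the splits, appends every batch, and always closes."""
    writer.initialize_splits(splits)
    try:
        for split_examples in batches:
            writer.add_examples(split_examples)
    finally:
        # Runs even if a batch raises, so no split is left open.
        writer.close_all()


class FakeWriter:
    """Hypothetical in-memory stand-in that mimics the documented methods."""

    def __init__(self) -> None:
        self.data: Dict[str, List[Any]] = {}
        self.closed = False

    def initialize_splits(self, splits: List[str]) -> None:
        for split in splits:
            self.data[split] = []

    def add_examples(self, split_examples: Dict[str, List[Any]]) -> None:
        for split, examples in split_examples.items():
            # Raises KeyError for a split that was never initialized.
            self.data[split].extend(examples)

    def close_all(self) -> None:
        self.closed = True


fake = FakeWriter()
write_batches(fake, ["train", "test"],
              [{"train": [1, 2], "test": [3]}, {"train": [4]}])
print(fake.data, fake.closed)
```

The `try`/`finally` is a design choice of this sketch rather than something the library requires; it simply makes the "always close" rule hard to forget.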
Methods
add_examples
add_examples(
    split_examples: Dict[str, List[Any]]
) -> None
Adds examples to the splits.
| Args | |
|---|---|
| split_examples | dictionary mapping each split name to the list of examples to add to that split. Not all the existing splits have to be in the dictionary. |
| Raises | |
|---|---|
| KeyError | if any of the splits doesn't exist. | 
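add_examples expects a dictionary from split name to a list of examples. When examples arrive one at a time, each tagged with its split, a small helper can build that dictionary while preserving arrival order within each split. `group_by_split` is a hypothetical helper, not part of TFDS, and the example payloads are placeholders.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_by_split(tagged: Iterable[Tuple[str, Any]]) -> Dict[str, List[Any]]:
    """Groups (split_name, example) pairs into split_name -> [examples]."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for split, example in tagged:
        grouped[split].append(example)
    return dict(grouped)


split_examples = group_by_split([
    ("train", {"x": 1}),
    ("test", {"x": 2}),
    ("train", {"x": 3}),
])
print(split_examples)
# {'train': [{'x': 1}, {'x': 3}], 'test': [{'x': 2}]}
```

The resulting dictionary has the shape that add_examples takes. Splits that received no examples are absent from it, which is allowed because not every existing split has to appear.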
close_all
close_all() -> None
Closes all the open splits.
close_splits
close_splits(
    splits: List[str]
) -> None
Closes the given list of splits.
| Args | |
|---|---|
| splits | list of split names. | 
| Raises | |
|---|---|
| KeyError | if any of the splits doesn't exist. | 
initialize_splits
initialize_splits(
    splits: List[str], fail_if_exists: bool = True
) -> None
Adds new splits to the dataset.
| Args | |
|---|---|
| splits | list of split names to add. | 
| fail_if_exists | whether to fail if a split already contains data. | 
| Raises | |
|---|---|
| KeyError | if the split is already present. |