# tf.data.experimental.save

Saves the content of the given dataset.

```python
tf.data.experimental.save(
    dataset, path, compression=None, shard_func=None, checkpoint_args=None
)
```

#### Example usage:

```python
import os
import tempfile

import tensorflow as tf

path = os.path.join(tempfile.gettempdir(), "saved_data")
# Save a dataset
dataset = tf.data.Dataset.range(2)
tf.data.experimental.save(dataset, path)
new_dataset = tf.data.experimental.load(path)
for elem in new_dataset:
  print(elem)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(1, shape=(), dtype=int64)
```
The dataset is saved in multiple file "shards". By default, the dataset
output is divided into shards in a round-robin fashion, but custom sharding
can be specified via the `shard_func` argument. For example, you can save the
dataset using a single shard as follows:

```python
dataset = make_dataset()

def custom_shard_func(element):
  return 0

dataset = tf.data.experimental.save(
    path="/path/to/data", ..., shard_func=custom_shard_func)
```

To enable checkpointing, pass in `checkpoint_args` to the `save` method
as follows:

```python
dataset = tf.data.Dataset.range(100)
save_dir = "..."
checkpoint_prefix = "..."
step_counter = tf.Variable(0, trainable=False)
checkpoint_args = {
    "checkpoint_interval": 50,
    "step_counter": step_counter,
    "directory": checkpoint_prefix,
    "max_to_keep": 20,
}
tf.data.experimental.save(dataset, save_dir, checkpoint_args=checkpoint_args)
```
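To make the sharding contract concrete, here is a rough sketch in plain Python (no TensorFlow) of how elements map to shard IDs under the default round-robin policy versus a single-shard `shard_func`. The shard count `NUM_SHARDS` and the helper names are made up for this illustration; the real `shard_func` receives dataset elements and must return int64 shard IDs.

```python
NUM_SHARDS = 3  # hypothetical shard count, for illustration only

def round_robin_shard(index):
    # Default policy: element i goes to shard i mod NUM_SHARDS.
    return index % NUM_SHARDS

def single_shard(element):
    # A custom shard_func that maps every element to shard 0,
    # producing one output shard regardless of dataset size.
    return 0

elements = list(range(6))
print([round_robin_shard(i) for i, _ in enumerate(elements)])  # [0, 1, 2, 0, 1, 2]
print([single_shard(e) for e in elements])                     # [0, 0, 0, 0, 0, 0]
```

The only requirement on a custom `shard_func` is that it returns an integer shard ID per element; any deterministic mapping (modulo, hashing a key, bucketing by a field) works.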
Args:

- `dataset`: The dataset to save.
- `path`: Required. A directory to use for saving the dataset.
- `compression`: Optional. The algorithm to use to compress data when writing
  it. Supported options are `GZIP` and `NONE`. Defaults to `NONE`.
- `shard_func`: Optional. A function to control the mapping of dataset
  elements to file shards. The function is expected to map elements of the
  input dataset to int64 shard IDs. If present, the function will be traced
  and executed as graph computation.
- `checkpoint_args`: Optional args for checkpointing which will be passed
  into the `tf.train.CheckpointManager`. If `checkpoint_args` are not
  specified, then checkpointing will not be performed. The `save()`
  implementation creates a `tf.train.Checkpoint` object internally, so users
  should not set the `checkpoint` argument in `checkpoint_args`.

Raises:

- `ValueError`: if `checkpoint` is passed into `checkpoint_args`.
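The `checkpoint` restriction can be pictured as a simple argument guard. This is an illustrative sketch only, not TensorFlow's actual implementation; the helper name `validate_checkpoint_args` is invented for this example.

```python
def validate_checkpoint_args(checkpoint_args):
    """Sketch of the documented restriction: save() builds its own
    tf.train.Checkpoint internally, so a user-supplied `checkpoint`
    key in checkpoint_args is rejected."""
    if checkpoint_args and "checkpoint" in checkpoint_args:
        raise ValueError(
            "Do not pass `checkpoint` in checkpoint_args; save() creates "
            "its own tf.train.Checkpoint internally.")
    return checkpoint_args

# Valid: only tf.train.CheckpointManager-style options are present.
validate_checkpoint_args({"checkpoint_interval": 50, "max_to_keep": 20})

# Invalid: a `checkpoint` key raises ValueError.
try:
    validate_checkpoint_args({"checkpoint": object()})
except ValueError as e:
    print("rejected:", e)
```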
Note: The directory layout and file format used for saving the dataset are
considered an implementation detail and may change. For this reason, datasets
saved through `tf.data.experimental.save` should only be consumed through
`tf.data.experimental.load`, which is guaranteed to be backwards compatible.

Last updated 2023-03-17 UTC.