tf.data.TFRecordDataset

A Dataset comprising records from one or more TFRecord files.

Inherits From: Dataset

tf.data.TFRecordDataset(
    filenames, compression_type=None, buffer_size=None, num_parallel_reads=None
)

This dataset loads TFRecords from the files as bytes, exactly as they were written. `TFRecordDataset` does not parse or decode records on its own. Parsing and decoding can be done by applying `Dataset.map` transformations after the `TFRecordDataset`.

A minimal example is given below:

import os
import tempfile

import numpy as np
import tensorflow as tf

example_path = os.path.join(tempfile.gettempdir(), "example.tfrecords")
np.random.seed(0)

# Write the records to a file.
with tf.io.TFRecordWriter(example_path) as file_writer:
  for _ in range(4):
    x, y = np.random.random(), np.random.random()

    record_bytes = tf.train.Example(features=tf.train.Features(feature={
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=[x])),
        "y": tf.train.Feature(float_list=tf.train.FloatList(value=[y])),
    })).SerializeToString()
    file_writer.write(record_bytes)

# Read the data back out.
def decode_fn(record_bytes):
  return tf.io.parse_single_example(
      # Data
      record_bytes,

      # Schema
      {"x": tf.io.FixedLenFeature([], dtype=tf.float32),
       "y": tf.io.FixedLenFeature([], dtype=tf.float32)}
  )

for batch in tf.data.TFRecordDataset([example_path]).map(decode_fn):
  print("x = {x:.4f},  y = {y:.4f}".format(**batch))
# Expected output:
# x = 0.5488,  y = 0.7152
# x = 0.6028,  y = 0.5449
# x = 0.4237,  y = 0.6459
# x = 0.4376,  y = 0.8918

Args
`filenames`	A `tf.string` tensor or `tf.data.Dataset` containing one or more filenames.
`compression_type`	(Optional.) A `tf.string` scalar evaluating to one of `""` (no compression), `"ZLIB"`, or `"GZIP"`.
`buffer_size`	(Optional.) A `tf.int64` scalar representing the number of bytes in the read buffer. If your input pipeline is I/O bottlenecked, consider setting this parameter to a value between 1 and 100 MB. If `None`, a sensible default for both local and remote file systems is used.
`num_parallel_reads`	(Optional.) A `tf.int64` scalar representing the number of files to read in parallel. If greater than one, the records of files read in parallel are emitted in an interleaved order. If your input pipeline is I/O bottlenecked, consider setting this parameter to a value greater than one to parallelize the I/O. If `None`, files will be read sequentially.
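
As a minimal sketch of how `compression_type`, `buffer_size`, and `num_parallel_reads` fit together, the snippet below writes two small GZIP-compressed files and reads them back. The temporary directory, file names, record contents, and the 8 MB buffer are illustrative assumptions, not recommended values.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical scratch location and file names, used only for this sketch.
tmp_dir = tempfile.mkdtemp()
paths = [os.path.join(tmp_dir, f"part-{i}.tfrecord.gz") for i in range(2)]

# Write three raw byte records per file with GZIP compression.
options = tf.io.TFRecordOptions(compression_type="GZIP")
for i, path in enumerate(paths):
    with tf.io.TFRecordWriter(path, options=options) as writer:
        for j in range(3):
            writer.write(f"file{i}-record{j}".encode("utf-8"))

# The reader must be told the same compression type that the writer used.
dataset = tf.data.TFRecordDataset(
    paths,
    compression_type="GZIP",
    buffer_size=8 * 1024 * 1024,  # 8 MB read buffer (arbitrary choice)
    num_parallel_reads=2,         # read both files in parallel
)

# Records from the two files may arrive interleaved, so sort before inspecting.
records = sorted(r.decode("utf-8") for r in dataset.as_numpy_iterator())
print(records)
```

Because `num_parallel_reads` is greater than one here, the sketch sorts the records rather than relying on the order in which they arrive.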

Raises
`TypeError`	If any argument does not have the expected type.
`ValueError`	If any argument does not have the expected shape.

Attributes
`element_spec`	The type specification of an element of this dataset. `dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])` `dataset.element_spec` `TensorSpec(shape=(), dtype=tf.int32, name=None)` For more information, read this guide.
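
For a `TFRecordDataset` specifically, each element is a scalar string tensor holding one serialized record. The sketch below checks this on a throwaway file; the file name and record content are assumptions made for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical one-record file, written only so the dataset has a real input.
path = os.path.join(tempfile.mkdtemp(), "spec_demo.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(b"raw bytes")

dataset = tf.data.TFRecordDataset([path])
spec = dataset.element_spec
print(spec)

# Each element is a scalar tf.string tensor containing the raw record bytes.
first = next(iter(dataset.as_numpy_iterator()))
print(first)
```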

as_numpy_iterator

Raises
`TypeError`	if an element contains a non-`Tensor` value.
`RuntimeError`	if eager execution is not enabled.

batch

Args
`batch_size`	A `tf.int64` scalar `tf.Tensor`, representing the number of consecutive elements of this dataset to combine in a single batch.
`drop_remainder`	(Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
`num_parallel_calls`	(Optional.) A `tf.int64` scalar `tf.Tensor`, representing the number of batches to compute asynchronously in parallel. If not specified, batches will be computed sequentially. If the value `tf.data.AUTOTUNE` is used, then the number of parallel calls is set dynamically based on available resources.
`deterministic`	(Optional.) When `num_parallel_calls` is specified, if this boolean is specified (`True` or `False`), it controls the order in which the transformation produces elements. If set to `False`, the transformation is allowed to yield elements out of order to trade determinism for performance. If not specified, the `tf.data.Options.experimental_deterministic` option (`True` by default) controls the behavior.
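
A short sketch of `batch` on a toy range dataset, showing the effect of `drop_remainder`. The dataset size and batch size are arbitrary choices for illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(8)

# Default behavior: the final, smaller batch is kept.
kept = [b.tolist() for b in dataset.batch(3).as_numpy_iterator()]
print(kept)     # [[0, 1, 2], [3, 4, 5], [6, 7]]

# With drop_remainder=True the incomplete final batch is discarded.
dropped = [
    b.tolist()
    for b in dataset.batch(3, drop_remainder=True).as_numpy_iterator()
]
print(dropped)  # [[0, 1, 2], [3, 4, 5]]
```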

bucket_by_sequence_length

Args
`element_length_func`	A function from an element in the `Dataset` to `tf.int32` that returns the length of the element; the length determines which bucket the element goes into.
`bucket_boundaries`	`list<int>`, upper length boundaries of the buckets.
`bucket_batch_sizes`	`list<int>`, batch size per bucket. Length should be `len(bucket_boundaries) + 1`.
`padded_shapes`	Nested structure of `tf.TensorShape` to pass to `tf.data.Dataset.padded_batch`. If not provided, will use `dataset.output_shapes`, which will result in variable length dimensions being padded out to the maximum length in each batch.
`padding_values`	Values to pad with, passed to `tf.data.Dataset.padded_batch`. Defaults to padding with 0.
`pad_to_bucket_boundary`	bool, if `False`, will pad dimensions with unknown size to maximum length in batch. If `True`, will pad dimensions with unknown size to bucket boundary minus 1 (i.e., the maximum length in each bucket), and caller must ensure that the source `Dataset` does not contain any elements with length longer than `max(bucket_boundaries)`.
`no_padding`	`bool`, indicates whether to pad the batch features (features need to be either of type `tf.sparse.SparseTensor` or of same shape).
`drop_remainder`	(Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether the last batch should be dropped in the case it has fewer than `batch_size` elements; the default behavior is not to drop the smaller batch.
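
The following sketch shows one way `bucket_by_sequence_length` might be used to group variable-length sequences. It assumes a TensorFlow version in which this is available as a `Dataset` method; the sequences, boundaries, and batch sizes are made up for illustration.

```python
import tensorflow as tf

# Toy variable-length sequences (illustrative data only).
sequences = [
    [0],
    [1, 2, 3, 4],
    [5, 6, 7],
    [7, 8, 9, 10, 11],
    [13, 14, 15, 16, 19, 20],
    [21, 22],
]

dataset = tf.data.Dataset.from_generator(
    lambda: sequences,
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int64),
)

# Boundaries [3, 5] define three buckets: length < 3, 3 <= length < 5,
# and length >= 5, so bucket_batch_sizes needs three entries.
bucketed = dataset.bucket_by_sequence_length(
    element_length_func=lambda elem: tf.shape(elem)[0],
    bucket_boundaries=[3, 5],
    bucket_batch_sizes=[2, 2, 2],
)

# Each batch is padded with zeros to the longest sequence it contains.
batches = [batch.tolist() for batch in bucketed.as_numpy_iterator()]
for batch in batches:
    print(batch)
```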

from_generator

Args
`generator`	A callable object that returns an object that supports the `iter()` protocol. If `args` is not specified, `generator` must take no arguments; otherwise it must take as many arguments as there are values in `args`.
`output_types`	(Optional.) A (nested) structure of `tf.DType` objects corresponding to each component of an element yielded by `generator`.
`output_shapes`	(Optional.) A (nested) structure of `tf.TensorShape` objects corresponding to each component of an element yielded by `generator`.
`args`	(Optional.) A tuple of `tf.Tensor` objects that will be evaluated and passed to `generator` as NumPy-array arguments.
`output_signature`	(Optional.) A (nested) structure of `tf.TypeSpec` objects corresponding to each component of an element yielded by `generator`.
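
As a sketch of how `args` and `output_signature` interact in `from_generator`, the example below passes a limit to a generator and declares the structure of what it yields. The generator `count_up_to` is a hypothetical helper written for this example.

```python
import tensorflow as tf


def count_up_to(limit):
    """Hypothetical generator: yields (i, a list of i ones) for i < limit."""
    # `limit` arrives as a NumPy value, so convert it before calling range().
    for i in range(int(limit)):
        yield i, [1.0] * i


dataset = tf.data.Dataset.from_generator(
    count_up_to,
    args=(3,),  # one value in args, so the generator takes one argument
    output_signature=(
        tf.TensorSpec(shape=(), dtype=tf.int32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32),
    ),
)

elements = [(int(i), ones.tolist()) for i, ones in dataset.as_numpy_iterator()]
print(elements)  # [(0, []), (1, [1.0]), (2, [1.0, 1.0])]
```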

group_by_window

Args
`key_func`	A function mapping a nested structure of tensors (having shapes and types defined by `self.output_shapes` and `self.output_types`) to a scalar `tf.int64` tensor.
`reduce_func`	A function mapping a key and a dataset of up to `window_size` consecutive elements matching that key to another dataset.
`window_size`	A `tf.int64` scalar `tf.Tensor`, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to `reduce_func`. Mutually exclusive with `window_size_func`.
`window_size_func`	A function mapping a key to a `tf.int64` scalar `tf.Tensor`, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to `reduce_func`. Mutually exclusive with `window_size`.
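
A minimal sketch of `group_by_window`, assuming a TensorFlow version in which it is available as a `Dataset` method. It groups the integers 0 to 9 by parity and batches each group; the key function and window size are illustrative choices.

```python
import tensorflow as tf

window_size = 5
dataset = tf.data.Dataset.range(10)

grouped = dataset.group_by_window(
    key_func=lambda x: x % 2,                                 # 0 = even, 1 = odd
    reduce_func=lambda key, window: window.batch(window_size),
    window_size=window_size,
)

batches = [batch.tolist() for batch in grouped.as_numpy_iterator()]
for batch in batches:
    print(batch)
```

Each emitted batch contains only elements that share a key, so here one batch holds the even numbers and the other the odd numbers.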

interleave

Args
`map_func`	A function mapping a dataset element to a dataset.
`cycle_length`	(Optional.) The number of input elements that will be processed concurrently. If not set, the tf.data runtime decides what it should be based on available CPU. If `num_parallel_calls` is set to `tf.data.AUTOTUNE`, the `cycle_length` argument identifies the maximum degree of parallelism.
`block_length`	(Optional.) The number of consecutive elements to produce from each input element before cycling to another input element. If not set, defaults to 1.
`num_parallel_calls`	(Optional.) If specified, the implementation creates a threadpool, which is used to fetch inputs from cycle elements asynchronously and in parallel. The default behavior is to fetch inputs from cycle elements synchronously with no parallelism. If the value `tf.data.AUTOTUNE` is used, then the number of parallel calls is set dynamically based on available CPU.
`deterministic`	(Optional.) When `num_parallel_calls` is specified, if this boolean is specified (`True` or `False`), it controls the order in which the transformation produces elements. If set to `False`, the transformation is allowed to yield elements out of order to trade determinism for performance. If not specified, the `tf.data.Options.experimental_deterministic` option (`True` by default) controls the behavior.
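
To make `cycle_length` and `block_length` concrete, the sketch below interleaves three tiny datasets without parallelism. The input values and repeat count are arbitrary.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(1, 4)  # elements 1, 2, 3

# Each input element x becomes a dataset that repeats x four times.
# Two of these datasets are open at once (cycle_length=2) and two
# consecutive items are taken from each before moving on (block_length=2).
interleaved = dataset.interleave(
    lambda x: tf.data.Dataset.from_tensors(x).repeat(4),
    cycle_length=2,
    block_length=2,
)

values = [int(v) for v in interleaved.as_numpy_iterator()]
print(values)  # [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]
```

Once the datasets for 1 and 2 are exhausted, only the dataset for 3 remains in the cycle, which is why the output ends with four consecutive 3s.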

list_files

Args
`file_pattern`	A string, a list of strings, or a `tf.Tensor` of string type (scalar or vector), representing the filename glob (i.e. shell wildcard) pattern(s) that will be matched.
`shuffle`	(Optional.) If `True`, the file names will be shuffled randomly. Defaults to `True`.
`seed`	(Optional.) A `tf.int64` scalar `tf.Tensor`, representing the random seed that will be used to create the distribution. See `tf.random.set_seed` for behavior.
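
A small sketch of `list_files` with a glob pattern. The temporary directory and file names are hypothetical, and `shuffle=False` is passed because shuffling is on by default.

```python
import os
import tempfile

import tensorflow as tf

# Create a few placeholder files in a scratch directory (illustrative only).
tmp_dir = tempfile.mkdtemp()
for name in ["a.txt", "b.txt", "c.csv"]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write("placeholder")

# Match only the .txt files and disable the default shuffling.
files = tf.data.Dataset.list_files(os.path.join(tmp_dir, "*.txt"), shuffle=False)

names = sorted(os.path.basename(p.decode("utf-8")) for p in files.as_numpy_iterator())
print(names)  # ['a.txt', 'b.txt']
```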

reduce

Args
`initial_state`	An element representing the initial state of the transformation.
`reduce_func`	A function that maps `(old_state, input_element)` to `new_state`. It must take two arguments and return a new element. The structure of `new_state` must match the structure of `initial_state`.
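
As a brief sketch, `reduce` can count or sum the elements of a dataset. The initial state is given as `np.int64(0)` so that its type matches the `tf.int64` elements of the range dataset.

```python
import numpy as np
import tensorflow as tf

dataset = tf.data.Dataset.range(5)  # 0, 1, 2, 3, 4

# Count elements: ignore the input element and add one to the state.
count = dataset.reduce(np.int64(0), lambda state, _: state + 1).numpy()

# Sum elements: add each input element to the state.
total = dataset.reduce(np.int64(0), lambda state, x: state + x).numpy()

print(count, total)  # 5 10
```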

scan

Args
`initial_state`	A nested structure of tensors, representing the initial state of the accumulator.
`scan_func`	A function that maps `(old_state, input_element)` to `(new_state, output_element)`. It must take two arguments and return a pair of nested structures of tensors. The `new_state` must match the structure of `initial_state`.
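
The sketch below uses `scan` to compute a running sum, assuming a TensorFlow version in which `scan` is available as a `Dataset` method. Unlike `reduce`, it emits one output element per input element.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(1, 6)  # 1, 2, 3, 4, 5


def running_sum(state, x):
    """Return (new_state, output_element); both are the updated sum here."""
    new_state = state + x
    return new_state, new_state


scanned = dataset.scan(
    initial_state=tf.constant(0, dtype=tf.int64),
    scan_func=running_sum,
)

sums = [int(v) for v in scanned.as_numpy_iterator()]
print(sums)  # [1, 3, 6, 10, 15]
```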

shard

Args
`num_shards`	A `tf.int64` scalar `tf.Tensor`, representing the number of shards operating in parallel.
`index`	A `tf.int64` scalar `tf.Tensor`, representing the worker index.
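
A minimal sketch of `shard`: with three shards, the worker at a given `index` keeps every third element starting at that index. The dataset and shard count are arbitrary.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)

# Worker 0 of 3 sees elements 0, 3, 6, 9; worker 1 sees 1, 4, 7.
shard_0 = [int(v) for v in dataset.shard(num_shards=3, index=0).as_numpy_iterator()]
shard_1 = [int(v) for v in dataset.shard(num_shards=3, index=1).as_numpy_iterator()]

print(shard_0)  # [0, 3, 6, 9]
print(shard_1)  # [1, 4, 7]
```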

snapshot

Args
`path`	Required. A directory to use for storing / loading the snapshot to / from.
`compression`	Optional. The type of compression to apply to the snapshot written to disk. Supported options are `GZIP`, `SNAPPY`, `AUTO` or None. Defaults to `AUTO`, which attempts to pick an appropriate compression algorithm for the dataset.
`reader_func`	Optional. A function to control how to read data from snapshot shards.
`shard_func`	Optional. A function to control how to shard data when writing a snapshot.
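
As a rough sketch, and assuming a TensorFlow version in which `snapshot` is available as a `Dataset` method, the example below persists a small dataset to a temporary directory and iterates over it twice. The directory and the `map` step are illustrative assumptions.

```python
import tempfile

import tensorflow as tf

# Hypothetical scratch directory for the snapshot files.
snapshot_dir = tempfile.mkdtemp()

dataset = tf.data.Dataset.range(5).map(lambda x: x * x)
dataset = dataset.snapshot(snapshot_dir, compression="GZIP")

# The first pass computes the elements and writes them to snapshot_dir;
# later passes can load them from disk instead of recomputing them.
first_pass = [int(v) for v in dataset.as_numpy_iterator()]
second_pass = [int(v) for v in dataset.as_numpy_iterator()]

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # [0, 1, 4, 9, 16]
```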

window

Args
`size`	A `tf.int64` scalar `tf.Tensor`, representing the number of elements of the input dataset to combine into a window. Must be positive.
`shift`	(Optional.) A `tf.int64` scalar `tf.Tensor`, representing the number of input elements by which the window moves in each iteration. Defaults to `size`. Must be positive.
`stride`	(Optional.) A `tf.int64` scalar `tf.Tensor`, representing the stride of the input elements in the sliding window. Must be positive. The default value of 1 means "retain every input element".
`drop_remainder`	(Optional.) A `tf.bool` scalar `tf.Tensor`, representing whether the last windows should be dropped if their size is smaller than `size`.
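
A short sketch of `window` showing how `size`, `shift`, and `drop_remainder` interact. Each window is itself a dataset, so it is materialized separately here; the input range is arbitrary.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(7)  # 0 .. 6

# Windows of 3 elements, starting every 2 elements; the trailing
# one-element window [6] is dropped because drop_remainder=True.
windows = dataset.window(size=3, shift=2, stride=1, drop_remainder=True)

result = [[int(v) for v in w.as_numpy_iterator()] for w in windows]
print(result)  # [[0, 1, 2], [2, 3, 4], [4, 5, 6]]

# Flatten the dataset of windows into a dataset of dense batches.
flat = windows.flat_map(lambda w: w.batch(3, drop_remainder=True))
flat_result = [b.tolist() for b in flat.as_numpy_iterator()]
print(flat_result)  # [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```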

Methods

apply

as_numpy_iterator

batch

bucket_by_sequence_length

cache

cardinality

concatenate

enumerate

filter

flat_map

from_generator

from_tensor_slices

from_tensors

get_single_element

group_by_window

interleave

list_files

map

options

padded_batch

prefetch

random

range

reduce

repeat

scan

shard

shuffle

skip

snapshot

take

take_while

unbatch

unique

window

with_options

zip

__bool__

__iter__

__len__

__nonzero__
