tf.data.experimental.make_batched_features_dataset
Returns a Dataset of feature dictionaries from Example protos.
tf.data.experimental.make_batched_features_dataset(
    file_pattern,
    batch_size,
    features,
    reader=None,
    label_key=None,
    reader_args=None,
    num_epochs=None,
    shuffle=True,
    shuffle_buffer_size=10000,
    shuffle_seed=None,
    prefetch_buffer_size=None,
    reader_num_threads=None,
    parser_num_threads=None,
    sloppy_ordering=False,
    drop_final_batch=False
)
If the label_key argument is provided, returns a Dataset of tuples, each comprising a feature dictionary and a label.
Example:
serialized_examples = [
features {
feature { key: "age" value { int64_list { value: [ 0 ] } } }
feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
feature { key: "kws" value { bytes_list { value: [ "code", "art" ] } } }
},
features {
feature { key: "age" value { int64_list { value: [] } } }
feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
feature { key: "kws" value { bytes_list { value: [ "sports" ] } } }
}
]
We can use arguments:
features: {
"age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
"gender": FixedLenFeature([], dtype=tf.string),
"kws": VarLenFeature(dtype=tf.string),
}
And the expected output is:
{
"age": [[0], [-1]],
"gender": [["f"], ["f"]],
"kws": SparseTensor(
indices=[[0, 0], [0, 1], [1, 0]],
values=["code", "art", "sports"],
dense_shape=[2, 2]),
}
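The example above can be reproduced end to end with a short script. This is a minimal sketch, not the library's reference usage: the temporary file name `people.tfrecord` and the helper `make_example` are hypothetical, the second record simply omits the "age" key so that the default value applies, and `num_epochs=1` with `shuffle=False` is chosen only to keep the output finite and ordered.

```python
import os
import tempfile

import tensorflow as tf


def make_example(gender, kws, age=None):
    """Builds a tf.train.Example; `age` is left out entirely when None (hypothetical helper)."""
    feature = {
        "gender": tf.train.Feature(bytes_list=tf.train.BytesList(value=[gender])),
        "kws": tf.train.Feature(bytes_list=tf.train.BytesList(value=kws)),
    }
    if age is not None:
        feature["age"] = tf.train.Feature(int64_list=tf.train.Int64List(value=[age]))
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Write the two records to a temporary TFRecord file (path is an assumption).
path = os.path.join(tempfile.mkdtemp(), "people.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(make_example(b"f", [b"code", b"art"], age=0).SerializeToString())
    writer.write(make_example(b"f", [b"sports"]).SerializeToString())

features = {
    "age": tf.io.FixedLenFeature([], dtype=tf.int64, default_value=-1),
    "gender": tf.io.FixedLenFeature([], dtype=tf.string),
    "kws": tf.io.VarLenFeature(dtype=tf.string),
}

# One pass over the file, no shuffling, so the single batch keeps file order.
dataset = tf.data.experimental.make_batched_features_dataset(
    file_pattern=path,
    batch_size=2,
    features=features,
    num_epochs=1,
    shuffle=False,
)

batch = next(iter(dataset))
print(batch["age"])
print(batch["gender"])
print(batch["kws"])
```

The fixed-length features come back as dense tensors with a leading batch dimension, while "kws" comes back as a SparseTensor because its length varies per record.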
Args
file_pattern | List of files or patterns of file paths containing Example records. See tf.io.gfile.glob for pattern rules.
batch_size | An int representing the number of records to combine in a single batch.
features | A dict mapping feature keys to FixedLenFeature or VarLenFeature values. See tf.io.parse_example.
reader | A function or class that can be called with a filenames tensor and (optional) reader_args and returns a Dataset of Example tensors. Defaults to tf.data.TFRecordDataset.
label_key | (Optional) A string naming the key under which labels are stored in tf.Examples. If provided, it must be one of the keys in features; otherwise a ValueError is raised.
reader_args | Additional arguments to pass to the reader class.
num_epochs | Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. Defaults to None.
shuffle | A boolean indicating whether the input should be shuffled. Defaults to True.
shuffle_buffer_size | Buffer size of the ShuffleDataset. A large capacity ensures better shuffling but increases memory usage and startup time. Defaults to 10000.
shuffle_seed | Randomization seed to use for shuffling.
prefetch_buffer_size | Number of feature batches to prefetch in order to improve performance. Recommended value is the number of batches consumed per training step. Defaults to auto-tune.
reader_num_threads | Number of threads used to read Example records. If >1, the results will be interleaved. Defaults to 1.
parser_num_threads | Number of threads to use for parsing Example tensors into a dictionary of Feature tensors. Defaults to 2.
sloppy_ordering | If True, reading performance will be improved at the cost of non-deterministic ordering. If False, the order of elements produced is deterministic prior to shuffling (elements are still randomized if shuffle=True). Note that if the seed is set, then the order of elements after shuffling is deterministic. Defaults to False.
drop_final_batch | If True, and the batch size does not evenly divide the input dataset size, the final smaller batch will be dropped. Defaults to False.
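To make the interaction between batch_size, num_epochs, and drop_final_batch concrete, the following sketch writes five records and counts the batch sizes produced with and without drop_final_batch. The file name `ids.tfrecord`, the feature key "id", and the helper `batch_sizes` are assumptions made for illustration only.

```python
import os
import tempfile

import tensorflow as tf

# Write five single-feature records to a temporary TFRecord file (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "ids.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for i in range(5):
        example = tf.train.Example(features=tf.train.Features(feature={
            "id": tf.train.Feature(int64_list=tf.train.Int64List(value=[i])),
        }))
        writer.write(example.SerializeToString())

features = {"id": tf.io.FixedLenFeature([], dtype=tf.int64)}


def batch_sizes(drop_final_batch):
    """Returns the size of every batch in one epoch (hypothetical helper)."""
    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=path,
        batch_size=2,
        features=features,
        num_epochs=1,      # a single pass, so the dataset is finite
        shuffle=False,     # keep file order so the run is reproducible
        drop_final_batch=drop_final_batch,
    )
    return [int(batch["id"].shape[0]) for batch in dataset]


sizes_keep = batch_sizes(drop_final_batch=False)
sizes_drop = batch_sizes(drop_final_batch=True)
print("kept final batch:   ", sizes_keep)
print("dropped final batch:", sizes_drop)
```

Because five records do not divide evenly into batches of two, the first run ends with a smaller batch and the second run omits it. Setting num_epochs=1 matters here: with the default of None the dataset repeats forever and the loop would not terminate.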
Returns
A dataset of dict elements (or of tuples of dict elements and label). Each dict maps feature keys to Tensor or SparseTensor objects.
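As an illustration of the tuple form of the return value, the sketch below passes label_key="age" and unpacks each element into a feature dictionary and a label tensor. It also densifies the sparse "kws" feature with tf.sparse.to_dense. The file name, the `make_example` helper, and the choice of "age" as the label are assumptions for this example.

```python
import os
import tempfile

import tensorflow as tf


def make_example(gender, kws, age=None):
    """Builds a tf.train.Example; `age` is omitted when None (hypothetical helper)."""
    feature = {
        "gender": tf.train.Feature(bytes_list=tf.train.BytesList(value=[gender])),
        "kws": tf.train.Feature(bytes_list=tf.train.BytesList(value=kws)),
    }
    if age is not None:
        feature["age"] = tf.train.Feature(int64_list=tf.train.Int64List(value=[age]))
    return tf.train.Example(features=tf.train.Features(feature=feature))


path = os.path.join(tempfile.mkdtemp(), "people.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(make_example(b"f", [b"code", b"art"], age=0).SerializeToString())
    writer.write(make_example(b"f", [b"sports"]).SerializeToString())

features = {
    "age": tf.io.FixedLenFeature([], dtype=tf.int64, default_value=-1),
    "gender": tf.io.FixedLenFeature([], dtype=tf.string),
    "kws": tf.io.VarLenFeature(dtype=tf.string),
}

dataset = tf.data.experimental.make_batched_features_dataset(
    file_pattern=path,
    batch_size=2,
    features=features,
    label_key="age",   # must be one of the keys in `features`
    num_epochs=1,
    shuffle=False,
)

# Each element is now a (feature_dict, label) tuple.
feature_batch, label_batch = next(iter(dataset))

# VarLenFeature yields a SparseTensor; pad it to a dense [batch, max_len] tensor.
dense_kws = tf.sparse.to_dense(feature_batch["kws"], default_value="")

print(sorted(feature_batch.keys()))
print(label_batch)
print(dense_kws)
```

The label is removed from the feature dictionary, so "age" appears only in the second element of the tuple.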
Raises
TypeError | If reader is of the wrong type.
ValueError | If label_key is not one of the features keys.
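The ValueError case can be observed with a label_key that is absent from features. This is a small sketch under stated assumptions: the file name `one.tfrecord` and the keys "x" and "label" are hypothetical, and a real file is written first so that the file pattern matches something before the label_key check is reached.

```python
import os
import tempfile

import tensorflow as tf

# A single-record file so that the file pattern matches an existing file.
path = os.path.join(tempfile.mkdtemp(), "one.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        "x": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    }))
    writer.write(example.SerializeToString())

features = {"x": tf.io.FixedLenFeature([], dtype=tf.int64)}

raised = False
try:
    tf.data.experimental.make_batched_features_dataset(
        file_pattern=path,
        batch_size=1,
        features=features,
        label_key="label",  # not a key of `features`, so this should be rejected
        num_epochs=1,
        shuffle=False,
    )
except ValueError as err:
    raised = True
    print("Rejected label_key:", err)
```

Catching the error when the dataset is built, rather than during iteration, makes a misspelled label key easy to spot early in an input pipeline.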
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-04-26 UTC.