tfr.data.read_batched_sequence_example_dataset
Returns a Dataset of features from SequenceExample.
tfr.data.read_batched_sequence_example_dataset(
    file_pattern,
    batch_size,
    list_size,
    context_feature_spec,
    example_feature_spec,
    reader=tfr.keras.pipeline.DatasetHparams.dataset_reader,
    reader_args=None,
    num_epochs=None,
    shuffle=True,
    shuffle_buffer_size=1000,
    shuffle_seed=None,
    prefetch_buffer_size=32,
    reader_num_threads=10,
    sloppy_ordering=True,
    drop_final_batch=False
)
Example:
data = [
  sequence_example {
    context {
      feature {
        key: "query_length"
        value { int64_list { value: 3 } }
      }
    }
    feature_lists {
      feature_list {
        key: "unigrams"
        value {
          feature { bytes_list { value: "tensorflow" } }
          feature { bytes_list { value: ["learning", "to", "rank"] } }
        }
      }
      feature_list {
        key: "utility"
        value {
          feature { float_list { value: 0.0 } }
          feature { float_list { value: 1.0 } }
        }
      }
    }
  }
  sequence_example {
    context {
      feature {
        key: "query_length"
        value { int64_list { value: 2 } }
      }
    }
    feature_lists {
      feature_list {
        key: "unigrams"
        value {
          feature { bytes_list { value: "gbdt" } }
          feature { }
        }
      }
      feature_list {
        key: "utility"
        value {
          feature { float_list { value: 0.0 } }
          feature { float_list { value: 0.0 } }
        }
      }
    }
  }
]
We can use arguments:
context_feature_spec: {
  "query_length": parsing_ops.FixedLenFeature([1], dtypes.int64)
}
example_feature_spec: {
  "unigrams": parsing_ops.VarLenFeature(dtypes.string),
  "utility": parsing_ops.FixedLenFeature([1], dtypes.float32,
                                         default_value=[0.])
}
batch_size: 2
And the expected output is:
{
  "unigrams": SparseTensor(
    indices=array([[0, 0, 0], [0, 1, 0], [0, 1, 1], [0, 1, 2], [1, 0, 0]]),
    values=["tensorflow", "learning", "to", "rank", "gbdt"],
    dense_shape=array([2, 2, 3])),
  "utility": [[[0.], [1.]], [[0.], [0.]]],
  "query_length": [[3], [2]],
}
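The sparse output follows mechanically from the ragged token lists: each value gets one [batch, frame, token] index, and dense_shape is the maximum extent along each axis. A minimal pure-Python sketch (no TensorFlow required; names are illustrative):

```python
# Tokens per (batch, frame), mirroring the two SequenceExamples above.
unigrams = [
    [["tensorflow"], ["learning", "to", "rank"]],  # first sequence_example
    [["gbdt"], []],                                # second sequence_example
]

def sparse_from_ragged(batch):
    """Compute SparseTensor-style indices/values/dense_shape for a
    [batch, frame, token] ragged structure."""
    indices, values = [], []
    max_frames, max_tokens = 0, 0
    for b, frames in enumerate(batch):
        max_frames = max(max_frames, len(frames))
        for f, tokens in enumerate(frames):
            max_tokens = max(max_tokens, len(tokens))
            for t, tok in enumerate(tokens):
                indices.append([b, f, t])  # one index per stored value
                values.append(tok)
    return indices, values, [len(batch), max_frames, max_tokens]

indices, values, dense_shape = sparse_from_ragged(unigrams)
print(indices)      # [[0, 0, 0], [0, 1, 0], [0, 1, 1], [0, 1, 2], [1, 0, 0]]
print(values)       # ['tensorflow', 'learning', 'to', 'rank', 'gbdt']
print(dense_shape)  # [2, 2, 3]
```

Note that the empty second frame of the second example contributes no values, so the SparseTensor simply has no entries at those positions.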
Args

file_pattern: (str | list(str)) List of files or patterns of file paths
  containing tf.SequenceExample protos. See tf.io.gfile.glob for pattern
  rules.
batch_size: (int) Number of records to combine in a single batch.
list_size: (int) The number of frames to keep in a SequenceExample. If
  specified, truncation or padding may occur. Set it to None to allow a
  dynamic list size.
context_feature_spec: (dict) A mapping from feature keys to FixedLenFeature
  or VarLenFeature values.
example_feature_spec: (dict) A mapping from feature keys to FixedLenFeature
  or VarLenFeature values.
reader: A function or class that can be called with a filenames tensor and
  (optional) reader_args and returns a Dataset. Defaults to
  tf.data.TFRecordDataset.
reader_args: (list) Additional argument list to pass to the reader class.
num_epochs: (int) Number of times to read through the dataset. If None,
  cycles through the dataset forever. Defaults to None.
shuffle: (bool) Indicates whether the input should be shuffled. Defaults to
  True.
shuffle_buffer_size: (int) Buffer size of the ShuffleDataset. A larger
  capacity ensures better shuffling but increases memory usage and startup
  time.
shuffle_seed: (int) Randomization seed to use for shuffling.
prefetch_buffer_size: (int) Number of feature batches to prefetch to improve
  performance. The recommended value is the number of batches consumed per
  training step (usually 1).
reader_num_threads: (int) Number of threads used to read records. If greater
  than 1, the results are interleaved.
sloppy_ordering: (bool) If True, reading performance is improved at the cost
  of non-deterministic ordering. If False, the order of elements produced is
  deterministic prior to shuffling; elements are still randomized if
  shuffle=True, but if the seed is set, the order after shuffling is also
  deterministic. Defaults to True.
drop_final_batch: (bool) If True and the batch size does not evenly divide
  the input dataset size, the final smaller batch is dropped. Defaults to
  False. If True, the batch_size can be statically inferred.
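As a concrete check of how batch_size, num_epochs, and drop_final_batch interact, the number of batches produced can be worked out with plain arithmetic (an illustrative sketch, not the library's implementation):

```python
def num_batches(num_records, batch_size, num_epochs=1, drop_final_batch=False):
    """Batches produced when reading num_records for num_epochs:
    floor division if the final partial batch is dropped per epoch,
    ceiling division otherwise."""
    total = num_records * num_epochs
    if drop_final_batch:
        return total // batch_size
    return -(-total // batch_size)  # ceiling division

print(num_batches(10, 4))                        # 3  (batches of 4, 4, 2)
print(num_batches(10, 4, drop_final_batch=True)) # 2  (partial batch dropped)
print(num_batches(10, 4, num_epochs=2))          # 5
```

With num_epochs=None the dataset repeats forever, so no such count exists; this arithmetic applies only to a finite number of epochs.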
Returns

A dataset of dict elements. Each dict maps feature keys to Tensor or
SparseTensor objects. The context features are mapped to a rank-2 tensor of
shape [batch_size, feature_size], and the example features are mapped to a
rank-3 tensor of shape [batch_size, list_size, feature_size], where
list_size is the number of examples.
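The list_size truncation/padding behavior described above can be illustrated without TensorFlow. This sketch (illustrative only; the hypothetical fit_to_list_size helper is not part of the library) pads or truncates a per-query list of feature values to a fixed size using a default value:

```python
def fit_to_list_size(example_list, list_size, default_value):
    """Truncate or pad a per-query list of feature values to list_size,
    mimicking the behavior described for the list_size argument.
    list_size=None keeps the dynamic list size."""
    if list_size is None:
        return list(example_list)
    truncated = example_list[:list_size]
    padding = [default_value] * (list_size - len(truncated))
    return truncated + padding

utility = [[0.0], [1.0]]  # two frames from the first SequenceExample above
print(fit_to_list_size(utility, 3, [0.0]))  # [[0.0], [1.0], [0.0]]  (padded)
print(fit_to_list_size(utility, 1, [0.0]))  # [[0.0]]                (truncated)
```

The default_value used for padding corresponds to the default_value declared in the FixedLenFeature spec, e.g. [0.] for "utility" in the example above.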
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-08-18 UTC.