tfx_bsl.public.tfxio.TensorFlowDatasetOptions

Options for TFXIO's TensorFlowDataset.

tfx_bsl.public.tfxio.TensorFlowDatasetOptions(
    batch_size: int,
    drop_final_batch: bool = False,
    num_epochs: Optional[int] = None,
    shuffle: bool = True,
    shuffle_buffer_size: int = 10000,
    shuffle_seed: Optional[int] = None,
    prefetch_buffer_size: int = tf.data.experimental.AUTOTUNE,
    reader_num_threads: int = tf.data.experimental.AUTOTUNE,
    parser_num_threads: int = tf.data.experimental.AUTOTUNE,
    sloppy_ordering: bool = False,
    label_key: Optional[str] = None
)
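As a rough illustration of how `batch_size` and `drop_final_batch` interact, the pure-Python helper below lists the batch sizes produced by a single pass over a dataset of known size. It is a simplified model, not TFXIO code. The function name `batch_sizes` is hypothetical, and the helper does not model whether batches can span epoch boundaries when `num_epochs` is greater than 1.

```python
from typing import List


def batch_sizes(num_records: int, batch_size: int, drop_final_batch: bool = False) -> List[int]:
    """Return the size of each batch produced by one pass over num_records records.

    Simplified model (hypothetical helper): consecutive records are grouped into
    batches of batch_size, and a trailing partial batch is kept unless
    drop_final_batch is True.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(num_records, batch_size)
    sizes = [batch_size] * full
    # The remainder only forms a (smaller) final batch when it is not dropped.
    if remainder and not drop_final_batch:
        sizes.append(remainder)
    return sizes


# 1,000 records with batch_size=64: 15 full batches plus a partial batch of 40.
print(len(batch_sizes(1000, 64)), batch_sizes(1000, 64)[-1])  # 16 40
print(len(batch_sizes(1000, 64, drop_final_batch=True)))      # 15
```

Because the default `num_epochs=None` produces a dataset that repeats indefinitely, a training loop usually needs an explicit step count (for example, Keras `steps_per_epoch`). Under the assumptions above, `len(batch_sizes(num_records, batch_size, drop_final_batch))` is one way to derive a per-pass step count.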

Used in the notebooks

Used in the tutorials
Using TensorFlow Recommenders with TFX
Better ML Engineering with ML Metadata
TFX Keras Component Tutorial
Reading data from BigQuery with TFX and Vertex Pipelines
Simple TFX Pipeline for Vertex Pipelines

Args
`batch_size`	An int representing the number of records to combine in a single batch.
`drop_final_batch`	If `True`, and the batch size does not evenly divide the input dataset size, the final smaller batch will be dropped. Defaults to `False`.
`num_epochs`	Integer specifying the number of times to read through the dataset. If `None`, cycles through the dataset indefinitely. Defaults to `None`.
`shuffle`	A boolean indicating whether the input should be shuffled. Defaults to `True`.
`shuffle_buffer_size`	Size of the shuffle buffer, measured in items (i.e., records for a record-based TFXIO). Only data read into the buffer is shuffled; there is no shuffling across buffers. A larger buffer gives better shuffling but increases memory usage and startup time. Defaults to 10000.
`shuffle_seed`	Randomization seed to use for shuffling. Defaults to `None`.
`prefetch_buffer_size`	Number of feature batches to prefetch to improve performance. The recommended value is the number of batches consumed per training step. Defaults to `tf.data.experimental.AUTOTUNE`.
`reader_num_threads`	Number of threads used to read records. If greater than 1, the results are interleaved. Defaults to `tf.data.experimental.AUTOTUNE`.
`parser_num_threads`	Number of threads used to parse `Example` tensors into a dictionary of `Feature` tensors (if applicable). Defaults to `tf.data.experimental.AUTOTUNE`.
`sloppy_ordering`	If `True`, reading performance is improved at the cost of non-deterministic ordering. If `False`, the order of elements produced is deterministic prior to shuffling; elements are still randomized if `shuffle=True`, but setting `shuffle_seed` makes the order after shuffling deterministic as well. Defaults to `False`.
`label_key`	Name of the label tensor. If provided, the returned dataset yields `Tuple[Dict[str, Tensor], Tensor]`, where the second element is the label tensor and the dict (the first element) does not contain the label feature. Defaults to `None`.
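To make the `shuffle_buffer_size`, `shuffle_seed`, and `label_key` descriptions concrete, the sketch below models them in plain Python. `buffered_shuffle` follows the fill, sample, refill scheme used by buffer-based shuffles such as `tf.data.Dataset.shuffle`, and `split_label` mimics the `(features, label)` output shape. Both names are hypothetical, and neither is TFXIO's implementation.

```python
import random
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def buffered_shuffle(items: Iterable, buffer_size: int, seed: Optional[int] = None) -> Iterator:
    """Yield items in buffered-shuffle order (simplified model, hypothetical helper).

    Fills a buffer with up to buffer_size items, then repeatedly emits a randomly
    chosen buffered item and refills its slot with the next input. An item can
    only be emitted after it has entered the buffer.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    buffer: List = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain whatever is left in the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


def split_label(features: Dict[str, object], label_key: str) -> Tuple[Dict[str, object], object]:
    """Mimic the label_key output shape: (features without the label, label)."""
    features = dict(features)  # copy so the caller's dict is not mutated
    label = features.pop(label_key)
    return features, label


print(list(buffered_shuffle(range(10), buffer_size=3, seed=42)))
print(split_label({"age": 30, "label": 1}, "label"))  # ({'age': 30}, 1)
```

With `buffer_size=1` the output keeps the input order, and in general the k-th emitted item (counting from 0) always comes from the first `buffer_size + k` inputs, which is why a small buffer shuffles only locally.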

Attributes
`batch_size`	A `namedtuple` alias for field number 0
`drop_final_batch`	A `namedtuple` alias for field number 1
`num_epochs`	A `namedtuple` alias for field number 2
`shuffle`	A `namedtuple` alias for field number 3
`shuffle_buffer_size`	A `namedtuple` alias for field number 4
`shuffle_seed`	A `namedtuple` alias for field number 5
`prefetch_buffer_size`	A `namedtuple` alias for field number 6
`reader_num_threads`	A `namedtuple` alias for field number 7
`parser_num_threads`	A `namedtuple` alias for field number 8
`sloppy_ordering`	A `namedtuple` alias for field number 9
`label_key`	A `namedtuple` alias for field number 10
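Because the attributes are plain `namedtuple` fields, an options object can be indexed by position or accessed by name, and standard `namedtuple` methods such as `_replace` should give a modified copy (for example, an evaluation variant with shuffling turned off). The stand-in below, `OptionsSketch`, is hypothetical: it reproduces only the field order and defaults from the signature above so the behavior can be shown without installing `tfx_bsl`, and it hard-codes `AUTOTUNE = -1`, the value of `tf.data.experimental.AUTOTUNE`.

```python
import collections

# Hypothetical stand-in: mirrors the value of tf.data.experimental.AUTOTUNE (-1).
AUTOTUNE = -1

# Field order matches the "field number" aliases listed above (0 through 10).
_FIELDS = (
    "batch_size",
    "drop_final_batch",
    "num_epochs",
    "shuffle",
    "shuffle_buffer_size",
    "shuffle_seed",
    "prefetch_buffer_size",
    "reader_num_threads",
    "parser_num_threads",
    "sloppy_ordering",
    "label_key",
)

# Defaults apply to the rightmost fields, so only batch_size is required.
OptionsSketch = collections.namedtuple(
    "OptionsSketch",
    _FIELDS,
    defaults=(False, None, True, 10000, None, AUTOTUNE, AUTOTUNE, AUTOTUNE, False, None),
)

opts = OptionsSketch(batch_size=32, label_key="label")
print(opts[0], opts.batch_size)  # 32 32 (index 0 and the name alias the same field)
print(opts[10])                  # label

# _replace returns a new tuple; the original opts is left unchanged.
eval_opts = opts._replace(shuffle=False, num_epochs=1)
print(eval_opts.shuffle, eval_opts.num_epochs)  # False 1
```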
