Options for TFXIO's TensorFlowDataset.

batch_size An int representing the number of records to combine in a single batch.
drop_final_batch If True, and the batch size does not evenly divide the input dataset size, the final smaller batch will be dropped. Defaults to False.
num_epochs Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. Defaults to None.
shuffle A boolean indicating whether the input should be shuffled. Defaults to True.
shuffle_buffer_size Size of the shuffle buffer, measured in items (i.e. records for a record-based TFXIO). Only data that has been read into the buffer is shuffled; there is no shuffling across buffers. A larger buffer gives better shuffling but increases memory usage and startup time.
shuffle_seed Randomization seed to use for shuffling.
prefetch_buffer_size Number of feature batches to prefetch in order to improve performance. Recommended value is the number of batches consumed per training step. Defaults to auto-tune.
reader_num_threads Number of threads used to read records. If >1, the results will be interleaved. Defaults to
parser_num_threads Number of threads to use for parsing Example tensors into a dictionary of Feature tensors (if applicable). Defaults to auto-tune.
sloppy_ordering If True, reading performance is improved at the cost of non-deterministic ordering. If False, the order of elements produced is deterministic prior to shuffling (elements are still randomized if shuffle=True; if the seed is also set, the order of elements after shuffling is deterministic as well). Defaults to False.
label_key Name of the label tensor. If provided, the returned dataset yields Tuple[Dict[str, Tensor], Tensor], where the second element of the tuple is the label tensor and the dict (the first element) does not contain the label feature.
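The interaction of batch_size, drop_final_batch and num_epochs can be illustrated with a small pure-Python sketch. This is not the TFXIO implementation: batch_records is a hypothetical helper, and it assumes the input is repeated num_epochs times before batching (a repeat-then-batch order, as commonly used with tf.data), so a batch may span an epoch boundary and only the very last partial batch is affected by drop_final_batch.

```python
import itertools
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def batch_records(records: List[T], batch_size: int,
                  drop_final_batch: bool = False,
                  num_epochs: Optional[int] = None) -> Iterator[List[T]]:
    """Hypothetical helper illustrating the documented batching semantics."""
    if num_epochs is None:
        # num_epochs=None: cycle through the records forever.
        stream: Iterable[T] = itertools.cycle(records)
    else:
        # Assumption: repeat the whole input num_epochs times, then batch.
        stream = itertools.chain.from_iterable(
            itertools.repeat(records, num_epochs))
    it = iter(stream)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return  # input exhausted
        if len(batch) < batch_size and drop_final_batch:
            return  # drop the smaller final batch
        yield batch
```

For five records with batch_size=2 and num_epochs=1, this yields [[0, 1], [2, 3], [4]], or only the first two batches when drop_final_batch=True.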
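The note on shuffle_buffer_size describes a bounded-buffer shuffle. The following is a minimal sketch of that idea, assuming a buffer that is filled first and then emits a random element each time a new record arrives; buffered_shuffle is a hypothetical helper, not part of TFXIO.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(records: Iterable[T], buffer_size: int,
                     seed: Optional[int] = None) -> Iterator[T]:
    """Hypothetical bounded-buffer shuffle; seed plays the role of shuffle_seed."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for record in records:
        if len(buffer) < buffer_size:
            buffer.append(record)  # still filling the buffer
            continue
        # Buffer is full: emit a random buffered element and put the new
        # record in its slot. Records not yet read cannot be emitted.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = record
    # Input exhausted: drain the remaining buffered records in random order.
    rng.shuffle(buffer)
    yield from buffer
```

With buffer_size=1 the input order is preserved, and in general the element at input position k can never be emitted before output position k - buffer_size + 1, which is why a small buffer only shuffles locally. Passing the same seed reproduces the same order.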
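When label_key is set, each element is split into a (features, label) pair. The per-element transformation can be sketched in plain Python, with a dict of plain values standing in for the dict of tensors. split_label is a hypothetical name, and the sketch makes no claim about how TFXIO performs the split internally.

```python
from typing import Any, Dict, Tuple


def split_label(features: Dict[str, Any],
                label_key: str) -> Tuple[Dict[str, Any], Any]:
    """Hypothetical helper: returns (features without the label, label)."""
    if label_key not in features:
        raise KeyError(f"label_key {label_key!r} not found in features")
    # Build a new dict so the caller's input is left unchanged.
    remaining = {k: v for k, v in features.items() if k != label_key}
    return remaining, features[label_key]
```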