tf.contrib.data.sloppy_interleave

A non-deterministic version of the Dataset.interleave() transformation. (deprecated)

sloppy_interleave() maps map_func across dataset, and non-deterministically interleaves the results.

The resulting dataset is almost identical to the one produced by interleave. The key difference is that if retrieving a value from a given output iterator would cause get_next to block, that iterator is skipped and consumed when it next becomes available. If consuming from every iterator would cause the get_next call to block, the call blocks until the first value is available.

If the underlying datasets produce elements as fast as they are consumed, the sloppy_interleave transformation behaves identically to interleave. However, if an underlying dataset would block the consumer, sloppy_interleave can violate the round-robin order that interleave strictly obeys, and produce an element from a different underlying dataset instead.
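The ordering difference described above can be illustrated with a small pure-Python model. This is only a sketch and does not use TensorFlow: the names strict_round_robin, sloppy_round_robin, and the WOULD_BLOCK marker are hypothetical, every input is an in-memory list, block_length is taken to be 1, and a stalled input is assumed to become ready after one skipped turn.

```python
from collections import deque

WOULD_BLOCK = object()  # hypothetical marker: "no element is ready yet"
_EXHAUSTED = object()   # internal marker: the input has no more elements


def strict_round_robin(sources):
    """Models interleave: always wait for the input whose turn it is."""
    iterators = deque(iter(s) for s in sources)
    out = []
    while iterators:
        it = iterators.popleft()
        item = next(it, _EXHAUSTED)
        while item is WOULD_BLOCK:      # "block" until this input delivers
            item = next(it, _EXHAUSTED)
        if item is not _EXHAUSTED:
            out.append(item)
            iterators.append(it)        # keep its place in the rotation
    return out


def sloppy_round_robin(sources):
    """Models sloppy_interleave: skip an input that is not ready, retry later."""
    iterators = deque(iter(s) for s in sources)
    out = []
    while iterators:
        it = iterators.popleft()
        item = next(it, _EXHAUSTED)
        if item is _EXHAUSTED:
            continue                    # drop finished inputs
        if item is not WOULD_BLOCK:
            out.append(item)
        iterators.append(it)            # ready or not, revisit this input later
    return out


a = ["a1", "a2", "a3"]
b = ["b1", WOULD_BLOCK, "b2", "b3"]     # the second read of b would block once
c = ["c1", "c2", "c3"]

strict_order = strict_round_robin([a, b, c])
sloppy_order = sloppy_round_robin([a, b, c])
print(strict_order)  # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
print(sloppy_order)  # ['a1', 'b1', 'c1', 'a2', 'c2', 'a3', 'b2', 'c3', 'b3']
```

Both orders contain the same elements; only their positions differ. When no input ever reports WOULD_BLOCK, the two functions return the same list, which mirrors the statement that the transformations behave identically when the inputs keep up with the consumer.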

Example usage:

import tensorflow as tf

# Read records from 4 files concurrently.
filenames = tf.data.Dataset.list_files("/path/to/data/train*.tfrecords")
dataset = filenames.apply(
    tf.contrib.data.sloppy_interleave(
        lambda filename: tf.data.TFRecordDataset(filename),
        cycle_length=4))

Args:
  map_func: A function mapping a nested structure of tensors (having shapes and types defined by the input dataset's output_shapes and output_types) to a Dataset.
  cycle_length: The number of input Datasets to interleave from in parallel.
  block_length: The number of consecutive elements to pull from an input Dataset before advancing to the next input Dataset. Note: to avoid blocking, sloppy_interleave may move on to another input Dataset before it has pulled all block_length elements of the current block.

Returns:
  A Dataset transformation function, which can be passed to tf.data.Dataset.apply.
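To make the roles of cycle_length and block_length concrete, the following pure-Python sketch models the order in which elements are emitted when no input ever blocks, which is the case where sloppy_interleave matches interleave. It does not use TensorFlow. The function name interleave_order is hypothetical, the inputs are plain lists standing in for the Datasets returned by map_func, and the treatment of exhausted inputs is a simplifying assumption.

```python
from itertools import islice

_DONE = object()  # internal marker: there are no more inputs to open


def interleave_order(inputs, cycle_length, block_length=1):
    """Return the emission order for a non-blocking interleave of `inputs`."""
    pending = iter(inputs)
    # Open the first `cycle_length` inputs; these are read in rotation.
    slots = [iter(x) for x in islice(pending, cycle_length)]
    out = []
    idx = 0
    while slots:
        idx %= len(slots)
        # Pull up to `block_length` consecutive elements from the current input.
        block = list(islice(slots[idx], block_length))
        out.extend(block)
        if len(block) == block_length:
            idx += 1                    # full block: advance to the next input
            continue
        # A short block means this input is exhausted.
        replacement = next(pending, _DONE)
        if replacement is _DONE:
            slots.pop(idx)              # nothing left to open: shrink the cycle
        else:
            slots[idx] = iter(replacement)
            idx += 1
    return out


files = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
order = interleave_order(files, cycle_length=2, block_length=2)
print(order)  # ['a1', 'a2', 'b1', 'b2', 'a3', 'b3', 'c1', 'c2', 'c3']
```

With cycle_length=2, only two inputs are open at a time, so the third input is not read until one of the first two is exhausted. With block_length=2, two consecutive elements are taken from an input before the rotation moves on. Under sloppy_interleave this order is only what happens when nothing blocks; a slow input may be passed over earlier.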