tf.data.experimental.CsvDataset

A Dataset comprising lines from one or more CSV files.

Inherits From: Dataset

The tf.data.experimental.CsvDataset class provides a minimal CSV Dataset interface. For more convenience, the tf.data.experimental.make_csv_dataset function adds features such as column header parsing, column type inference, automatic shuffling, and file interleaving.
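To make the contrast concrete, here is a minimal sketch of make_csv_dataset reading a file that has a header row. The file name, column names, and values are hypothetical and chosen only for illustration; shuffle=False and num_epochs=1 are set so that the output is deterministic and finite.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical file with a header row; names and values are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "with_header.csv")
with open(path, "w") as f:
    f.write("name,score,count\n")
    f.write("a,1.5,3\n")
    f.write("b,2.5,4\n")

# make_csv_dataset takes column names from the header row and infers a dtype
# for each column, so no record_defaults list is needed here.
ds = tf.data.experimental.make_csv_dataset(
    path, batch_size=2, num_epochs=1, shuffle=False)

# With no label_name given, each element maps column names to batched tensors.
batch = next(iter(ds))
for name, values in batch.items():
    print(name, values.numpy())
```

Unlike CsvDataset, which yields one tuple per record, this function yields batches keyed by column name.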

The elements of a CsvDataset correspond to records from the file(s). CSV files are expected to follow the RFC 4180 format (https://tools.ietf.org/html/rfc4180). As an extension, leading and trailing spaces are allowed in int and float fields.
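The following sketch illustrates both points on a single record: numeric fields padded with spaces, and an RFC 4180 quoted string field that contains a comma. The file name and contents are hypothetical, and the default quote handling (use_quote_delim=True) is assumed.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical one-line file: padded numeric fields plus a quoted string
# field with an embedded comma.
path = os.path.join(tempfile.mkdtemp(), "spaces.csv")
with open(path, "w") as f:
    f.write(' 12 , 3.5 ,"hello, world"\n')

# One record_defaults entry per column; a bare dtype marks a required field.
ds = tf.data.experimental.CsvDataset(path, [tf.int32, tf.float32, tf.string])

# Each element is a tuple with one value per parsed column.
rows = list(ds.as_numpy_iterator())
print(rows)
```

The surrounding spaces are tolerated only for the numeric columns; in a string column they would be part of the field value.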

For example, suppose we have a file 'my_file0.csv' with four CSV columns of different data types:

with open('/tmp/my_file0.csv', 'w') as f:
  f.write('abcdefg,4.28E10,5.55E6,12\n')
  f.write('hijklmn,-5.3E14,,2\n')

We can construct a CsvDataset from it as follows:

import tensorflow as tf

dataset = tf.data.experimental.CsvDataset(
  "/tmp/my_file0.csv",
  [tf.float32,  # Required field, use dtype or empty tensor
   tf.constant([0.0], dtype=tf.float32),  # Optional field, default to 0.0
   tf.int32,  # Required field, use dtype or empty tensor
  ],
  select_cols=[1, 2, 3]  # Only parse last three columns
)

Iterating over the dataset is expected to produce the following output:

for element in dataset.as_numpy_iterator():