ML Community Day is November 9! Join us for updates from TensorFlow, JAX, and more Learn more

A Dataset comprising lines from one or more CSV files.

Inherits From: Dataset

Used in the notebooks

Used in the guide Used in the tutorials

The class provides a minimal CSV Dataset interface. There is also a richer function which provides additional convenience features such as column header parsing, column type-inference, automatic shuffling, and file interleaving.

The elements of this dataset correspond to records from the file(s). RFC 4180 format is expected for CSV files ( Note that we allow leading and trailing spaces for int or float fields.

For example, suppose we have a file 'my_file0.csv' with four CSV columns of different data types:

with open('/tmp/my_file0.csv', 'w') as f:

We can construct a CsvDataset from it as follows:

dataset =
  [tf.float32,  # Required field, use dtype or empty tensor
   tf.constant([0.0], dtype=tf.float32),  # Optional field, default to 0.0
   tf.int32,  # Required field, use dtype or empty tensor
  select_cols=[1,2,3]  # Only parse last three columns

The expected output of its iterations is:

for element in dataset.as_numpy_iterator():
(4.28e10, 5.55e6, 12)
(-5.3e14, 0.0, 2)

See for more in-depth example usage.

filenames A tf.string tensor containing one or more filenames.
record_defaults A list of default values for the CSV fields. Each item in the list is either a valid CSV DType (float32, float64, int32, int64, string), or a Tensor object with one of the above types. One per column of CSV data, with either a scalar Tensor default value for the column if it is optional, or DType or empty Tensor if required. If both this and select_columns are specified, these must have the same lengths, and column_defaults is assumed to be sorted in order of increasing column index. If both this and 'ex