tf.keras.preprocessing.image.ImageDataGenerator

Generate batches of tensor image data with real-time data augmentation.

The data will be looped over (in batches).

featurewise_center Boolean. Set input mean to 0 over the dataset, feature-wise.
samplewise_center Boolean. Set each sample mean to 0.
featurewise_std_normalization Boolean. Divide inputs by std of the dataset, feature-wise.
samplewise_std_normalization Boolean. Divide each input by its std.
zca_epsilon Epsilon for ZCA whitening. Default is 1e-6.
zca_whitening Boolean. Apply ZCA whitening.
rotation_range Int. Degree range for random rotations.
width_shift_range Float, 1-D array-like or int.
  • float: fraction of total width if < 1, or pixels if >= 1.
  • 1-D array-like: random elements from the array.
  • int: integer number of pixels from the interval (-width_shift_range, +width_shift_range).
  • With width_shift_range=2, possible values are the integers [-1, 0, +1], the same as with width_shift_range=[-1, 0, +1]; with width_shift_range=1.0, possible values are floats in the interval [-1.0, +1.0).
height_shift_range Float, 1-D array-like or int.
  • float: fraction of total height if < 1, or pixels if >= 1.
  • 1-D array-like: random elements from the array.
  • int: integer number of pixels from the interval (-height_shift_range, +height_shift_range).
  • With height_shift_range=2, possible values are the integers [-1, 0, +1], the same as with height_shift_range=[-1, 0, +1]; with height_shift_range=1.0, possible values are floats in the interval [-1.0, +1.0).
brightness_range Tuple or list of two floats. Range for picking a brightness shift value from.
shear_range Float. Shear intensity (shear angle in counter-clockwise direction, in degrees).
zoom_range Float or [lower, upper]. Range for random zoom. If a float, [lower, upper] = [1-zoom_range, 1+zoom_range].
channel_shift_range Float. Range for random channel shifts.
fill_mode One of {"constant", "nearest", "reflect", "wrap"}. Default is 'nearest'. Points outside the boundaries of the input are filled according to the given mode:
  • 'constant': kkkkkkkk|abcd|kkkkkkkk (cval=k)
  • 'nearest': aaaaaaaa|abcd|dddddddd
  • 'reflect': abcddcba|abcd|dcbaabcd
  • 'wrap': abcdabcd|abcd|abcdabcd
cval Float or Int. Value used for points outside the boundaries when fill_mode = "constant".
horizontal_flip Boolean. Randomly flip inputs horizontally.
vertical_flip Boolean. Randomly flip inputs vertically.
rescale Rescaling factor. Defaults to None. If None or 0, no rescaling is applied; otherwise the data is multiplied by the value provided (after applying all other transformations).
preprocessing_function Function that will be applied to each input. The function will run after the image is resized and augmented. It should take one argument, an image (NumPy tensor with rank 3), and should output a NumPy tensor with the same shape.
data_format Image data format, either "channels_first" or "channels_last". "channels_last" mode means that the images should have shape (samples, height, width, channels); "channels_first" mode means that the images should have shape (samples, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".
validation_split Float. Fraction of images reserved for validation (strictly between 0 and 1).
dtype Dtype to use for the generated arrays.
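
The fill_mode patterns listed above can be reproduced on a 1-D array with NumPy's np.pad, which may help when checking what each mode does at the borders. This is only an illustrative sketch: it is not how the generator fills pixels internally, and the mapping of mode names ('nearest' to 'edge', 'reflect' to 'symmetric') is an assumption chosen to match the documented patterns. The names row, PAD, K, NUMPY_MODE and fill_pattern are hypothetical.

```python
import numpy as np

# Stand-ins for the letters in the documented patterns: a, b, c, d -> 1, 2, 3, 4.
row = np.array([1, 2, 3, 4])
PAD = 8  # eight filled positions on each side, as in the patterns above
K = 0    # stand-in for cval, used only by 'constant'

# Assumed mapping from fill_mode names to np.pad modes (illustration only).
NUMPY_MODE = {
    "constant": "constant",
    "nearest": "edge",
    "reflect": "symmetric",
    "wrap": "wrap",
}


def fill_pattern(fill_mode):
    """Return `row` extended by PAD positions on both sides."""
    mode = NUMPY_MODE[fill_mode]
    if mode == "constant":
        return np.pad(row, PAD, mode=mode, constant_values=K)
    return np.pad(row, PAD, mode=mode)


for name in NUMPY_MODE:
    padded = fill_pattern(name)
    left, middle, right = padded[:PAD], padded[PAD:-PAD], padded[-PAD:]
    print(f"{name:>8}: {left} | {middle} | {right}")
```

Reading each printed row with 1, 2, 3, 4 as a, b, c, d and 0 as k gives the same left | input | right layout as the list of modes.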

ValueError If the value of the argument data_format is other than "channels_last" or "channels_first".
ValueError If the value of the argument validation_split is > 1 or < 0.

Examples:

Example of using .flow(x, y):

    from tensorflow.keras import utils
    from tensorflow.keras.datasets import cifar10
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Assumes `model` (a compiled Keras model), `num_classes`, and `epochs`
    # are already defined.
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    y_train = utils.to_categorical(y_train, num_classes)
    y_test = utils.to_categorical(y_test, num_classes)
    datagen = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization=True,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True,
        validation_split=0.2)
    # compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied)
    datagen.fit(x_train)
    # fit the model on batches with real-time data augmentation
    train_flow = datagen.flow(x_train, y_train, batch_size=32,
                              subset='training')
    val_flow = datagen.flow(x_train, y_train, batch_size=8,
                            subset='validation')
    # len(train_flow) is the number of batches per epoch, which keeps
    # steps_per_epoch an integer and matches the training subset
    model.fit(train_flow, validation_data=val_flow,
              steps_per_epoch=len(train_flow), epochs=epochs)
    # here's a more "manual" example
    for e in range(epochs):
        print('Epoch', e)
        batches = 0
        for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
            model.fit(x_batch, y_batch)
            batches += 1
            if batches >= len(x_train) / 32:
                # we need to break the loop by hand because
                # the generator loops indefinitely
                break
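
To make the comment about datagen.fit(x_train) more concrete, the following NumPy sketch shows what feature-wise centering and std normalization amount to for "channels_last" data: one mean and one std per channel, computed over the whole dataset and then applied to every image. The helper names (fit_featurewise, standardize) and the small epsilon guarding against division by zero are assumptions for illustration, not the library's implementation, and random data stands in for x_train.

```python
import numpy as np

EPS = 1e-6  # assumed small constant to avoid division by zero


def fit_featurewise(x):
    """Per-channel mean and std over a (samples, height, width, channels) array."""
    x = np.asarray(x, dtype="float32")
    mean = x.mean(axis=(0, 1, 2))
    std = x.std(axis=(0, 1, 2))
    return mean, std


def standardize(x, mean, std):
    """Center and scale with dataset-wide statistics (broadcast over channels)."""
    x = np.asarray(x, dtype="float32")
    return (x - mean) / (std + EPS)


# Random stand-in for an image dataset such as x_train.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(64, 8, 8, 3)).astype("uint8")

mean, std = fit_featurewise(images)
normalized = standardize(images, mean, std)

print(mean.shape, std.shape)                      # one value per channel
print(normalized.mean(axis=(0, 1, 2)).round(3))   # close to 0 for each channel
print(normalized.std(axis=(0, 1, 2)).round(3))    # close to 1 for each channel
```

The statistics are computed once from the data passed to the fitting step and reused for every batch, which is why the generator needs to see the dataset before feature-wise options can be applied.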
    

Example of using .flow_from_directory(directory):