nsl.keras.adversarial_loss

Computes the adversarial loss for a model, given features and labels.

This utility function adds adversarial perturbations to the input features, runs the model on the perturbed features, and returns the corresponding loss, i.e. loss_fn(labels, model(perturbed_features)). This function can be used in a Keras subclassed model and in a custom training loop. It can also be used freely as a helper function in eager execution mode.
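For instance, assuming a batch (x, y), a model, and a loss_fn like those in the full example below, the simplest eager-mode call lets the function compute everything itself:

# Minimal sketch: predictions and labeled loss are computed internally.
adv_loss = nsl.keras.adversarial_loss(x, y, model, loss_fn)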

The adversarial perturbation is based on the gradient of the labeled loss with respect to the original input features, i.e. the gradient of loss_fn(labels, model(features)). Therefore, this function needs to compute the model's predictions on the input features as model(features), and the labeled loss as loss_fn(labels, predictions). If the predictions or the labeled loss have already been computed, they can be passed in via the predictions and labeled_loss arguments to save computational resources. Note that in eager execution mode, gradient_tape has to be set accordingly when passing in predictions or labeled_loss, so that the gradient can be computed correctly.
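For instance, here is a minimal sketch of reusing precomputed predictions (x, y, model, and loss_fn are assumed to be defined as in the full example below):

# The predictions must be computed under the tape that watches the input,
# so that the function can differentiate the labeled loss w.r.t. x.
with tf.GradientTape() as tape_x:
  tape_x.watch(x)
  predictions = model(x)

adv_loss = nsl.keras.adversarial_loss(
    x, y, model, loss_fn, predictions=predictions, gradient_tape=tape_x)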

Example:

import neural_structured_learning as nsl
import tensorflow as tf

# A linear regression model (for demonstrating the usage only).
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD()

# Custom training loop. (The actual training data is omitted for clarity.)
for x, y in train_dataset:
  with tf.GradientTape() as tape_w:

    # A separate GradientTape is needed for watching the input.
    with tf.GradientTape() as tape_x:
      tape_x.watch(x)

      # Regular forward pass.
      labeled_loss = loss_fn(y, model(x))

    # Calculates the adversarial loss. This will reuse labeled_loss and will
    # consume tape_x.
    adv_loss = nsl.keras.adversarial_loss(
        x, y, model, loss_fn, labeled_loss=labeled_loss, gradient_tape=tape_x)

    # Combines both losses. This could also be a weighted combination.
    total_loss = labeled_loss + adv_loss

  # Regular backward pass.
  gradients = tape_w.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
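
The same pattern fits inside a Keras subclassed model's train_step. The sketch below is illustrative (the class name and its wiring are assumptions, not part of the library); it relies on self.compiled_loss, which is available on compiled tf.keras.Model objects in TF 2.x:

class AdversarialRegularizedModel(tf.keras.Model):
  # Hypothetical subclassed model adding the adversarial loss in train_step.

  def __init__(self):
    super().__init__()
    self.dense = tf.keras.layers.Dense(1)

  def call(self, inputs):
    return self.dense(inputs)

  def train_step(self, data):
    x, y = data
    with tf.GradientTape() as tape_w:
      with tf.GradientTape() as tape_x:
        tape_x.watch(x)
        labeled_loss = self.compiled_loss(y, self(x))
      # Reuses labeled_loss and consumes tape_x, as in the loop above.
      adv_loss = nsl.keras.adversarial_loss(
          x, y, self, self.compiled_loss,
          labeled_loss=labeled_loss, gradient_tape=tape_x)
      total_loss = labeled_loss + adv_loss
    gradients = tape_w.gradient(total_loss, self.trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
    return {'loss': total_loss}

# Usage: model = AdversarialRegularizedModel()
#        model.compile(optimizer='sgd', loss='mse')
#        model.fit(train_dataset)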

Args:
  features: Input features; should be a Tensor or a collection of Tensor objects. If it is a collection, the first dimension of all the tensors should be the same (i.e. the batch size).
  labels: Target labels.
  model: A callable that takes features as inputs and computes predictions as outputs, e.g. a tf.keras.Model object.
  loss_fn: A callable that calculates the labeled loss from labels, predictions, and sample_weight, e.g. a tf.keras.losses.Loss object.
  sample_weights: (optional) A 1-D Tensor of weights for the examples, with the same length as the first dimension of features.
  adv_config: (optional) An nsl.configs.AdvRegConfig object for adversarial regularization hyperparameters. Use nsl.configs.make_adv_reg_config to construct one (see the sketch below).
  predictions: (optional) Precomputed value of model(features). If set, the value will be reused when calculating adversarial regularization. In eager mode, gradient_tape has to be set as well.
  labeled_loss: (optional) Precomputed value of loss_fn(labels, model(features)). If set, the value will be reused when calculating adversarial regularization. In eager mode, gradient_tape has to be set as well.
  gradient_tape: (optional) A tf.GradientTape object watching the input features.
  model_kwargs: (optional) A dictionary of additional keyword arguments to be passed to the model.

Returns:
  A Tensor of the adversarial regularization loss, i.e. the labeled loss on the adversarially perturbed features.
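For example, a non-default perturbation configuration can be constructed with nsl.configs.make_adv_reg_config and passed via adv_config. The hyperparameter values below are illustrative, not recommendations:

# Larger perturbation step, constrained by the infinity norm.
adv_config = nsl.configs.make_adv_reg_config(
    adv_step_size=0.05, adv_grad_norm='infinity')

adv_loss = nsl.keras.adversarial_loss(
    x, y, model, loss_fn, adv_config=adv_config)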