tf_agents.bandits.policies.constraints.AbsoluteConstraint

Class for representing a trainable absolute value constraint.

Inherits From: NeuralConstraint, BaseConstraint

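This constraint class implements an absolute value constraint: the constraint network's estimate of an action's expected value is compared against a fixed threshold, for example

expected_value(action) >= absolute_value

or

expected_value(action) <= absolute_value

The direction of the comparison is set by comparator_fn, and the threshold by absolute_value.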
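The rule above can be sketched in pure Python, assuming the per-action value estimates are already available as plain floats and that the constraint is evaluated as comparator_fn(estimate, absolute_value). Here `operator.gt` and `operator.lt` stand in for `tf.greater` and `tf.less`. Both are strict comparisons, so an estimate exactly equal to the threshold does not pass; `operator.ge` or `operator.le` would give the non-strict forms written above. The helper `is_feasible` is hypothetical and not part of TF-Agents.

```python
import operator
from typing import Callable, List

# Hypothetical helper mirroring comparator_fn(expected_value(action), absolute_value).
# operator.gt / operator.lt stand in for tf.greater / tf.less (both strict).
def is_feasible(
    expected_values: List[float],
    absolute_value: float,
    comparator_fn: Callable[[float, float], bool] = operator.gt,
) -> List[bool]:
    """Return one feasibility flag per action."""
    return [comparator_fn(v, absolute_value) for v in expected_values]


# Made-up value estimates for three actions.
estimates = [0.2, 0.5, 0.9]

# "expected_value(action) > 0.5": only the last action passes.
print(is_feasible(estimates, 0.5, operator.gt))  # [False, False, True]
# "expected_value(action) < 0.5": only the first action passes.
print(is_feasible(estimates, 0.5, operator.lt))  # [True, False, False]
```

Note that the action whose estimate equals the threshold (0.5) fails both strict comparisons.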

Args
time_step_spec A TimeStep spec of the expected time_steps.
action_spec A nest of BoundedTensorSpec representing the actions.
constraint_network An instance of tf_agents.networks.Network used to provide estimates of action feasibility. The input structure should be consistent with the observation_spec.
error_loss_fn A function for computing the loss used to train the constraint network. The default is tf.losses.mean_squared_error.
comparator_fn A comparator function, such as tf.greater or tf.less.
absolute_value The threshold value used in the constraint.
name Python str name of this constraint. All variables in this module will fall under that name. Defaults to the class name.

Attributes
constraint_network
observation_spec

Methods

compute_loss

Computes loss for training the constraint network.

Args
observations A batch of observations.
actions A batch of actions.
rewards A batch of rewards.
weights Optional scalar or elementwise (per-batch-entry) importance weights. The output batch loss will be scaled by these weights, and the final scalar loss is the mean of these values.
training Whether the loss is being used for training.

Returns
loss A Tensor containing the loss for the training step.
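A minimal pure-Python sketch of what this loss computes, under stated assumptions: the constraint network outputs one value estimate per action, the estimate for each taken action is compared with the observed reward, and the default mean squared error is used. Weights follow the description above: each squared error is scaled by its weight, then the scaled errors are averaged over the batch. TensorFlow's own `tf.losses.mean_squared_error` reduction may differ in detail (for example, in how zero weights are counted), so this is an illustration rather than a reimplementation. The function `constraint_loss` is hypothetical.

```python
from typing import Optional, Sequence, Union

# Hypothetical sketch: weighted squared error between observed rewards and the
# network's value estimate for the action that was actually taken.
def constraint_loss(
    predicted_values: Sequence[Sequence[float]],  # shape [batch, num_actions]
    actions: Sequence[int],                       # shape [batch]
    rewards: Sequence[float],                     # shape [batch]
    weights: Optional[Union[float, Sequence[float]]] = None,
) -> float:
    batch_size = len(actions)
    if weights is None:
        weights = 1.0
    if isinstance(weights, (int, float)):
        # Broadcast a scalar weight to every batch entry.
        per_example = [float(weights)] * batch_size
    else:
        per_example = list(weights)
    total = 0.0
    for row, action, reward, weight in zip(predicted_values, actions, rewards, per_example):
        error = reward - row[action]  # estimate for the taken action only
        total += weight * error ** 2
    return total / batch_size         # mean of the weighted squared errors


predictions = [[0.1, 0.8], [0.4, 0.3]]
print(constraint_loss(predictions, actions=[1, 0], rewards=[1.0, 0.0]))  # approximately 0.1
```

Only the estimate for the action that was taken contributes to the loss; estimates for the other actions receive no training signal from that example.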

initialize

Returns an op to initialize the constraint.

__call__

Returns the probability of input actions being feasible.
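A rough pure-Python sketch of the output for a batch, assuming the network produces a [batch, num_actions] array of value estimates and that the boolean result of comparator_fn(estimate, absolute_value) is cast to a float. Under that assumption the "probability" is a hard 0.0 or 1.0 per action rather than a graded value. The function `feasibility_probabilities` is hypothetical, and `operator.gt` stands in for `tf.greater`.

```python
import operator
from typing import Callable, List, Sequence

# Hypothetical sketch: map per-action estimates to feasibility "probabilities".
# Because the comparison is hard, each entry is 0.0 (infeasible) or 1.0 (feasible).
def feasibility_probabilities(
    predicted_values: Sequence[Sequence[float]],  # shape [batch, num_actions]
    absolute_value: float,
    comparator_fn: Callable[[float, float], bool] = operator.gt,
) -> List[List[float]]:
    return [
        [float(comparator_fn(v, absolute_value)) for v in row]
        for row in predicted_values
    ]


batch = [[0.2, 0.7, 0.9], [0.6, 0.1, 0.4]]
print(feasibility_probabilities(batch, absolute_value=0.5))
# [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
```

The output keeps the [batch, num_actions] shape of the estimates, so it can be applied element-wise to other per-action quantities.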