tf_agents.bandits.policies.constraints.AbsoluteConstraint

Class for representing a trainable absolute value constraint.

Inherits From: NeuralConstraint, BaseConstraint

tf_agents.bandits.policies.constraints.AbsoluteConstraint(
    time_step_spec: tf_agents.typing.types.TimeStep,
    action_spec: tf_agents.typing.types.BoundedTensorSpec,
    constraint_network: tf_agents.typing.types.Network,
    error_loss_fn: tf_agents.typing.types.LossFn = tf.compat.v1.losses.mean_squared_error,
    comparator_fn: tf_agents.typing.types.ComparatorFn = tf.greater,
    absolute_value: float = 0.0,
    name: Text = 'AbsoluteConstraint'
)

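This class implements an absolute value constraint of the form

expected_value(action) >= absolute_value

or

expected_value(action) <= absolute_value

The direction of the inequality is chosen through `comparator_fn`, and the threshold through `absolute_value`.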

Args
`time_step_spec`	A `TimeStep` spec of the expected time_steps.
`action_spec`	A nest of `BoundedTensorSpec` representing the actions.
`constraint_network`	An instance of `tf_agents.networks.Network` used to provide estimates of action feasibility. The input structure should be consistent with the `observation_spec`.
`error_loss_fn`	A function for computing the loss used to train the constraint network. The default is `tf.compat.v1.losses.mean_squared_error`.
`comparator_fn`	A comparator function, such as `tf.greater` or `tf.less`.
`absolute_value`	The threshold value used in the constraint.
`name`	Python str name of this constraint. All variables in this module will fall under that name. Defaults to the class name.

Attributes
`constraint_network`
`observation_spec`

Methods

`compute_loss`

compute_loss(
    observations: tf_agents.typing.types.NestedTensor,
    actions: tf_agents.typing.types.NestedTensor,
    rewards: tf_agents.typing.types.Tensor,
    weights: Optional[tf_agents.typing.types.Float] = None,
    training: bool = False
) -> tf_agents.typing.types.Tensor

Computes loss for training the constraint network.

Args
`observations`	A batch of observations.
`actions`	A batch of actions.
`rewards`	A batch of rewards.
`weights`	Optional scalar or elementwise (per-batch-entry) importance weights. The output batch loss will be scaled by these weights, and the final scalar loss is the mean of these values.
`training`	Whether the loss is being used for training.

Returns
`loss`	A `Tensor` containing the loss for the training step.
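
The weighting described for `weights` can be illustrated with a minimal sketch in plain Python. It assumes a squared-error loss (matching the default `error_loss_fn`) and uses lists in place of tensors. The helper name `weighted_mean_loss` is hypothetical and is not part of the library; this is an illustration of the arithmetic, not the library's implementation.

```python
def weighted_mean_loss(rewards, predictions, weights=None):
    """Mean over the batch of per-entry squared errors scaled by weights.

    Hypothetical helper: plain lists stand in for tensors, and squared
    error is assumed as the per-entry loss.
    """
    n = len(rewards)
    if weights is None:
        weights = 1.0
    # A scalar weight applies uniformly to every batch entry.
    if isinstance(weights, (int, float)):
        weights = [float(weights)] * n
    per_entry = [
        w * (r - p) ** 2 for r, p, w in zip(rewards, predictions, weights)
    ]
    # The final scalar loss is the mean of the weighted per-entry losses.
    return sum(per_entry) / n


rewards = [1.0, 0.0, 2.0]
predictions = [0.5, 0.0, 1.0]

unweighted = weighted_mean_loss(rewards, predictions)
weighted = weighted_mean_loss(rewards, predictions, weights=[2.0, 1.0, 0.0])
```

In this sketch a weight of zero removes an entry's contribution to the numerator, but the mean is still taken over the full batch size.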

`initialize`

initialize()

Returns an op to initialize the constraint.

`__call__`

__call__(
    observation, actions=None
)

Returns the probability of input actions being feasible.
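
As a rough sketch of how a comparator and a threshold could yield a feasibility value per action, the plain-Python example below applies the comparator to each predicted value and converts the result to a float. That conversion is an assumption about the behaviour, not something this page states. `operator.gt` and `operator.lt` stand in for `tf.greater` and `tf.less`, the list of predicted values stands in for the output of `constraint_network`, and the helper name `feasibility` is hypothetical.

```python
import operator


def feasibility(predicted_values, absolute_value=0.0, comparator_fn=operator.gt):
    """Returns 1.0 where the comparator holds against the threshold, else 0.0.

    Hypothetical helper: predicted_values stands in for the per-action
    output of the constraint network.
    """
    return [
        1.0 if comparator_fn(value, absolute_value) else 0.0
        for value in predicted_values
    ]


predicted = [0.2, -0.1, 0.0]

# "Greater than" constraint with the default threshold of 0.0.
feasible_above = feasibility(predicted)

# "Less than" constraint with the same threshold.
feasible_below = feasibility(predicted, comparator_fn=operator.lt)
```

Both comparators in this sketch are strict, so a predicted value exactly equal to the threshold is marked infeasible in either direction.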
