tf_agents.bandits.policies.constraints.NeuralConstraint

Class for representing a trainable constraint using a neural network.

Inherits From: BaseConstraint

tf_agents.bandits.policies.constraints.NeuralConstraint(
    time_step_spec: tf_agents.typing.types.TimeStep,
    action_spec: tf_agents.typing.types.BoundedTensorSpec,
    constraint_network: Optional[tf_agents.typing.types.Network],
    error_loss_fn: tf_agents.typing.types.LossFn = tf.compat.v1.losses.mean_squared_error,
    name: Optional[Text] = 'NeuralConstraint'
)

This constraint uses a neural network to estimate action feasibility. Because the network is trainable, the class exposes its loss through `compute_loss`; the agent that uses the constraint typically calls it to train the network weights.

Args
`time_step_spec`	A `TimeStep` spec of the expected time_steps.
`action_spec`	A nest of `BoundedTensorSpec` representing the actions.
`constraint_network`	An instance of `tf_agents.networks.Network` used to provide estimates of action feasibility. The input structure should be consistent with the `observation_spec`. If the constraint network is not available at construction time, it can be set later using the `constraint_network` setter.
`error_loss_fn`	A function for computing the loss used to train the constraint network. The default is `tf.compat.v1.losses.mean_squared_error`.
`name`	Python str name of this constraint. All variables in this module will fall under that name. Defaults to the class name.

Attributes
`constraint_network`
`observation_spec`

Methods

`compute_loss`

compute_loss(
    observations: tf_agents.typing.types.NestedTensor,
    actions: tf_agents.typing.types.NestedTensor,
    rewards: tf_agents.typing.types.Tensor,
    weights: Optional[tf_agents.typing.types.Float] = None,
    training: bool = False
) -> tf_agents.typing.types.Tensor

Computes loss for training the constraint network.

Args
`observations`	A batch of observations.
`actions`	A batch of actions.
`rewards`	A batch of rewards.
`weights`	Optional scalar or elementwise (per-batch-entry) importance weights. The output batch loss will be scaled by these weights, and the final scalar loss is the mean of these values.
`training`	Whether the loss is being used for training.

Returns
`loss`	A `Tensor` containing the loss for the training step.
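
The `weights` semantics described above can be illustrated with a small plain-Python sketch. It follows the wording of the argument description literally (scale each per-entry loss, then take the mean) and assumes a squared-error loss. The function name `weighted_mse_sketch` and its parameters are hypothetical and are not part of TF-Agents; the library's actual reduction is performed by `error_loss_fn` and may differ in detail.

```python
from typing import Optional, Sequence, Union


def weighted_mse_sketch(
    predictions: Sequence[float],
    targets: Sequence[float],
    weights: Optional[Union[float, Sequence[float]]] = None,
) -> float:
    """Hypothetical illustration only; not the TF-Agents implementation."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("batch must not be empty")

    # Per-batch-entry squared error.
    per_entry = [(p - t) ** 2 for p, t in zip(predictions, targets)]

    if weights is None:
        # No weights: every entry counts equally.
        scaled = per_entry
    elif isinstance(weights, (int, float)):
        # Scalar weight: the same factor is applied to every entry.
        scaled = [weights * e for e in per_entry]
    else:
        # Elementwise weights: one factor per batch entry.
        if len(weights) != len(per_entry):
            raise ValueError("weights must match the batch size")
        scaled = [w * e for w, e in zip(weights, per_entry)]

    # The final scalar loss is the mean of the scaled per-entry losses.
    return sum(scaled) / len(scaled)


predictions = [1.0, 2.0]
targets = [0.5, 3.0]

unweighted = weighted_mse_sketch(predictions, targets)
scalar_weighted = weighted_mse_sketch(predictions, targets, weights=2.0)
masked = weighted_mse_sketch(predictions, targets, weights=[1.0, 0.0])
```

Here the per-entry errors are 0.25 and 1.0, so `unweighted` is 0.625, `scalar_weighted` is 1.25, and `masked` is 0.125. Under this reading, a zero weight removes an entry's contribution to the loss but the entry still counts toward the batch size used for the mean.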

`initialize`

initialize()

Returns an op to initialize the constraint.

`__call__`

__call__(
    observation, actions=None
)

Returns the probability of input actions being feasible.
