tf_agents.bandits.metrics.tf_metrics.DistanceFromGreedyMetric

Difference between the estimated reward of the best-looking action and that of the chosen action.

Inherits From: TFStepMetric

tf_agents.bandits.metrics.tf_metrics.DistanceFromGreedyMetric(
    estimated_reward_fn: Callable[[tf_agents.typing.types.Tensor], tf_agents.typing.types.Tensor],
    name: Optional[Text] = 'DistanceFromGreedyMetric',
    dtype: float = tf.float32
)

This metric measures how 'safely' the agent explores: it computes the difference between the reward the agent estimates it would have received from the best-looking (greedy) action and the reward it estimates for the action it actually took. It is not equivalent to regret: regret measures the distance from the true optimum, whereas everything here is computed from the policy's own 'belief'.
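The computation described above can be sketched without TensorFlow. The following is a minimal, dependency-free illustration, not the library's implementation. It assumes the estimated rewards form one row per batch entry with one value per action, and that the per-step gaps are averaged over the batch. The helper name `distance_from_greedy` and the sample numbers are hypothetical and not part of TF-Agents.

```python
from typing import Sequence


def distance_from_greedy(
    estimated_rewards: Sequence[Sequence[float]],
    chosen_actions: Sequence[int],
) -> float:
    """Mean gap between the best estimated reward and the chosen action's estimate.

    Hypothetical helper for illustration only; not part of TF-Agents.
    """
    gaps = []
    for rewards, action in zip(estimated_rewards, chosen_actions):
        # Greedy value: the largest estimated reward for this observation.
        greedy_value = max(rewards)
        # Value the policy's own model assigns to the action actually taken.
        chosen_value = rewards[action]
        gaps.append(greedy_value - chosen_value)
    # Assumption: aggregate over the batch with a plain mean.
    return sum(gaps) / len(gaps)


# Two observations with three actions each (made-up estimates).
estimates = [[0.5, 1.0, 0.25], [2.0, 1.5, 0.0]]
actions = [1, 2]  # the first step is greedy, the second step explores
distance = distance_from_greedy(estimates, actions)
print(distance)
```

A value of 0 means every action in the batch was greedy with respect to the policy's own estimates; larger values indicate exploration into actions the policy itself believes are worse. True rewards never enter the calculation, which is why this quantity differs from regret.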

Args
`estimated_reward_fn`	A function that takes the observation as input and computes the estimated rewards that the greedy policy uses.
`name`	(str) Name of the metric.
`dtype`	Data type of the metric value.

Methods

`call`

call(
    trajectory
)

Update the metric value.

Args
`trajectory`	A `tf_agents.trajectories.Trajectory`.

Returns
The arguments, for easy chaining.

`init_variables`

init_variables()

Initializes this Metric's variables.

Should be called after variables are created in the first execution of `__call__()`. If using graph execution, the return value should be `run()` in a session before running the op returned by `__call__()`.

Returns
If using graph execution, this returns an op to perform the initialization. Under eager execution, the variables are reset to their initial values as a side effect and this function returns None.

`reset`

reset()

Resets the values being tracked by the metric.

`result`

result()

Computes and returns a final value for the metric.

`tf_summaries`

tf_summaries(
    train_step=None, step_metrics=()
)

Generates summaries against train_step and all step_metrics.

Args
`train_step`	(Optional) Step counter for training iterations. If None, no metric is generated against the global step.
`step_metrics`	(Optional) Iterable of step metrics to generate summaries against.

Returns
A list of summaries.

`__call__`

__call__(
    *args, **kwargs
)

Returns op to execute to update this metric for these inputs.

Returns None if eager execution is enabled. Returns a graph-mode function if graph execution is enabled.

Args
`*args`
`**kwargs`	A mini-batch of inputs to the Metric, passed on to `call()`.
