tf_agents.policies.utils.populate_policy_info

Populates policy info given all needed input.

tf_agents.policies.utils.populate_policy_info(
    arm_observations: tf_agents.typing.types.Tensor,
    chosen_actions: tf_agents.typing.types.Tensor,
    rewards_for_argmax: tf_agents.typing.types.Tensor,
    est_rewards: tf_agents.typing.types.Tensor,
    emit_policy_info: Sequence[Text],
    accepts_per_arm_features: bool
) -> tf_agents.policies.utils.PolicyInfo

Args
`arm_observations`	If the policy accepts per-arm features, a Tensor with the per-arm features. Otherwise its value is unused.
`chosen_actions`	A Tensor with the indices of the chosen actions.
`rewards_for_argmax`	A Tensor with the sampled or optimistically boosted reward estimates from which the policy greedily chooses the action.
`est_rewards`	A Tensor with the rewards estimated by the model.
`emit_policy_info`	A set of policy info keys specifying which info fields to populate.
`accepts_per_arm_features`	(bool) Whether the policy accepts per-arm features.

Returns
A `PolicyInfo` with the fields named in `emit_policy_info` populated.
