tf_agents.bandits.policies.policy_utilities.populate_policy_info

Populates policy info given all needed input.

arm_observations If the policy accepts per-arm features, this is a Tensor with the per-arm features. Otherwise its value is unused.
chosen_actions A Tensor with the indices of the chosen actions.
rewards_for_argmax The sampled or optimistically boosted reward estimates from which the policy greedily chooses the action.
est_rewards A Tensor with the rewards estimated by the model.
emit_policy_info A set of policy info keys, specifying which info fields to populate.
accepts_per_arm_features (bool) Whether the policy accepts per-arm features.

A policy info.
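
The following is a minimal pure-Python sketch of how the arguments above might interact. It is not the library's implementation. The names SketchPolicyInfo and populate_policy_info_sketch, the string keys, and the field names are hypothetical stand-ins. Two behaviours are assumptions for illustration only: that unrequested fields are left as empty tuples, and that the chosen arm's features are gathered per batch row. Plain lists stand in for Tensors.

```python
from collections import namedtuple

# Hypothetical stand-in for a policy info structure (not the TF-Agents type).
SketchPolicyInfo = namedtuple(
    "SketchPolicyInfo",
    ["predicted_rewards_mean", "predicted_rewards_optimistic", "chosen_arm_features"],
)


def populate_policy_info_sketch(arm_observations, chosen_actions, rewards_for_argmax,
                                est_rewards, emit_policy_info, accepts_per_arm_features):
    """Illustrative model: fill only the fields named in emit_policy_info."""
    # Assumption: a field that was not requested is left as an empty tuple.
    mean = est_rewards if "predicted_rewards_mean" in emit_policy_info else ()
    optimistic = (rewards_for_argmax
                  if "predicted_rewards_optimistic" in emit_policy_info else ())
    if accepts_per_arm_features:
        # Assumption: keep the features of the arm chosen in each batch row.
        chosen = [arms[a] for arms, a in zip(arm_observations, chosen_actions)]
    else:
        # arm_observations is ignored when per-arm features are not used.
        chosen = ()
    return SketchPolicyInfo(mean, optimistic, chosen)


# Batch of 2 observations, 3 arms each, 2 features per arm.
arm_observations = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
]
chosen_actions = [2, 0]
est_rewards = [[0.1, 0.2, 0.9], [0.8, 0.3, 0.1]]
rewards_for_argmax = [[0.2, 0.3, 1.0], [0.9, 0.4, 0.2]]

info = populate_policy_info_sketch(
    arm_observations, chosen_actions, rewards_for_argmax, est_rewards,
    emit_policy_info={"predicted_rewards_mean"},
    accepts_per_arm_features=True,
)
print(info)
```

In this sketch only the mean reward estimates are requested, so the optimistic field stays empty, while the chosen arm features are taken from arm index 2 in the first row and arm index 0 in the second.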