Populates policy info given all needed input.
tf_agents.policies.utils.populate_policy_info(
    arm_observations: tf_agents.typing.types.Tensor,
    chosen_actions: tf_agents.typing.types.Tensor,
    rewards_for_argmax: tf_agents.typing.types.Tensor,
    est_rewards: tf_agents.typing.types.Tensor,
    emit_policy_info: Sequence[Text],
    accepts_per_arm_features: bool
) -> tf_agents.policies.utils.PolicyInfo
Args
arm_observations | If the policy accepts per-arm features, a Tensor with the per-arm features. Otherwise its value is unused.
chosen_actions | A Tensor with the indices of the chosen actions.
rewards_for_argmax | The sampled or optimistically boosted reward estimates from which the policy greedily chooses the action.
est_rewards | A Tensor with the rewards estimated by the model.
emit_policy_info | A set of policy info keys specifying which info fields to populate.
accepts_per_arm_features | (bool) Whether the policy accepts per-arm features.
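To make the roles of the arguments concrete, here is a minimal pure-Python sketch of the kind of logic the description suggests. It is not the TF-Agents implementation: the function name `sketch_populate_info`, the dictionary return type, and the key strings are assumptions chosen for illustration, and plain nested lists stand in for Tensors. It shows `emit_policy_info` acting as a filter on which reward fields are filled in, and `chosen_actions` being used to pick one arm's features per batch element when per-arm features are accepted.

```python
from typing import Any, Dict, List, Sequence


def sketch_populate_info(
    arm_observations: List[List[List[float]]],
    chosen_actions: List[int],
    rewards_for_argmax: List[List[float]],
    est_rewards: List[List[float]],
    emit_policy_info: Sequence[str],
    accepts_per_arm_features: bool,
) -> Dict[str, Any]:
    """Hypothetical stand-in that mirrors the documented arguments.

    The key names below are illustrative assumptions, not library constants.
    """
    info: Dict[str, Any] = {
        # A field is populated only if its key was requested; otherwise it
        # is left as an empty tuple.
        "predicted_rewards_mean": (
            est_rewards if "predicted_rewards_mean" in emit_policy_info else ()
        ),
        "predicted_rewards_optimistic": (
            rewards_for_argmax
            if "predicted_rewards_optimistic" in emit_policy_info
            else ()
        ),
    }
    if accepts_per_arm_features:
        # For each batch element, keep the feature vector of the chosen arm.
        info["chosen_arm_features"] = [
            arms[action]
            for arms, action in zip(arm_observations, chosen_actions)
        ]
    # When per-arm features are not accepted, arm_observations is unused.
    return info


# Batch of 2 observations, 3 arms each, 2 features per arm.
arm_observations = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
]
chosen_actions = [2, 0]
est_rewards = [[0.1, 0.2, 0.3], [0.9, 0.1, 0.0]]
rewards_for_argmax = [[0.2, 0.3, 0.6], [1.1, 0.2, 0.1]]

info = sketch_populate_info(
    arm_observations,
    chosen_actions,
    rewards_for_argmax,
    est_rewards,
    emit_policy_info=["predicted_rewards_mean"],
    accepts_per_arm_features=True,
)
print(info["chosen_arm_features"])
```

In this sketch only the mean reward estimates are requested, so the optimistic field stays empty, while the chosen-arm features are gathered because `accepts_per_arm_features` is `True`.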