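PerArmPolicyInfo is a named tuple with the fields log_probability, predicted_rewards_mean, multiobjective_scalarized_predicted_rewards_mean, predicted_rewards_optimistic, predicted_rewards_sampled, bandit_policy_type, and chosen_arm_features. As the signature below shows, every field defaults to an empty tuple, ().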
tf_agents.policies.utils.PerArmPolicyInfo(
    log_probability=(),
    predicted_rewards_mean=(),
    multiobjective_scalarized_predicted_rewards_mean=(),
    predicted_rewards_optimistic=(),
    predicted_rewards_sampled=(),
    bandit_policy_type=(),
    chosen_arm_features=()
)
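Because every field defaults to an empty tuple, a caller can fill in only the entries it needs and leave the rest as (). The sketch below illustrates this with a plain `collections.namedtuple` stand-in that copies the field names and defaults from the signature above, so it runs without TF-Agents installed. It is a mirror of the container's shape under that assumption, not an import of the library class, and the reward and feature values are hypothetical.

```python
import collections

# Stand-in for tf_agents.policies.utils.PerArmPolicyInfo (assumption: same
# field names and an empty-tuple default for each field, as in the signature).
FIELDS = [
    "log_probability",
    "predicted_rewards_mean",
    "multiobjective_scalarized_predicted_rewards_mean",
    "predicted_rewards_optimistic",
    "predicted_rewards_sampled",
    "bandit_policy_type",
    "chosen_arm_features",
]
PerArmPolicyInfo = collections.namedtuple(
    "PerArmPolicyInfo", FIELDS, defaults=((),) * len(FIELDS)
)

# Set only the fields of interest; all other fields keep their () default.
info = PerArmPolicyInfo(
    predicted_rewards_mean=[0.1, 0.7, 0.4],  # hypothetical per-arm reward estimates
    chosen_arm_features=[1.0, 0.0],          # hypothetical features of the chosen arm
)

print(info.predicted_rewards_mean)  # [0.1, 0.7, 0.4]
print(info.log_probability)         # ()

# Named tuples are immutable; _replace returns a modified copy.
updated = info._replace(bandit_policy_type=[2])  # hypothetical policy-type value
print(updated.bandit_policy_type)   # [2]
print(info.bandit_policy_type)      # () (the original is unchanged)
```

Since the real class is also a named tuple, the same keyword construction and `_replace` calls should carry over to `tf_agents.policies.utils.PerArmPolicyInfo` when TF-Agents is available, where the fields would typically hold tensors rather than Python lists.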