PolicyInfo(log_probability, predicted_rewards_mean, multiobjective_scalarized_predicted_rewards_mean, predicted_rewards_optimistic, predicted_rewards_sampled, bandit_policy_type)
tf_agents.policies.utils.PolicyInfo(
    log_probability=(),
    predicted_rewards_mean=(),
    multiobjective_scalarized_predicted_rewards_mean=(),
    predicted_rewards_optimistic=(),
    predicted_rewards_sampled=(),
    bandit_policy_type=()
)
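
The signature above shows six fields that each default to an empty tuple, so a caller can fill in only the fields it needs. The sketch below illustrates that behavior with a standard-library stand-in rather than the TF-Agents class itself: `PolicyInfoSketch` and `_FIELDS` are hypothetical names, the reward values are made up for illustration, and the assumption that the real class behaves like a named tuple with `()` defaults is inferred only from the signature shown here.

```python
import collections

# Field names copied from the PolicyInfo signature above.
_FIELDS = (
    "log_probability",
    "predicted_rewards_mean",
    "multiobjective_scalarized_predicted_rewards_mean",
    "predicted_rewards_optimistic",
    "predicted_rewards_sampled",
    "bandit_policy_type",
)

# Hypothetical stand-in (not the TF-Agents class): a named tuple whose
# fields all default to the empty tuple, mirroring the documented defaults.
PolicyInfoSketch = collections.namedtuple(
    "PolicyInfoSketch", _FIELDS, defaults=((),) * len(_FIELDS)
)

# Populate one field; every other field keeps its () default.
info = PolicyInfoSketch(predicted_rewards_mean=[0.1, 0.7, 0.2])

# Named tuples are immutable, so _replace returns a modified copy.
updated = info._replace(log_probability=-1.2)

# List the fields that were actually filled in, in declaration order.
populated = [name for name in updated._fields if getattr(updated, name) != ()]
print(populated)
```

With the real library, the equivalent call would presumably pass tensors (for example the per-action predicted rewards) instead of Python lists; unset fields would stay as `()`.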