Module: tf_agents.bandits.policies.reward_prediction_base_policy

Base policy that samples actions based on predicted rewards.

Classes

class RewardPredictionBasePolicy: Base class for building policies that select actions based on reward predictions.
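
As a rough illustration of the idea behind this module, the following is a minimal pure-Python sketch, not the TF-Agents implementation. It assumes a reward model that maps an observation to one predicted reward per action, and leaves the rule for turning those predictions into an action to subclasses. All names here (ToyRewardPredictionPolicy, GreedyToyPolicy, EpsilonGreedyToyPolicy, linear_reward_model, ARM_WEIGHTS) are hypothetical and are not part of the library.

```python
import random
from abc import ABC, abstractmethod
from typing import Callable, List, Optional, Sequence

# Hypothetical per-arm weights for a toy linear reward model (3 arms, 2 features).
ARM_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]


def linear_reward_model(observation: Sequence[float]) -> List[float]:
    """Toy reward model: one predicted reward per arm (dot product with arm weights)."""
    return [sum(w * x for w, x in zip(weights, observation)) for weights in ARM_WEIGHTS]


class ToyRewardPredictionPolicy(ABC):
    """Hypothetical stand-in for a reward-prediction base policy (not the TF-Agents class)."""

    def __init__(self, reward_model: Callable[[Sequence[float]], List[float]]):
        self._reward_model = reward_model

    def action(self, observation: Sequence[float]) -> int:
        # Shared step: predict a reward for every action from the observation.
        predicted_rewards = self._reward_model(observation)
        # Subclass-specific step: turn the predictions into a chosen action index.
        return self._choose(predicted_rewards)

    @abstractmethod
    def _choose(self, predicted_rewards: List[float]) -> int:
        """Return the index of the action to take, given the predicted rewards."""


def _argmax(values: List[float]) -> int:
    # Index of the largest value; ties resolve to the lowest index.
    return max(range(len(values)), key=values.__getitem__)


class GreedyToyPolicy(ToyRewardPredictionPolicy):
    """Always picks the action with the highest predicted reward."""

    def _choose(self, predicted_rewards: List[float]) -> int:
        return _argmax(predicted_rewards)


class EpsilonGreedyToyPolicy(ToyRewardPredictionPolicy):
    """Picks a uniformly random action with probability epsilon, otherwise the greedy one."""

    def __init__(self, reward_model, epsilon: float, rng: Optional[random.Random] = None):
        super().__init__(reward_model)
        self._epsilon = epsilon
        self._rng = rng or random.Random()

    def _choose(self, predicted_rewards: List[float]) -> int:
        if self._rng.random() < self._epsilon:
            return self._rng.randrange(len(predicted_rewards))
        return _argmax(predicted_rewards)


if __name__ == "__main__":
    greedy = GreedyToyPolicy(linear_reward_model)
    print(greedy.action([0.2, 0.9]))  # predicted rewards favor arm 1 here
```

The split between a shared prediction step and a subclass-specific selection step is the design choice this sketch is meant to show. Consult the library's API reference for the actual constructor arguments and abstract methods.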