Module: tf_agents.bandits.policies

Module importing all policies.


bernoulli_thompson_sampling_policy module: Policy for Bernoulli Thompson Sampling.

boltzmann_reward_prediction_policy module: Policy for reward prediction and Boltzmann exploration.

categorical_policy module: Policy that chooses actions based on a categorical distribution.

constraints module: An API for representing constraints.

falcon_reward_prediction_policy module: Policy that samples actions based on the FALCON algorithm.

greedy_multi_objective_neural_policy module: Policy for greedy multi-objective prediction.

greedy_reward_prediction_policy module: Policy for greedy reward prediction.

lin_ucb_policy module: Linear UCB Policy.

linalg module: Utility code for linear algebra functions.

linear_bandit_policy module: Linear Bandit Policy.

linear_thompson_sampling_policy module: Linear Thompson Sampling Policy.

loss_utils module: Loss utility code.

mixture_policy module: A policy class that chooses among a set of policies and takes its actions from the chosen one.

neural_linucb_policy module: Neural + LinUCB Policy.

ranking_policy module: Ranking policy.

reward_prediction_base_policy module: Base policy that samples actions based on predicted rewards.
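
Several of the modules above differ mainly in how they turn reward estimates into an action. The following is a minimal standard-library sketch of three such selection rules (greedy, Boltzmann exploration, and Bernoulli Thompson sampling). It is not the TF-Agents implementation: the function names and signatures are hypothetical and chosen for illustration only, and the Beta(successes + 1, failures + 1) posterior assumes a uniform prior.

```python
import math
import random


def greedy_action(predicted_rewards):
    """Return the index of the largest predicted reward."""
    return max(range(len(predicted_rewards)), key=lambda a: predicted_rewards[a])


def boltzmann_probabilities(predicted_rewards, temperature=1.0):
    """Softmax over predicted rewards; a lower temperature is greedier."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(predicted_rewards)
    weights = [math.exp((r - top) / temperature) for r in predicted_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(predicted_rewards, temperature=1.0, rng=random):
    """Sample an action index from the Boltzmann distribution."""
    probs = boltzmann_probabilities(predicted_rewards, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def bernoulli_thompson_action(successes, failures, rng=random):
    """Sample each arm's success rate from its Beta posterior, pick the best.

    Assumes a uniform Beta(1, 1) prior on every arm (an illustrative choice).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return greedy_action(samples)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs draw the same samples
    rewards = [0.1, 0.5, 0.3]
    print(greedy_action(rewards))  # 1
    print(boltzmann_probabilities(rewards, temperature=0.5))
    print(boltzmann_action(rewards, temperature=0.5, rng=rng))
    print(bernoulli_thompson_action([8, 2, 5], [2, 8, 5], rng=rng))
```

The greedy rule always exploits the current estimate, while the other two explore: Boltzmann exploration by sampling in proportion to exponentiated rewards, Thompson sampling by acting greedily on a random draw from each arm's posterior.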