Policy that samples actions based on the FALCON algorithm.
This policy implements an action sampling distribution based on the following paper: David Simchi-Levi and Yunzong Xu, "Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability", Mathematics of Operations Research, 2021. https://arxiv.org/pdf/2003.12699.pdf
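As a rough illustration of the kind of sampling distribution the paper describes, the sketch below computes inverse-gap-weighted action probabilities from a list of predicted rewards. It is a minimal standalone example, not the policy's actual implementation: the function name `falcon_action_probabilities` and the parameters `predicted_rewards` and `gamma` are hypothetical, and the exploration parameter is passed in as a fixed constant rather than derived from any schedule.

```python
import random


def falcon_action_probabilities(predicted_rewards, gamma):
    """Inverse-gap-weighted probabilities over actions (illustrative sketch).

    predicted_rewards: list of predicted rewards, one per action (hypothetical input).
    gamma: non-negative exploration parameter; larger values favor the greedy action.
    """
    num_actions = len(predicted_rewards)
    # Greedy action: the one with the highest predicted reward (first one on ties).
    greedy = max(range(num_actions), key=lambda a: predicted_rewards[a])
    best_reward = predicted_rewards[greedy]

    probs = [0.0] * num_actions
    for a in range(num_actions):
        if a != greedy:
            # Each non-greedy action is down-weighted by its predicted reward gap.
            gap = best_reward - predicted_rewards[a]
            probs[a] = 1.0 / (num_actions + gamma * gap)
    # The greedy action receives the remaining probability mass.
    probs[greedy] = 1.0 - sum(probs)
    return probs


def sample_action(predicted_rewards, gamma, rng=random):
    """Draws one action index from the distribution above."""
    probs = falcon_action_probabilities(predicted_rewards, gamma)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because every non-greedy action gets at most `1 / num_actions`, the greedy action always keeps at least `1 / num_actions` of the mass. With `gamma = 0` the distribution is uniform, and it concentrates on the greedy action as `gamma` grows.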
Classes
class FalconRewardPredictionPolicy
: Policy that samples actions based on the FALCON algorithm.
Functions
get_number_of_trainable_elements(...)
: Gets the total number of elements in the network's trainable variables.
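The count described above can be pictured as the sum, over all trainable variables, of the product of each variable's shape dimensions. The sketch below shows that arithmetic on plain shape tuples. It is an assumption-laden illustration: the helper `count_elements` is hypothetical and takes shapes directly, whereas the documented function operates on a network.

```python
import math


def count_elements(shapes):
    """Sums the element counts of variables given their shapes.

    shapes: iterable of shape tuples, one per trainable variable (hypothetical input).
    A scalar variable has shape () and counts as one element.
    """
    return sum(math.prod(shape) for shape in shapes)


# Example: a dense layer with a 4x8 kernel and a bias of length 8.
dense_layer_shapes = [(4, 8), (8,)]
total = count_elements(dense_layer_shapes)
```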