Implements the Neural + LinUCB bandit algorithm.
Applies LinUCB on top of an encoding network. Because LinUCB is a linear method, the encoding network captures the non-linear relationship between the context features and the expected rewards. The encoding network may or may not be pre-trained; if it is not, the agent can optionally train it using epsilon-greedy exploration.
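To illustrate the idea, the following is a minimal NumPy sketch of LinUCB running on encoded contexts. It is not the TF-Agents implementation: `encode`, `LinUCBHead`, and the parameter `alpha` are hypothetical names chosen for this example, the encoder is a single fixed `tanh` layer standing in for an encoding network, and the optional epsilon-greedy training phase is omitted.

```python
import numpy as np


def encode(context, weights):
    """Hypothetical stand-in for the encoding network: one tanh layer."""
    return np.tanh(context @ weights)


class LinUCBHead:
    """Per-action LinUCB statistics kept on encoded contexts (illustrative only)."""

    def __init__(self, num_actions, encoding_dim, alpha=1.0):
        # alpha scales the exploration bonus (assumed name, not from the source).
        self.alpha = alpha
        # One matrix A_a per action, initialised to the identity (ridge prior).
        self.cov = np.stack([np.eye(encoding_dim) for _ in range(num_actions)])
        # One vector b_a per action, accumulating reward-weighted encodings.
        self.data_vec = np.zeros((num_actions, encoding_dim))

    def ucb_scores(self, z):
        """Return the estimated reward plus confidence bonus for every action."""
        scores = []
        for a_mat, b_vec in zip(self.cov, self.data_vec):
            theta = np.linalg.solve(a_mat, b_vec)    # ridge-regression weights
            a_inv_z = np.linalg.solve(a_mat, z)      # A_a^{-1} z
            bonus = self.alpha * np.sqrt(z @ a_inv_z)
            scores.append(theta @ z + bonus)
        return np.array(scores)

    def select_action(self, z):
        """Pick the action with the highest upper confidence bound."""
        return int(np.argmax(self.ucb_scores(z)))

    def update(self, action, z, reward):
        """Fold one observed (encoding, reward) pair into the chosen action."""
        self.cov[action] += np.outer(z, z)
        self.data_vec[action] += reward * z
```

In use, each raw context would first pass through `encode` (or a real network), and the resulting vector would be handed to `select_action` and later to `update` together with the observed reward. Keeping the linear statistics separate from the encoder mirrors the split described above: the encoder handles non-linearity, while the linear head handles exploration.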
Reference:
Carlos Riquelme, George Tucker, Jasper Snoek. "Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling." ICLR 2018.
Classes
class NeuralLinUCBAgent: An agent implementing the LinUCB algorithm on top of a neural network.
class NeuralLinUCBVariableCollection: A collection of variables used by NeuralLinUCBAgent.
| Other Members | |
|---|---|
| absolute_import | Instance of `__future__._Feature` |
| division | Instance of `__future__._Feature` |
| print_function | Instance of `__future__._Feature` |