tf_agents.bandits.environments.wheel_py_environment.WheelPyEnvironment

Implements the Wheel Bandit environment.

Inherits From: BanditPyEnvironment, PyEnvironment

tf_agents.bandits.environments.wheel_py_environment.WheelPyEnvironment(
    delta: float,
    mu_base: Sequence[float],
    std_base: Sequence[float],
    mu_high: float,
    std_high: float,
    batch_size: Optional[int] = None,
    name: Optional[Text] = 'wheel'
)

This environment implements the wheel bandit from Section 5.4 of [1] as a subclass of BanditPyEnvironment.

Context features are sampled uniformly at random from the unit circle in R^2. There are 5 possible actions. The exploration parameter `delta`, in (0, 1), determines the difficulty of the problem and the need for exploration.
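As an illustration of the context distribution described above, the following NumPy sketch draws points uniformly from the unit disk and measures how many fall within distance `delta` of the origin. The helper name `sample_unit_disk` and the value of `delta` are hypothetical, and the square-root-of-radius method is one standard way to sample the disk; it is not necessarily how the library does it.

```python
import numpy as np


def sample_unit_disk(batch_size, rng):
    """Draws `batch_size` points uniformly from the unit disk in R^2."""
    # Taking the square root of a uniform variate makes the density uniform
    # over area instead of uniform over radius.
    radius = np.sqrt(rng.uniform(0.0, 1.0, size=batch_size))
    angle = rng.uniform(0.0, 2.0 * np.pi, size=batch_size)
    return np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)


rng = np.random.default_rng(0)
delta = 0.5  # illustrative value, not a library default
contexts = sample_unit_disk(10000, rng)
norms = np.linalg.norm(contexts, axis=1)

# Fraction of contexts whose norm is at most delta.
inner_fraction = np.mean(norms <= delta)
```

Under uniform sampling, the inner region has area fraction `delta**2`, so the remaining `1 - delta**2` of contexts fall in the region where the norm exceeds `delta`.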

References:

[1]. Carlos Riquelme, George Tucker, Jasper Snoek "Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling", International Conference on Learning Representations (ICLR) 2018. https://arxiv.org/abs/1802.09127

Args
`delta`	(float) Exploration parameter, in `(0, 1)`.
`mu_base`	(vector of float) Mean reward for each action if the context norm is below delta. The size of the vector is expected to be 5 (i.e., equal to the number of actions).
`std_base`	(vector of float) Standard deviation of the Gaussian reward for each action if the context norm is below delta. The size of the vector is expected to be 5 (i.e., equal to the number of actions).
`mu_high`	(float) Mean reward for the optimal action if the context norm is above delta.
`std_high`	(float) Standard deviation of the reward for the optimal action if the context norm is above delta.
`batch_size`	(optional) (int) Number of observations generated per call.
`name`	(optional) The name of this environment instance.
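To make the roles of these arguments concrete, here is a minimal NumPy sketch of how rewards could be generated from them. It is not the library's implementation. Two things are assumptions: that non-optimal actions keep their base distribution when the context norm is above delta, and that the quadrant of the context selects which of actions 1 to 4 is optimal there. The function name `sample_rewards` and the numeric values are illustrative only.

```python
import numpy as np


def sample_rewards(contexts, delta, mu_base, std_base, mu_high, std_high, rng):
    """Returns a [batch, 5] matrix of sampled rewards (illustrative sketch)."""
    batch_size = contexts.shape[0]
    # Every action starts from its base Gaussian reward.
    rewards = rng.normal(mu_base, std_base, size=(batch_size, 5))

    # ASSUMPTION: when the norm exceeds delta, the quadrant of the context
    # picks the optimal action among actions 1..4. The library's actual
    # quadrant-to-action mapping may differ.
    is_outer = np.linalg.norm(contexts, axis=1) > delta
    quadrant = 2 * (contexts[:, 0] > 0) + (contexts[:, 1] > 0)  # values 0..3
    optimal = 1 + quadrant

    # Replace the optimal action's reward with a draw from the high-reward
    # Gaussian, only for contexts outside the inner disk.
    rows = np.nonzero(is_outer)[0]
    rewards[rows, optimal[rows]] = rng.normal(mu_high, std_high, size=rows.size)
    return rewards


rng = np.random.default_rng(0)
contexts = np.array([[0.1, 0.1], [0.9, 0.1], [-0.9, -0.1]])
rewards = sample_rewards(
    contexts,
    delta=0.5,
    mu_base=[1.2, 1.0, 1.0, 1.0, 1.0],  # illustrative values
    std_base=[0.01] * 5,
    mu_high=50.0,
    std_high=0.01,
    rng=rng,
)
best_actions = rewards.argmax(axis=1)
```

In this sketch the first context lies inside the inner disk, so action 0 (the largest entry of `mu_base`) is best; the other two contexts lie outside it, so the action selected by their quadrant receives the high reward.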

Attributes
`batch_size`	The batch size of the environment.
`batched`	Whether the environment is batched. An environment that supports batched observations and actions should override this property to return True. A batched environment takes a batched set of actions and returns a batched set of observations, so the first dimension of every numpy array in the input and output nested structures is the batch size. This left-most batch dimension is not part of the action_spec or the observation_spec. When the environment is batched and handle_auto_reset is set, it checks `np.all(steps.is_last())`.
`name`	The name of this environment instance.

Methods

action_spec

close

current_time_step

discount_spec

get_info

get_state

observation_spec

render

reset

reward_spec

seed

set_state

should_reset

step

time_step_spec

__enter__

__exit__
