tf_agents.bandits.environments.bernoulli_py_environment.BernoulliPyEnvironment

Implements finite-armed Bernoulli Bandits.

Inherits From: BanditPyEnvironment, PyEnvironment

tf_agents.bandits.environments.bernoulli_py_environment.BernoulliPyEnvironment(
    means: Sequence[tf_agents.typing.types.Float],
    batch_size: Optional[types.Int] = 1
)

This environment implements a finite-armed non-contextual Bernoulli Bandit environment as a subclass of BanditPyEnvironment. For every arm, the reward distribution is 0/1 (Bernoulli) with parameter p set at the initialization. For a reference, see e.g., Example 1.1 in "A Tutorial on Thompson Sampling" by Russo et al. (https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf)

Args
`means`	vector of floats in [0, 1], the mean rewards for actions. The number of arms is determined by its length.
`batch_size`	(int) The batch size.

Attributes
`batch_size`	The batch size of the environment.
`batched`	Whether the environment is batched or not. If the environment supports batched observations and actions, then overwrite this property to True. A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size. When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension. When batched and handle_auto_reset, it checks `np.all(steps.is_last())`.
`name`

Attributes

batch_size The batch size of the environment.

batched

Whether the environment is batched or not.

If the environment supports batched observations and actions, then overwrite this property to True.

A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

When batched and handle_auto_reset, it checks np.all(steps.is_last()).

name

tf_agents.bandits.environments.bernoulli_py_environment.BernoulliPyEnvironment Stay organized with collections Save and categorize content based on your preferences.

Args

Attributes

Methods

action_spec

close

current_time_step

discount_spec

get_info

get_state

observation_spec

render

reset

reward_spec

seed

set_state

should_reset

step

time_step_spec

__enter__

__exit__

tf_agents.bandits.environments.bernoulli_py_environment.BernoulliPyEnvironment

`action_spec`

`close`

`current_time_step`

`discount_spec`

`get_info`

`get_state`

`observation_spec`

`render`

`reset`

`reward_spec`

`seed`

`set_state`

`should_reset`

`step`

`time_step_spec`

`enter`

`exit`