tf_agents.bandits.environments.piecewise_stochastic_environment.PiecewiseStationaryDynamics

Piecewise stationary environment dynamics.

Inherits From: EnvironmentDynamics

tf_agents.bandits.environments.piecewise_stochastic_environment.PiecewiseStationaryDynamics(
    observation_distribution: types.Distribution,
    interval_distribution: types.Distribution,
    observation_to_reward_distribution: types.Distribution,
    additive_reward_distribution: types.Distribution
)

This is a piecewise stationary environment which computes rewards as:

rewards(t) = observation(t) * observation_to_reward(i) + additive_reward(i)

where `t` is the environment time (`env_time`) and `i` is the index of the current piece. The environment time is incremented after the reward is computed, while the piece index is incremented at the end of each time interval. The parameters `observation_to_reward(i)` and `additive_reward(i)`, together with the length of the interval, are drawn from the given distributions at the beginning of each interval.
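The reward rule above can be sketched in plain NumPy. This is only an illustration of the described behavior, not the TF-Agents implementation: the class `PiecewiseSketch`, its `max_interval` parameter, and the choice of normal and uniform-integer sampling are all assumptions made for the example.

```python
import numpy as np


class PiecewiseSketch:
    """NumPy illustration of the reward rule; not the TF-Agents implementation."""

    def __init__(self, rng, observation_dim, num_actions, max_interval=4):
        self._rng = rng
        self._observation_dim = observation_dim
        self._num_actions = num_actions
        self._max_interval = max_interval  # hypothetical bound on interval length
        self.piece_index = -1              # i in the formula; becomes 0 below
        self._interval_end = 0             # first env_time of the next piece
        self._start_new_piece()

    def _start_new_piece(self):
        # At the beginning of each interval, draw the linear map, the additive
        # term, and the interval length (stand-ins for the three distributions).
        self.observation_to_reward = self._rng.normal(
            size=(self._observation_dim, self._num_actions))
        self.additive_reward = self._rng.normal(size=(self._num_actions,))
        length = int(self._rng.integers(1, self._max_interval + 1))
        self._interval_end += length
        self.piece_index += 1

    def reward(self, observation, env_time):
        # Once env_time reaches the end of the current interval, move on.
        while env_time >= self._interval_end:
            self._start_new_piece()
        # rewards(t) = observation(t) * observation_to_reward(i) + additive_reward(i)
        return observation @ self.observation_to_reward + self.additive_reward


rng = np.random.default_rng(0)
dynamics = PiecewiseSketch(rng, observation_dim=3, num_actions=4)
for env_time in range(6):
    observation = rng.normal(size=(2, 3))  # [batch_size, observation_dim]
    rewards = dynamics.reward(observation, env_time)
    print(env_time, dynamics.piece_index, rewards.shape)
```

Within one piece the same observation always yields the same rewards; the mapping changes only when `env_time` crosses into the next interval.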

Args
`observation_distribution`	A distribution from `tfp.distributions` with shape `[batch_size, observation_dim]`. Note that the values of `batch_size` and `observation_dim` are deduced from the distribution.
`interval_distribution`	A scalar distribution from `tfp.distributions`. The value is cast to `int64` to update the time range.
`observation_to_reward_distribution`	A distribution from `tfp.distributions` with shape `[observation_dim, num_actions]`. The value `observation_dim` must match the second dimension of `observation_distribution`.
`additive_reward_distribution`	A distribution from `tfp.distributions` with shape `[num_actions]`. This models the non-contextual behavior of the bandit.
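The shape requirements listed above fit together as follows. The helper `check_shapes` is hypothetical and written only for illustration; it is not part of TF-Agents. The requirement that `num_actions` agree between `observation_to_reward_distribution` and `additive_reward_distribution` is taken from the reward formula, where the two terms are added.

```python
def check_shapes(observation_shape, observation_to_reward_shape,
                 additive_reward_shape):
    """Hypothetical helper: validates the documented shape relationships."""
    if len(observation_shape) != 2:
        raise ValueError("observation shape must be [batch_size, observation_dim]")
    batch_size, observation_dim = observation_shape

    if len(observation_to_reward_shape) != 2:
        raise ValueError(
            "observation_to_reward shape must be [observation_dim, num_actions]")
    if observation_to_reward_shape[0] != observation_dim:
        raise ValueError("observation_dim differs between the two distributions")
    num_actions = observation_to_reward_shape[1]

    # The additive term is added to a [batch_size, num_actions] product,
    # so it needs exactly num_actions entries.
    if tuple(additive_reward_shape) != (num_actions,):
        raise ValueError("additive_reward shape must be [num_actions]")
    return batch_size, observation_dim, num_actions


dims = check_shapes((8, 5), (5, 3), (3,))
print(dims)  # (8, 5, 3)
```

A mismatch such as `check_shapes((8, 5), (4, 3), (3,))` raises `ValueError` in this sketch.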

Attributes
`action_spec`	Specification of the actions.
`batch_size`	Returns the batch size used for observations and rewards.
`observation_spec`	Specification of the observations.

Methods

`compute_optimal_action`

compute_optimal_action(
    observation: tf_agents.typing.types.NestedTensor
) -> tf_agents.typing.types.NestedTensor

`compute_optimal_reward`

compute_optimal_reward(
    observation: tf_agents.typing.types.NestedTensor
) -> tf_agents.typing.types.NestedTensor
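The page gives no description for `compute_optimal_action` and `compute_optimal_reward`. Going by their names and by the `[batch_size, num_actions]` reward layout, a reasonable reading is that they return, for each observation in the batch, the arm with the highest expected reward and that reward's value. The NumPy sketch below illustrates that reading with made-up numbers; it is an assumption, not the library's code.

```python
import numpy as np

# Hypothetical expected rewards: batch_size = 2 observations, num_actions = 3 arms.
expected_rewards = np.array([[0.1, 0.7, 0.2],
                             [0.9, 0.3, 0.4]])

# Best arm per observation: argmax over the action axis.
optimal_action = np.argmax(expected_rewards, axis=-1)
# Reward of that best arm: max over the action axis.
optimal_reward = np.max(expected_rewards, axis=-1)

print(optimal_action)  # [1 0]
print(optimal_reward)  # [0.7 0.9]
```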

`observation`

observation(
    unused_t
) -> tf_agents.typing.types.NestedTensor

Returns an observation batch for the given time.

Args
`unused_t`	The scalar int64 tensor of the environment time step (`env_time`). This is incremented by the environment after the reward is computed. The parameter name indicates that this implementation ignores it.

Returns
The observation batch with spec according to `observation_spec`.

`reward`

reward(
    observation: tf_agents.typing.types.NestedTensor,
    t: tf_agents.typing.types.Int
) -> tf_agents.typing.types.NestedTensor

Reward for the given observation and time step.

Args
`observation`	A batch of observations with spec according to `observation_spec`.
`t`	The scalar int64 tensor of the environment time step (`env_time`). This is incremented by the environment after the reward is computed.

Returns
A batch of rewards with shape `[batch_size, num_actions]`, containing the rewards for all arms.
