Environment wrapper that adds action masks to a bandit environment.
This environment wrapper takes a BanditTFEnvironment as input, and generates
a new environment where the observations are joined with boolean action
masks. These masks describe which actions are allowed in a given time step. If a
disallowed action is chosen in a time step, the environment will raise an
error. Each mask entry is drawn independently from a Bernoulli distribution
with parameter action_probability.
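The masking behavior described above can be illustrated with a minimal sketch in plain Python. This is not the TF-Agents implementation: `sample_action_mask` and `check_action` are hypothetical helpers, Python lists stand in for tensors, and the sketch does not address what happens when every action ends up masked out.

```python
import random


def sample_action_mask(num_actions, action_probability, rng):
    """Draw one independent Bernoulli(action_probability) bit per action.

    Hypothetical helper: True means the action is allowed in this time step.
    """
    return [rng.random() < action_probability for _ in range(num_actions)]


def check_action(action, mask):
    """Raise if the chosen action is disallowed by the mask (assumed contract)."""
    if not mask[action]:
        raise ValueError(f"Action {action} is not allowed by mask {mask}")
    return action


rng = random.Random(0)  # seeded so repeated runs draw the same mask
mask = sample_action_mask(num_actions=5, action_probability=0.5, rng=rng)
allowed = [a for a, ok in enumerate(mask) if ok]
print("mask:", mask)
print("allowed actions:", allowed)
```

With action_probability set to 0.5, each action is allowed in roughly half of the time steps, so a policy has to read the mask before choosing.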
The observations from the original environment and the mask are joined by the
given join_fn function, and the result of the join function will be the
observation in the new environment.
Usage:
```python
env = MyFavoriteBanditEnvironment(...)

def join_fn(context, mask):
  return (context, mask)

masked_env = BernoulliActionMaskTFEnvironment(env, join_fn, 0.5)
```
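To show how a consumer might use the joined observation, here is a small sketch in plain Python. It assumes the tuple-returning join_fn from the usage example; `split_fn` and `greedy_allowed_action` are hypothetical helpers introduced only for illustration, and lists stand in for tensors.

```python
def join_fn(context, mask):
    """Pack the original observation and the mask into one observation."""
    return (context, mask)


def split_fn(observation):
    """Inverse of join_fn: recover (context, mask) on the consumer side."""
    context, mask = observation
    return context, mask


def greedy_allowed_action(rewards, mask):
    """Pick the action with the highest reward estimate among allowed ones."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("mask allows no actions")
    return max(allowed, key=lambda a: rewards[a])


observation = join_fn(context=[0.2, -1.0], mask=[True, False, True])
context, mask = split_fn(observation)
# Action 1 has the highest estimate but is masked out, so action 2 is chosen.
action = greedy_allowed_action(rewards=[0.1, 0.9, 0.4], mask=mask)
print(action)
```

Whatever structure join_fn produces, the policy side needs a matching way to take it apart, which is why the two functions are written as a pair here.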
Classes
class BernoulliActionMaskTFEnvironment: An environment wrapper that adds action masks to observations.
| Other Members | |
|---|---|
| absolute_import | Instance of `__future__._Feature` |
| division | Instance of `__future__._Feature` |
| print_function | Instance of `__future__._Feature` |