Base wrapper implementing PyEnvironmentBaseWrapper interface for Gym envs.
Inherits From: PyEnvironment
tf_agents.environments.gym_wrapper.GymWrapper(
    gym_env: gym.Env,
    discount: tf_agents.typing.types.Float = 1.0,
    spec_dtype_map: Optional[Dict[gym.Space, np.dtype]] = None,
    match_obs_space_dtype: bool = True,
    auto_reset: bool = True,
    simplify_box_bounds: bool = True,
    render_kwargs: Optional[Dict[str, Any]] = None
)
Action and observation specs are automatically generated from the Gym environment's action and observation spaces. See the base class, py_environment.PyEnvironment, for details.
| Args | |
|---|---|
| handle_auto_reset | When True, the base class will handle auto_reset of the Environment. |
Methods
action_spec
action_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the actions that should be provided to step().
May use a subclass of ArraySpec that specifies additional properties such
as min and max bounds on the values.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
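To illustrate what min and max bounds on a spec mean in practice, here is a minimal sketch using only the standard library. `BoundedSpec` and its `conforms` method are hypothetical stand-ins invented for this example; they are not the TF-Agents `ArraySpec` API, and the flat-list action format is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BoundedSpec:
    """Hypothetical stand-in for a bounded spec; not a TF-Agents class."""
    size: int        # number of elements in a flat action
    minimum: float   # inclusive lower bound for every element
    maximum: float   # inclusive upper bound for every element

    def conforms(self, action: Sequence[float]) -> bool:
        """Returns True if the action has the right length and is in bounds."""
        if len(action) != self.size:
            return False
        return all(self.minimum <= a <= self.maximum for a in action)


# Assumed example: a 2-element continuous action bounded to [-1, 1].
spec = BoundedSpec(size=2, minimum=-1.0, maximum=1.0)
in_bounds = spec.conforms([0.5, -1.0])      # both elements inside the bounds
out_of_bounds = spec.conforms([0.5, 1.5])   # second element exceeds maximum
wrong_size = spec.conforms([0.0])           # wrong number of elements
```

A check of this kind is what lets a caller validate an action before passing it to step().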
close
close() -> None
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
env = Env(...)
# Use env.
env.close()
or via a context manager:
with Env(...) as env:
  # Use env.
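The two usage patterns can be compared with a small runnable sketch. `ToyEnv` is a hypothetical stand-in written for this example, not a TF-Agents class, and it assumes that `__exit__` simply calls `close()`.

```python
class ToyEnv:
    """Hypothetical stand-in environment; not a TF-Agents class."""

    def __init__(self):
        self.closed = False

    def close(self) -> None:
        # A real environment would free external resources here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release resources; returning None lets exceptions propagate.
        self.close()


# Direct use: the caller is responsible for calling close().
env_a = ToyEnv()
env_a.close()

# Context-manager use: close() runs even if the body raises.
try:
    with ToyEnv() as env_b:
        raise RuntimeError("simulated failure while using the environment")
except RuntimeError:
    pass
```

The context-manager form is usually the safer choice for environments backed by an external process, because cleanup does not depend on the body finishing normally.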
current_time_step
current_time_step() -> tf_agents.trajectories.TimeStep
Returns the current timestep.
discount_spec
discount_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the discounts that are returned by step().
Override this method to define an environment that uses non-standard discount values, for example an environment with array-valued discounts.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
get_info
get_info() -> Any
Returns the gym environment info returned on the last step.
get_state
get_state() -> Any
Returns the state of the environment.
The state contains everything required to restore the environment to the
current configuration. This can contain e.g.
- The current time_step.
- The number of steps taken in the environment (for finite horizon MDPs).
- Hidden state (for POMDPs).
Callers should not assume anything about the contents or format of the
returned state. It should be treated as a token that can be passed back to
set_state() later.
Note that the returned state handle should not be modified by the
environment later on, and ensuring this (e.g. using copy.deepcopy) is the
responsibility of the environment.
| Returns | |
|---|---|
| state | The current state of the environment. |
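The snapshot-and-restore contract described above can be sketched as follows. `CounterEnv` and its `advance` and `steps` helpers are hypothetical and exist only for this example; the sketch assumes the environment's whole state fits in one dictionary and uses copy.deepcopy to keep the returned token from being modified by later steps.

```python
import copy


class CounterEnv:
    """Hypothetical stand-in environment; not a TF-Agents class."""

    def __init__(self):
        self._internal = {"steps": 0, "history": []}

    def advance(self) -> None:
        # Mutates internal state in place, as a real step might.
        self._internal["steps"] += 1
        self._internal["history"].append(self._internal["steps"])

    def get_state(self):
        # Deep-copy so later steps cannot mutate the returned token.
        return copy.deepcopy(self._internal)

    def set_state(self, state) -> None:
        # Copy again so the caller's token stays reusable.
        self._internal = copy.deepcopy(state)

    def steps(self) -> int:
        return self._internal["steps"]


env = CounterEnv()
env.advance()
token = env.get_state()            # opaque snapshot after one step
env.advance()
env.advance()
steps_before_restore = env.steps()
env.set_state(token)               # roll back to the snapshot
steps_after_restore = env.steps()
```

Callers only store the token and pass it back to set_state(); they never inspect it.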
observation_spec
observation_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the observations provided by the environment.
May use a subclass of ArraySpec that specifies additional properties such
as min and max bounds on the values.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
render
render(
mode: Text = 'rgb_array'
) -> Any
Renders the environment.
| Args | |
|---|---|
| mode | One of ['rgb_array', 'human']. Renders to a NumPy array, or brings up a window where the environment can be visualized. |
| Returns | |
|---|---|
| An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window. |
| Raises | |
|---|---|
| NotImplementedError | If the environment does not support rendering. |
reset
reset() -> tf_agents.trajectories.TimeStep
Starts a new sequence and returns the first TimeStep of this sequence.
| Returns | |
|---|---|
| A TimeStep namedtuple containing: | |
| step_type | A StepType of FIRST. |
| reward | 0.0, indicating the reward. |
| discount | 1.0, indicating the discount. |
| observation | A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec(). |
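The shape of the first TimeStep can be made concrete with a standard-library sketch. The `StepType`, `TimeStep`, and `first_time_step` names below are hypothetical stand-ins defined for this example and are not imported from TF-Agents; the integer values of the step types and the list-valued observation are assumptions.

```python
from enum import IntEnum
from typing import Any, NamedTuple


class StepType(IntEnum):
    """Stand-in step types; the integer values are an assumption."""
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    """Hypothetical stand-in tuple; not the TF-Agents TimeStep class."""
    step_type: StepType
    reward: float
    discount: float
    observation: Any


def first_time_step(observation: Any) -> TimeStep:
    # A new sequence starts with FIRST, a reward of 0.0, and a discount of 1.0.
    return TimeStep(StepType.FIRST, 0.0, 1.0, observation)


initial = first_time_step(observation=[0.0, 0.0])
```

Only the observation varies between environments; the other three fields of the first TimeStep are fixed.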
reward_spec
reward_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the rewards that are returned by step().
Override this method to define an environment that uses non-standard reward values, for example an environment with array-valued rewards.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
seed
seed(
seed: tf_agents.typing.types.Seed
) -> tf_agents.typing.types.Seed
Seeds the environment.
| Args | |
|---|---|
| seed | Value to use as seed for the environment. |
set_state
set_state(
state: Any
) -> None
Restores the environment to a given state.
See definition of state in the documentation for get_state().
| Args | |
|---|---|
| state | A state to restore the environment to. |
should_reset
should_reset(
current_time_step: tf_agents.trajectories.TimeStep
) -> bool
Whether the Environment should reset given the current timestep.
By default it only resets when all time_steps are LAST.
| Args | |
|---|---|
| current_time_step | The current TimeStep. |
| Returns | |
|---|---|
| A bool indicating whether the Environment should reset or not. |
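The default rule can be sketched as a plain function. This is an illustrative stand-in, not the library's implementation: it assumes a batched time step is represented as a sequence of step types, and that auto-reset handling is switched by a flag named after the handle_auto_reset argument.

```python
from enum import IntEnum
from typing import Sequence


class StepType(IntEnum):
    """Stand-in step types; the integer values are an assumption."""
    FIRST = 0
    MID = 1
    LAST = 2


def should_reset(step_types: Sequence[StepType],
                 handle_auto_reset: bool = True) -> bool:
    """Sketch of the rule: reset only when every time step is LAST."""
    return handle_auto_reset and all(s == StepType.LAST for s in step_types)


all_last = should_reset([StepType.LAST, StepType.LAST])
some_mid = should_reset([StepType.LAST, StepType.MID])
disabled = should_reset([StepType.LAST], handle_auto_reset=False)
```

Requiring every entry to be LAST means a batch in which one episode is still running is not reset.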
step
step(
action: tf_agents.typing.types.NestedArray
) -> tf_agents.trajectories.TimeStep
Updates the environment according to the action and returns a TimeStep.
If the environment returned a TimeStep with StepType.LAST at the
previous step, the implementation of _step in the environment should call
reset to start a new sequence and ignore action.
This method will start a new sequence if called after the environment
has been constructed and reset has not been called. In this case
action will be ignored.
If should_reset(current_time_step) is True, then this method will reset
by itself. In this case action will be ignored.
| Args | |
|---|---|
| action | A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec(). |
| Returns | |
|---|---|
| A TimeStep namedtuple containing: | |
| step_type | A StepType value. |
| reward | A NumPy array, reward value for this timestep. |
| discount | A NumPy array, discount in the range [0, 1]. |
| observation | A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec(). |
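The auto-reset behavior of step() described above can be traced with a toy environment. `CountdownEnv` is a hypothetical stand-in, not a TF-Agents class: time steps are plain tuples of (step_type, reward, discount, observation), step types are strings, and the discount is fixed at 1.0 for simplicity.

```python
class CountdownEnv:
    """Hypothetical stand-in environment; not a TF-Agents class."""

    def __init__(self, horizon: int = 2):
        self._horizon = horizon
        self._remaining = horizon
        self._current = None  # no time step exists until the first reset/step

    def reset(self):
        self._remaining = self._horizon
        self._current = ("FIRST", 0.0, 1.0, self._remaining)
        return self._current

    def should_reset(self, time_step) -> bool:
        return time_step[0] == "LAST"

    def step(self, action: int):
        # Start a new sequence if reset was never called, or if the previous
        # time step was LAST. In both cases the action is ignored.
        if self._current is None or self.should_reset(self._current):
            return self.reset()
        self._remaining -= action
        step_type = "LAST" if self._remaining <= 0 else "MID"
        self._current = (step_type, 1.0, 1.0, self._remaining)
        return self._current


env = CountdownEnv(horizon=2)
# Five calls to step() with the same action, without ever calling reset().
trace = [env.step(1)[0] for _ in range(5)]
# trace == ["FIRST", "MID", "LAST", "FIRST", "MID"]
```

The first and fourth calls return a FIRST time step and do not apply the action, which is why a training loop that counts actions should check the returned step type.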
time_step_spec
time_step_spec() -> tf_agents.trajectories.TimeStep
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values
for any of the items returned by step(). For example, an environment with
array-valued rewards.
| Returns | |
|---|---|
| A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure. |
__enter__
__enter__()
Allows the environment to be used in a with-statement context.
__exit__
__exit__(
unused_exception_type, unused_exc_value, unused_traceback
)
Allows the environment to be used in a with-statement context.