Abstract base class for Python Policies.
tf_agents.policies.py_policy.PyPolicy(
time_step_spec: tf_agents.trajectories.TimeStep,
action_spec: tf_agents.typing.types.NestedArraySpec,
policy_state_spec: tf_agents.typing.types.NestedArraySpec = (),
info_spec: tf_agents.typing.types.NestedArraySpec = (),
observation_and_action_constraint_splitter: Optional[types.Splitter] = None
)
The action(time_step, policy_state) method returns a PolicyStep named tuple
containing the following nested arrays:
- action: The action to be applied to the environment.
- state: The state of the policy (e.g. RNN state) to be fed into the next call to action.
- info: Optional side information such as action log probabilities.
For stateful policies, e.g. those containing RNNs, an initial policy state can
be obtained through a call to get_initial_state().
Example of simple use in Python:
    py_env = PyEnvironment()
    policy = PyPolicy()

    time_step = py_env.reset()
    policy_state = policy.get_initial_state()

    acc_reward = 0
    while not time_step.is_last():
        action_step = policy.action(time_step, policy_state)
        policy_state = action_step.state
        time_step = py_env.step(action_step.action)
        acc_reward += time_step.reward
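
PyEnvironment and PyPolicy are abstract, so the loop above cannot run as written. The following is a minimal, self-contained sketch of the same pattern using stand-ins only: ToyTimeStep, ToyEnv, CountingPolicy, and run_episode are hypothetical names invented for illustration, and the PolicyStep here is a plain namedtuple rather than the tf_agents class. It shows how the state returned by one call to action is passed into the next call.

```python
from collections import namedtuple

# Stand-in for tf_agents' PolicyStep; hypothetical, for illustration only.
PolicyStep = namedtuple("PolicyStep", ["action", "state", "info"])


class ToyTimeStep(namedtuple("ToyTimeStep", ["observation", "reward", "last"])):
    """Hypothetical stand-in for a TimeStep, exposing only is_last()."""

    def is_last(self):
        return self.last


class ToyEnv:
    """Hypothetical environment: the episode ends after `length` steps."""

    def __init__(self, length=3):
        self._length = length
        self._t = 0

    def reset(self):
        self._t = 0
        return ToyTimeStep(observation=0, reward=0.0, last=False)

    def step(self, action):
        self._t += 1
        # The reward equals the action so the total is easy to check by hand.
        return ToyTimeStep(self._t, float(action), self._t >= self._length)


class CountingPolicy:
    """Hypothetical stateful policy: its state counts the calls to action()."""

    def get_initial_state(self):
        return 0

    def action(self, time_step, policy_state):
        calls = policy_state + 1
        return PolicyStep(action=calls, state=calls, info=())


def run_episode(env, policy):
    time_step = env.reset()
    policy_state = policy.get_initial_state()
    acc_reward = 0.0
    while not time_step.is_last():
        action_step = policy.action(time_step, policy_state)
        policy_state = action_step.state  # thread the state into the next call
        time_step = env.step(action_step.action)
        acc_reward += time_step.reward
    return acc_reward, policy_state


total, final_state = run_episode(ToyEnv(length=3), CountingPolicy())
print(total, final_state)  # prints "6.0 3"
```

If the line that reassigns policy_state were dropped, the policy would see its initial state on every step, which is the usual way stateful policies are misused.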
Methods
action
action(
time_step: tf_agents.trajectories.TimeStep,
policy_state: tf_agents.typing.types.NestedArray = (),
seed: Optional[types.Seed] = None
) -> tf_agents.trajectories.PolicyStep
Generates next action given the time_step and policy_state.
| Args | |
|---|---|
| time_step | A TimeStep tuple corresponding to time_step_spec(). |
| policy_state | An optional previous policy_state. |
| seed | Seed to use if action uses sampling (optional). |
| Returns | |
|---|---|
| A PolicyStep named tuple containing:<br>action: A nest of action Arrays matching the action_spec().<br>state: A nest of policy states to be fed into the next call to action.<br>info: Optional side information such as action log probabilities. |
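
To make the roles of seed and info concrete, here is a small sketch of a sampling policy. It is an assumption-laden illustration, not the library's implementation: CoinFlipPolicy is a hypothetical class, PolicyStep is a stand-in namedtuple, and how a real PyPolicy subclass consumes seed depends on that subclass. The sketch only demonstrates the contract described above: the same seed yields the same sampled action, and info can carry the log probability of that action.

```python
import math
import random
from collections import namedtuple

# Stand-in for tf_agents' PolicyStep; hypothetical, for illustration only.
PolicyStep = namedtuple("PolicyStep", ["action", "state", "info"])


class CoinFlipPolicy:
    """Hypothetical stateless policy that samples action 1 with probability p."""

    def __init__(self, p=0.7):
        self._p = p

    def action(self, time_step, policy_state=(), seed=None):
        # A fresh generator per call; seed=None lets Python pick its own seed.
        rng = random.Random(seed)
        act = 1 if rng.random() < self._p else 0
        log_prob = math.log(self._p if act == 1 else 1.0 - self._p)
        # Stateless policy: the state is passed through unchanged.
        return PolicyStep(action=act, state=policy_state,
                          info={"log_prob": log_prob})


policy = CoinFlipPolicy(p=0.7)
first = policy.action(time_step=None, seed=123)
second = policy.action(time_step=None, seed=123)
print(first == second)  # prints "True": same seed, same sample
```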
get_initial_state
get_initial_state(
batch_size: Optional[int] = None
) -> tf_agents.typing.types.NestedArray
Returns an initial state usable by the policy.
| Args | |
|---|---|
| batch_size | An optional batch size. |
| Returns | |
|---|---|
| An initial policy state. |
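
The reference above does not say what the initial state contains or how batch_size changes it. The sketch below assumes one common convention: the state is zero-filled, and a non-None batch_size adds a leading batch dimension. The helper initial_state and its state_size parameter are hypothetical and use plain Python lists instead of arrays, so treat this as an illustration of the shape convention and not as the behavior of any particular policy.

```python
from typing import List, Optional, Union


def initial_state(
    state_size: int, batch_size: Optional[int] = None
) -> Union[List[float], List[List[float]]]:
    """Hypothetical helper: zero state, batched only if batch_size is given."""
    if batch_size is None:
        # Unbatched: a single state vector of length state_size.
        return [0.0] * state_size
    # Batched: one independent state vector per batch entry.
    return [[0.0] * state_size for _ in range(batch_size)]


unbatched = initial_state(state_size=4)
batched = initial_state(state_size=4, batch_size=2)
print(len(unbatched), len(batched), len(batched[0]))  # prints "4 2 4"
```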