tf_agents.trajectories.single_step
Create a Trajectory transitioning between StepTypes FIRST and LAST.
tf_agents.trajectories.single_step(
    observation: tf_agents.typing.types.NestedSpecTensorOrArray,
    action: tf_agents.typing.types.NestedSpecTensorOrArray,
    policy_info: tf_agents.typing.types.NestedSpecTensorOrArray,
    reward: tf_agents.typing.types.NestedSpecTensorOrArray,
    discount: tf_agents.typing.types.SpecTensorOrArray
) -> tf_agents.trajectories.Trajectory
All inputs may be batched.
The input discount is used to infer the outer shape of the inputs, because it is always expected to have a scalar inner shape: its full shape is exactly the outer (batch and/or time) shape.
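The shape inference described above can be illustrated with a NumPy-only sketch. This is not the library's code: the helper name `sketch_single_step_types` is hypothetical, and the integer codes 0 and 2 are assumed here to stand in for the FIRST and LAST step types. It only shows the idea that the step-type fields take their shape from discount.

```python
import numpy as np

# Assumed integer codes standing in for StepType FIRST and LAST
# (local, hypothetical copies; not imported from tf_agents).
FIRST, LAST = 0, 2


def sketch_single_step_types(discount):
    """Build step_type / next_step_type arrays shaped like `discount`.

    Because discount has a scalar inner shape, its full shape is the
    outer shape ([B], [T], or [B, T]) shared by all inputs.
    """
    outer_shape = np.asarray(discount).shape
    step_type = np.full(outer_shape, FIRST, dtype=np.int32)
    next_step_type = np.full(outer_shape, LAST, dtype=np.int32)
    return step_type, next_step_type


# A [B, T] = [3, 5] discount yields [3, 5] step-type arrays.
step_type, next_step_type = sketch_single_step_types(
    np.ones((3, 5), dtype=np.float32))
print(step_type.shape, next_step_type.shape)
```

Every other field (observation, action, policy_info, reward) would keep its own inner shape on top of this shared outer shape.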
Args
observation | (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
action | (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
policy_info | (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
reward | (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
discount | A floating point vector Tensor or np.ndarray; shaped [B], [T], or [B, T] (optional).
Returns
A Trajectory instance.
Last updated 2024-04-26 UTC.