Create a Trajectory from tensors representing a single episode.
```python
tf_agents.trajectories.trajectory.from_episode(
    observation: tf_agents.typing.types.NestedSpecTensorOrArray,
    action: tf_agents.typing.types.NestedSpecTensorOrArray,
    policy_info: tf_agents.typing.types.NestedSpecTensorOrArray,
    reward: tf_agents.typing.types.NestedSpecTensorOrArray,
    discount: Optional[types.SpecTensorOrArray] = None
) -> tf_agents.trajectories.Trajectory
```
If none of the inputs are tensors, then numpy arrays are generated instead.

If `discount` is not provided, the first entry in `reward` is used to estimate `T`:

```python
reward_0 = tf.nest.flatten(reward)[0]
T = shape(reward_0)[0]
```

In this case, a `discount` of all ones having dtype `float32` is generated.
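As an illustration of this default, here is a minimal sketch of calling the function with numpy-only inputs; the concrete shapes, values, and the empty `policy_info` are assumptions chosen for demonstration:

```python
import numpy as np
from tf_agents.trajectories import trajectory

# A 3-step episode built entirely from numpy arrays, so numpy outputs
# are generated. `discount` is omitted, so T is inferred from `reward`
# and a float32 discount of all ones is filled in.
traj = trajectory.from_episode(
    observation=np.zeros([3, 4], dtype=np.float32),   # [T, ...] with T=3
    action=np.array([0, 1, 0], dtype=np.int64),       # [T]
    policy_info=(),                                   # no policy info
    reward=np.array([1.0, 0.0, 2.0], dtype=np.float32),
)
print(traj.discount)  # [1. 1. 1.] -- generated because discount was omitted
```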
Args | |
---|---|
`observation` | (possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[T, ...]`. |
`action` | (possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[T, ...]`. |
`policy_info` | (possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[T, ...]`. |
`reward` | (possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[T, ...]`. |
`discount` | A floating point vector `Tensor` or `np.ndarray`; shaped `[T]` (optional). |
Returns | |
---|---|
An instance of `Trajectory`. | |
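When the inputs are nested, every leaf must share the same leading dimension `T`. A hedged sketch with a tuple-valued observation and an explicit discount (the field names, shapes, and values are assumptions for illustration):

```python
import numpy as np
from tf_agents.trajectories import trajectory

# A 2-step episode whose observation is a tuple of two arrays; every
# leaf has leading dimension T=2, matching `reward` and `discount`.
traj = trajectory.from_episode(
    observation=(np.zeros([2, 3], dtype=np.float32),
                 np.zeros([2, 3], dtype=np.float32)),
    action=np.array([1, 0], dtype=np.int64),
    policy_info=(),
    reward=np.array([0.5, 1.0], dtype=np.float32),
    discount=np.ones([2], dtype=np.float32),
)
print(traj.observation[0].shape)  # (2, 3): nesting is preserved
print(traj.step_type)             # step-type metadata filled in by from_episode
```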