tf_agents.trajectories.last

Creates a Trajectory for a transition between StepTypes MID and LAST: step_type is MID and next_step_type is LAST.

All inputs may be batched.

The discount input is used to infer the outer shape of the inputs, because it is always expected to be a single (non-nested) array with a scalar inner shape. Its shape is therefore exactly the outer shape: [B], [T], or [B, T].
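The shape inference described above can be sketched in plain NumPy. This is an illustrative stand-in, not the TF-Agents implementation: the helper name sketch_last, the dictionary layout, and the integer codes used for MID and LAST are assumptions made for the example.

```python
import numpy as np

# Integer step-type codes, assumed here for illustration only.
MID, LAST = 1, 2


def sketch_last(observation, reward, discount):
    """Hypothetical helper mimicking the outer-shape inference."""
    discount = np.asarray(discount)
    # discount has a scalar inner shape, so its full shape is the outer shape.
    outer_shape = discount.shape
    return {
        # Step types are filled to match the outer shape of the batch.
        "step_type": np.full(outer_shape, MID, dtype=np.int32),
        "next_step_type": np.full(outer_shape, LAST, dtype=np.int32),
        "observation": np.asarray(observation),
        "reward": np.asarray(reward),
        "discount": discount,
    }


batch = sketch_last(
    observation=np.zeros((3, 4), dtype=np.float32),  # [B, ...] with B = 3
    reward=np.ones(3, dtype=np.float32),             # [B]
    discount=np.zeros(3, dtype=np.float32),          # [B]
)
print(batch["step_type"].shape)
```

Because only discount is inspected, the other inputs are free to carry arbitrary inner dimensions after the shared outer ones.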

Args:
  observation: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
  action: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
  policy_info: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
  reward: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
  discount: A floating point vector Tensor or np.ndarray; shaped [B], [T], or [B, T] (optional).

Returns:
  A Trajectory instance.