tf_agents.trajectories.single_step

Create a `Trajectory` transitioning between `StepType`s `FIRST` and `LAST`.

Main aliases

tf_agents.trajectories.trajectory.single_step

tf_agents.trajectories.single_step(
    observation: tf_agents.typing.types.NestedSpecTensorOrArray,
    action: tf_agents.typing.types.NestedSpecTensorOrArray,
    policy_info: tf_agents.typing.types.NestedSpecTensorOrArray,
    reward: tf_agents.typing.types.NestedSpecTensorOrArray,
    discount: tf_agents.typing.types.SpecTensorOrArray
) -> tf_agents.trajectories.Trajectory

All inputs may be batched.

The outer shape of the inputs is inferred from `discount`, which is always expected to be a single array (not a nest) with scalar inner shape, so its shape is exactly the outer shape: `[B]`, `[T]`, or `[B, T]`.
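The outer-shape rule above can be illustrated with a minimal NumPy-only sketch. It is not the library's implementation: `sketch_single_step` is a hypothetical stand-in, the returned dictionary only mimics the field names of a `Trajectory`, and the integer step-type codes are assumed to mirror `StepType` (`FIRST = 0`, `MID = 1`, `LAST = 2`).

```python
import numpy as np

# Integer codes assumed to mirror tf_agents' StepType (FIRST=0, MID=1, LAST=2).
FIRST, MID, LAST = 0, 1, 2


def sketch_single_step(observation, action, policy_info, reward, discount):
    """Hypothetical NumPy-only stand-in for illustration; not the library code."""
    discount = np.asarray(discount, dtype=np.float32)
    # discount has a scalar inner shape, so its full shape is the outer shape:
    # [B], [T], or [B, T].
    outer_shape = discount.shape
    return {
        "step_type": np.full(outer_shape, FIRST, dtype=np.int32),
        "observation": observation,
        "action": action,
        "policy_info": policy_info,
        "next_step_type": np.full(outer_shape, LAST, dtype=np.int32),
        "reward": reward,
        "discount": discount,
    }


# Batch of B=3 single-step episodes with 4-dimensional observations.
traj = sketch_single_step(
    observation=np.zeros((3, 4), dtype=np.float32),
    action=np.array([0, 1, 0], dtype=np.int32),
    policy_info=(),
    reward=np.array([1.0, 0.5, 0.0], dtype=np.float32),
    discount=np.ones((3,), dtype=np.float32),
)
print(traj["step_type"].shape, traj["next_step_type"].tolist())
```

In this sketch every step-type array takes its shape from `discount` alone, which is why `discount` must not be nested and must carry no inner dimensions.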

Args
`observation`	(possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[B, ...]`, `[T, ...]`, or `[B, T, ...]`.
`action`	(possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[B, ...]`, `[T, ...]`, or `[B, T, ...]`.
`policy_info`	(possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[B, ...]`, `[T, ...]`, or `[B, T, ...]`.
`reward`	(possibly nested tuple of) `Tensor` or `np.ndarray`; all shaped `[B, ...]`, `[T, ...]`, or `[B, T, ...]`.
`discount`	A floating point vector `Tensor` or `np.ndarray`; shaped `[B]`, `[T]`, or `[B, T]`.

Returns
A `Trajectory` instance.
