# tf_agents.trajectories.to_transition

[View source on GitHub](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/trajectories/trajectory.py#L650-L713)

Create a transition from a trajectory or two adjacent trajectories.

#### View aliases

**Main aliases**

[`tf_agents.trajectories.trajectory.to_transition`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/trajectories/to_transition)

    tf_agents.trajectories.to_transition(
        trajectory: tf_agents.trajectories.Trajectory,
        next_trajectory: Optional[tf_agents.trajectories.Trajectory] = None
    ) -> tf_agents.trajectories.Transition

**Note:** If `next_trajectory` is not provided, tensors of `trajectory` are sliced along their *second* (`time`) dimension; for example:

    time_steps.step_type = trajectory.step_type[:, :-1]
    time_steps.observation = trajectory.observation[:, :-1]
    next_time_steps.observation = trajectory.observation[:, 1:]
    next_time_steps.step_type = trajectory.next_step_type[:, :-1]
    next_time_steps.reward = trajectory.reward[:, :-1]
    next_time_steps.discount = trajectory.discount[:, :-1]

Notice that the `reward` and `discount` fields of `time_steps` are undefined, and are therefore filled with zeros.

#### Args

| Arg | Description |
|-------------------|-----------------------------------------------------------|
| `trajectory` | An instance of `Trajectory`. The tensors in `Trajectory` must have shape `[B, T, ...]` when `next_trajectory` is `None`. `discount` is assumed to be a scalar float; hence the shape of `trajectory.discount` must be `[B, T]`. |
| `next_trajectory` | (optional) An instance of `Trajectory`. |

#### Returns

A tuple `(time_steps, policy_steps, next_time_steps)`. The `reward` and `discount` fields of `time_steps` are filled with zeros because these cannot be deduced (please do not use them).

#### Raises

| Exception | Condition |
|--------------|------------------------------------------------------|
| `ValueError` | If `discount` rank is not within the range [1, 2]. |

Last updated 2024-04-26 UTC.
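The single-trajectory slicing described in the note above can be sketched with plain NumPy arrays standing in for the `Trajectory` fields. This is a minimal illustration of the slicing semantics only, using made-up data, not the actual tf_agents implementation:

```python
import numpy as np

B, T = 2, 4  # batch size and trajectory length (hypothetical values)

# Stand-ins for trajectory fields, each of shape [B, T].
observation = np.arange(B * T).reshape(B, T)
reward = np.ones((B, T))
discount = np.full((B, T), 0.99)

# time_steps covers steps 0..T-2; next_time_steps covers steps 1..T-1.
time_steps_observation = observation[:, :-1]
next_time_steps_observation = observation[:, 1:]
next_time_steps_reward = reward[:, :-1]
next_time_steps_discount = discount[:, :-1]

# reward/discount of time_steps cannot be deduced, so they are zero-filled.
time_steps_reward = np.zeros_like(next_time_steps_reward)

print(time_steps_observation.shape)  # prints (2, 3): each sliced tensor has shape [B, T-1]
```

Note how each observation in `next_time_steps_observation` is the element one time step after the corresponding entry of `time_steps_observation`, which is exactly the pairing a transition `(s_t, a_t, s_{t+1})` needs.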