# tf_agents.trajectories.to_n_step_transition

[View source on GitHub](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/trajectories/trajectory.py#L716-L850)

Create an n-step transition from a trajectory with `T=N + 1` frames.

#### View aliases

**Main aliases**

[`tf_agents.trajectories.trajectory.to_n_step_transition`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/trajectories/to_n_step_transition)

    tf_agents.trajectories.to_n_step_transition(
        trajectory: tf_agents.trajectories.Trajectory,
        gamma: tf_agents.typing.types.Float
    ) -> tf_agents.trajectories.Transition

**Note:** Tensors of `trajectory` are sliced along their *second* (`time`) dimension, to pull out the appropriate fields for the n-step transitions.

The output transition's `next_time_step.{reward, discount}` will contain
N-step discounted reward and discount values calculated as:

    next_time_step.reward = r_t +
                            g^{1} * d_t * r_{t+1} +
                            g^{2} * d_t * d_{t+1} * r_{t+2} +
                            g^{3} * d_t * d_{t+1} * d_{t+2} * r_{t+3} +
                            ...
                            g^{N-1} * d_t * ... * d_{t+N-2} * r_{t+N-1}
    next_time_step.discount = g^{N-1} * d_t * d_{t+1} * ... * d_{t+N-1}

#### In python notation:

    discount = gamma**(N-1) * reduce_prod(trajectory.discount[:, :-1])
    reward = discounted_return(
        rewards=trajectory.reward[:, :-1],
        discounts=gamma * trajectory.discount[:, :-1])

When `trajectory.discount[:, :-1]` is an all-ones tensor, this is equivalent
to:

    next_time_step.discount = (
        gamma**(N-1) * tf.ones_like(trajectory.discount[:, 0]))
    next_time_step.reward = (
        sum_{n=0}^{N-1} gamma**n * trajectory.reward[:, n])
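The calculation above can be reproduced with a short standalone sketch. This is not the library's implementation; `n_step_reward_and_discount` is a hypothetical helper written here for illustration, assuming `reward` and `discount` are `[B, T]` float tensors from a trajectory with `T = N + 1` frames:

    import tensorflow as tf

    def n_step_reward_and_discount(reward, discount, gamma):
      """Computes the n-step reward and discount above; each result is [B]."""
      r = reward[:, :-1]    # r_t ... r_{t+N-1}, shape [B, N]
      d = discount[:, :-1]  # d_t ... d_{t+N-1}, shape [B, N]
      n = tf.shape(r)[1]
      # Exclusive cumulative product along time: [1, d_t, d_t * d_{t+1}, ...],
      # so step k is weighted by the product of the first k discounts.
      running_d = tf.math.cumprod(d, axis=1, exclusive=True)
      powers = gamma ** tf.cast(tf.range(n), reward.dtype)  # g^0 ... g^{N-1}
      n_step_reward = tf.reduce_sum(powers * running_d * r, axis=1)
      n_step_discount = (
          gamma ** tf.cast(n - 1, reward.dtype) * tf.reduce_prod(d, axis=1))
      return n_step_reward, n_step_discount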
| Args | |
|---|---|
| `trajectory` | An instance of `Trajectory`. The tensors in `Trajectory` must have shape `[B, T, ...]`. `discount` is assumed to be a scalar float, hence the shape of `trajectory.discount` must be `[B, T]`. |
| `gamma` | A floating point scalar; the discount factor. |
| Returns |
|---|
| An N-step `Transition` where `N = T - 1`. The reward and discount in `time_step.{reward, discount}` are NaN. The n-step discounted reward and final discount are stored in `next_time_step.{reward, discount}`. All tensors in the `Transition` have shape `[B, ...]` (no time dimension). |
| Raises | |
|---|---|
| `ValueError` | if `discount.shape.rank != 2`. |
| `ValueError` | if `discount.shape[1] < 2`. |
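For illustration, a minimal usage sketch. The trajectory below is toy data (constant rewards, unit discounts), built only to show the shapes involved and the resulting n-step values:

    import tensorflow as tf
    from tf_agents.trajectories import time_step as ts
    from tf_agents.trajectories import trajectory as trajectory_lib

    B, T = 2, 3  # batch of 2; T = N + 1 = 3 frames -> a 2-step transition
    traj = trajectory_lib.Trajectory(
        step_type=tf.fill([B, T], ts.StepType.MID),
        observation=tf.random.normal([B, T, 4]),
        action=tf.zeros([B, T], dtype=tf.int32),
        policy_info=(),
        next_step_type=tf.fill([B, T], ts.StepType.MID),
        reward=tf.ones([B, T]),
        discount=tf.ones([B, T]))

    transition = trajectory_lib.to_n_step_transition(traj, gamma=0.9)
    # With unit rewards and discounts and N = 2:
    #   next_time_step.reward   = 1 + 0.9 * 1 = 1.9
    #   next_time_step.discount = 0.9**(N-1)  = 0.9
    print(transition.next_time_step.reward)    # [1.9 1.9]
    print(transition.next_time_step.discount)  # [0.9 0.9]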