# tf_agents.utils.value_ops.discounted_return

Computes discounted return.

```
Q_n = sum_{n'=n}^N gamma^(n'-n) * r_{n'} + gamma^(N-n+1)*final_value.
```

For details, see "Reinforcement Learning: An Introduction", Second Edition, by Richard S. Sutton and Andrew G. Barto.
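The sum above can be evaluated backward in time with the recurrence `Q_n = r_n + gamma * Q_{n+1}`, starting from `Q_{N+1} = final_value`. The following is a minimal pure-Python sketch for a single trajectory. It assumes one discount per step, and `discounted_return_1d` is a hypothetical helper name used for illustration, not part of TF-Agents.

```python
def discounted_return_1d(rewards, discounts, final_value=0.0):
    """Illustrative only: backward recurrence over one trajectory."""
    returns = [0.0] * len(rewards)
    acc = final_value  # plays the role of Q_{N+1}
    for t in reversed(range(len(rewards))):
        # Q_t = r_t + discount_t * Q_{t+1}
        acc = rewards[t] + discounts[t] * acc
        returns[t] = acc
    return returns


rewards = [1.0, 1.0, 1.0]
discounts = [0.5, 0.5, 0.5]

# Without bootstrapping, the tail beyond the last step contributes nothing.
returns_no_bootstrap = discounted_return_1d(rewards, discounts)

# With a final value, the last step also receives discount * final_value.
returns_bootstrap = discounted_return_1d(rewards, discounts, final_value=4.0)
```

With a constant discount `gamma`, each entry of the result matches the closed-form sum term by term.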

#### Define abbreviations:

`B`: batch size, i.e. the number of trajectories. `T`: number of steps per trajectory. This is equal to `N - n` in the equation above.

#### Args:

- `rewards`: Tensor with shape `[T, B]` (or `[T]`) representing rewards.
- `discounts`: Tensor with shape `[T, B]` (or `[T]`) representing discounts.
- `final_value`: (Optional) Tensor with shape `[B]` (or `[1]`) representing the value estimate at `T`. Defaults to an all-zeros tensor. When set, it allows the final value to bootstrap the reward computation.
- `time_major`: A boolean indicating whether input tensors are time major. False means input tensors have shape `[B, T]`.
- `provide_all_returns`: A boolean. If True, the function provides the returns for every step along the time dimension; if False, it gives only the single complete discounted return.

#### Returns:

If `provide_all_returns` is True: a tensor with shape `[T, B]` (or `[T]`) representing the discounted returns. The shape is `[B, T]` when `time_major` is False.

If `provide_all_returns` is False: a tensor with shape `[B]` (or `[]`) representing the discounted returns.
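To make the shape conventions concrete, here is a hedged pure-Python sketch that mimics the documented flags on nested lists. It is not the TF-Agents implementation: `discounted_return_sketch` is a hypothetical name, and it assumes that the "single complete discounted return" is the return computed from the first step of each trajectory.

```python
def discounted_return_sketch(rewards, discounts, final_value=None,
                             time_major=True, provide_all_returns=True):
    """Illustrative only: inputs are nested lists shaped [T][B] or [B][T]."""
    if not time_major:
        # Transpose batch-major [B][T] input into time-major [T][B].
        rewards = [list(col) for col in zip(*rewards)]
        discounts = [list(col) for col in zip(*discounts)]

    batch_size = len(rewards[0])
    # Default bootstrap value is all zeros, one entry per trajectory.
    acc = list(final_value) if final_value is not None else [0.0] * batch_size

    all_returns = []
    for r_t, d_t in zip(reversed(rewards), reversed(discounts)):
        acc = [r + d * a for r, d, a in zip(r_t, d_t, acc)]
        all_returns.append(acc)
    all_returns.reverse()  # now time-major, shape [T][B]

    if not provide_all_returns:
        # Assumption: the single return is the one from the first step, shape [B].
        return all_returns[0]
    if not time_major:
        # Transpose back to [B][T] to match the input layout.
        return [list(col) for col in zip(*all_returns)]
    return all_returns


# Two trajectories (B=2) of three steps (T=3), given batch-major.
batch_rewards = [[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
batch_discounts = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

per_step = discounted_return_sketch(
    batch_rewards, batch_discounts, time_major=False)
single = discounted_return_sketch(
    batch_rewards, batch_discounts, time_major=False,
    provide_all_returns=False)
```

Here `per_step` keeps the `[B, T]` layout of the inputs, while `single` collapses the time dimension and has one entry per trajectory.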

