Discounted future sum of batch-major values.
tf_agents.utils.common.discounted_future_sum(
values, gamma, num_steps
)
Args | |
---|---|
values | A Tensor of shape [batch_size, total_steps] and dtype float32. |
gamma | A float discount value. |
num_steps | A positive integer number of future steps to sum. |
Returns | |
---|---|
A Tensor of shape [batch_size, total_steps], where each entry (i, j) is
the sum of the num_steps terms from
gamma^0 * values[i, j] to
gamma^(num_steps - 1) * values[i, j + num_steps - 1],
with values zero-padded past the last step so that out-of-range terms contribute zero.
For example, values=[5, 6, 7], gamma=0.9, will result in sequence:
|
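The windowed sum described under Returns can be sketched in plain NumPy. This is an illustrative reference, not the TF-Agents implementation: the function name `discounted_future_sum_reference` is hypothetical, and `num_steps=3` is an assumption, since the example above does not state it.

```python
import numpy as np


def discounted_future_sum_reference(values, gamma, num_steps):
    """Hypothetical NumPy reference for the documented behaviour."""
    values = np.asarray(values, dtype=np.float32)
    total_steps = values.shape[1]
    # Zero-pad on the right so windows that run past the last step stay in bounds.
    padded = np.pad(values, ((0, 0), (0, num_steps - 1)))
    result = np.zeros_like(values)
    for k in range(num_steps):
        # Term k of entry (i, j) is gamma^k * values[i, j + k].
        result += (gamma ** k) * padded[:, k:k + total_steps]
    return result


# Batch of one sequence; num_steps=3 is assumed for illustration.
example = discounted_future_sum_reference([[5.0, 6.0, 7.0]], gamma=0.9, num_steps=3)
print(example)  # approximately [[16.07, 12.3, 7.0]]
```

Under these assumptions the first entry is 5 + 0.9 * 6 + 0.81 * 7, the second is 6 + 0.9 * 7, and the last is 7, because the padded zeros contribute nothing.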