Compute the return from each index in an episode.
tf_agents.utils.common.compute_returns(
rewards: tf_agents.typing.types.Tensor
,
discounts: tf_agents.typing.types.Tensor
,
time_major: bool = False
)
Args |
rewards
|
Tensor [T] , [B, T] , [T, B] of per-timestep reward.
|
discounts
|
Tensor [T] , [B, T] , [T, B] of per-timestep discount factor.
Should be 0 . for final step of each episode.
|
time_major
|
Bool, when batched inputs setting it to True , inputs are
expected to be time-major: [T, B] otherwise, batch-major: [B, T] .
|
Returns |
Tensor of per-timestep cumulative returns.
|