Update the covariance matrix `a` and the weighted sum of rewards `b`.
`tf_agents.bandits.agents.linear_bandit_agent.update_a_and_b_with_forgetting(a_prev: tf_agents.typing.types.Tensor, b_prev: tf_agents.typing.types.Tensor, r: tf_agents.typing.types.Tensor, x: tf_agents.typing.types.Tensor, gamma: float) -> Tuple[tf_agents.typing.types.Tensor, tf_agents.typing.types.Tensor]`
This function updates the covariance matrix `a` and the sum of weighted rewards `b` using a forgetting factor `gamma`.
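This page does not spell out the formula, so the following is a hedged sketch. It assumes that `x` stacks a batch of $B$ context vectors $x_i$ as rows (shape `[batch_size, context_dim]`) and that `r` holds the matching rewards $r_i$. Under those assumptions, a forgetting-factor update of this kind is usually the exponentially weighted least-squares recursion:

$$
a_{\text{new}} = \gamma \, a_{\text{prev}} + \sum_{i=1}^{B} x_i x_i^\top = \gamma \, a_{\text{prev}} + X^\top X,
\qquad
b_{\text{new}} = \gamma \, b_{\text{prev}} + \sum_{i=1}^{B} r_i \, x_i = \gamma \, b_{\text{prev}} + X^\top r
$$

Here $X$ is the matrix whose rows are the $x_i$. A `gamma` close to 0 discounts older observations quickly. With `gamma = 1` the decay disappears, and `a` and `b` reduce to plain cumulative sums.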
Returns |
---|
A tuple containing the updated estimates of `a` and `b`. |
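The NumPy sketch below runs outside TensorFlow so that the shapes and the returned tuple are easy to inspect. It implements a discounted least-squares update: decay the old statistics by `gamma`, then add the new batch. The function name `update_a_and_b_with_forgetting_np` is hypothetical. The shapes of `x` (`[batch_size, context_dim]`) and `r` (`[batch_size]`) are assumptions. This is an illustration, not the TF-Agents implementation.

```python
import numpy as np


def update_a_and_b_with_forgetting_np(a_prev, b_prev, r, x, gamma):
    """Hypothetical NumPy sketch of a forgetting-factor update of `a` and `b`.

    Assumes x has shape [batch_size, context_dim] and r has shape [batch_size].
    Not the TF-Agents source; for illustration only.
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    # Decay the previous covariance, then add sum_i x_i x_i^T (= x^T x).
    a_new = gamma * np.asarray(a_prev, dtype=float) + x.T @ x
    # Decay the previous reward-weighted sum, then add sum_i r_i x_i (= x^T r).
    b_new = gamma * np.asarray(b_prev, dtype=float) + x.T @ r
    return a_new, b_new


# Example: start from zero statistics with context_dim = 2.
a0 = np.zeros((2, 2))
b0 = np.zeros(2)
x_batch = np.array([[1.0, 0.0], [0.0, 2.0]])
r_batch = np.array([3.0, 1.0])
a1, b1 = update_a_and_b_with_forgetting_np(a0, b0, r_batch, x_batch, gamma=0.9)
# a1 == [[1.0, 0.0], [0.0, 4.0]], b1 == [3.0, 2.0]
```

When the same batch is fed a second time with `gamma=0.9`, the first batch's contribution is scaled by 0.9 before the new one is added. The result is `a == [[1.9, 0.0], [0.0, 7.6]]` and `b == [5.7, 3.8]`.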