tf_agents.bandits.agents.linear_bandit_agent.update_a_and_b_with_forgetting

Update the covariance matrix a and the weighted sum of rewards b.

This function updates the covariance matrix a and the sum of weighted rewards b from a batch of observations, discounting the previous estimates with the forgetting factor gamma.

a_prev: The previous estimate of a.
b_prev: The previous estimate of b.
r: A Tensor of shape [batch_size] holding the rewards of the batched observations.
x: A Tensor of shape [batch_size, context_dim] holding the (batched) observations, one per row.
gamma: A float forgetting factor in [0.0, 1.0].

Returns: The updated estimates of a and b.
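
The description above does not spell out the update rule, so the following is only a minimal NumPy sketch of the usual discounted linear-bandit update, not the library's actual code. It assumes a has shape [context_dim, context_dim], b has shape [context_dim], and that the update is a_new = gamma * a_prev + x^T x and b_new = gamma * b_prev + x^T r. The function name `update_a_and_b_with_forgetting_sketch` is hypothetical and introduced here purely for illustration.

```python
import numpy as np


def update_a_and_b_with_forgetting_sketch(a_prev, b_prev, r, x, gamma):
    """Illustrative stand-in (assumed update rule, not the TF-Agents source).

    a_prev: [context_dim, context_dim] previous covariance estimate (assumed shape).
    b_prev: [context_dim] previous weighted reward sum (assumed shape).
    r:      [batch_size] rewards.
    x:      [batch_size, context_dim] observations.
    gamma:  forgetting factor in [0.0, 1.0].
    """
    a_prev = np.asarray(a_prev, dtype=float)
    b_prev = np.asarray(b_prev, dtype=float)
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0.0, 1.0]")

    # Discount the old covariance, then add the outer products of the new rows.
    a_new = gamma * a_prev + x.T @ x
    # Discount the old reward sum, then add each observation weighted by its reward.
    b_new = gamma * b_prev + x.T @ r
    return a_new, b_new


# Small example: two observations in a 2-dimensional context space.
a_prev = np.eye(2)
b_prev = np.zeros(2)
x = np.array([[1.0, 0.0], [0.0, 2.0]])
r = np.array([1.0, 0.5])
a_new, b_new = update_a_and_b_with_forgetting_sketch(a_prev, b_prev, r, x, gamma=0.5)
```

Under these assumptions, gamma = 1.0 keeps all past data at full weight, while gamma = 0.0 discards the previous estimates and uses only the current batch.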