tf_agents.bandits.agents.linear_bandit_agent.update_a_and_b_with_forgetting

Update the covariance matrix `a` and the weighted sum of rewards `b`.

tf_agents.bandits.agents.linear_bandit_agent.update_a_and_b_with_forgetting(
    a_prev: tf_agents.typing.types.Tensor,
    b_prev: tf_agents.typing.types.Tensor,
    r: tf_agents.typing.types.Tensor,
    x: tf_agents.typing.types.Tensor,
    gamma: float
) -> Tuple[tf_agents.typing.types.Tensor, tf_agents.typing.types.Tensor]

This function updates the covariance matrix `a` and the sum of weighted rewards `b`, discounting the previous estimates `a_prev` and `b_prev` by the forgetting factor `gamma`.

Args
`a_prev`	previous estimate of `a`.
`b_prev`	previous estimate of `b`.
`r`	a `Tensor` of shape [`batch_size`]. These are the rewards of the batched observations.
`x`	a `Tensor` of shape [`batch_size`, `context_dim`]. This is the matrix with the (batched) observations.
`gamma`	a float forgetting factor in [0.0, 1.0].

Returns
The updated estimates of `a` and `b`.
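
Example
The page does not spell out the update rule, so the following is only a minimal NumPy sketch under an assumption: that the update takes the usual discounted least-squares form, `a = gamma * a_prev + x^T x` and `b = gamma * b_prev + sum_i r[i] * x[i]`. The helper name `update_a_and_b_with_forgetting_sketch` is hypothetical and is not part of TF-Agents; it works on NumPy arrays rather than `Tensor`s.

```python
import numpy as np


def update_a_and_b_with_forgetting_sketch(a_prev, b_prev, r, x, gamma):
    """Hypothetical NumPy illustration; assumes the standard discounted update."""
    # Discount the previous covariance estimate, then add x^T x for the batch.
    a_new = gamma * a_prev + x.T @ x
    # Discount the previous reward sum, then add sum_i r[i] * x[i].
    b_new = gamma * b_prev + x.T @ r
    return a_new, b_new


# Two observations with context_dim = 2.
a_prev = np.eye(2)
b_prev = np.zeros(2)
x = np.array([[1.0, 0.0], [0.0, 2.0]])  # shape [batch_size, context_dim]
r = np.array([1.0, 0.5])                # shape [batch_size]

a_new, b_new = update_a_and_b_with_forgetting_sketch(
    a_prev, b_prev, r, x, gamma=0.5)
print(a_new)  # shape [context_dim, context_dim]
print(b_new)  # shape [context_dim]
```

Under this assumed form, `gamma = 1.0` keeps all past data at full weight, while smaller values make older observations count for less.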
