tf_agents.bandits.agents.ranking_agent.compute_score_tensor_for_cascading

Computes scores for all items in a batch of cascading-feedback recommendations.

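Items positioned before the chosen index receive a score of -1 (or non_click_score, if set). The chosen item receives the score given by chosen_value. All remaining items, those after the chosen index, receive a score of 0. If no item is selected (chosen_index equals num_slots), every item lies before the chosen index and therefore receives the negative feedback score.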
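The scoring rule described above can be sketched in plain Python as follows. This is a minimal illustration, assuming the batch is passed as Python lists rather than tensors; `cascading_scores` is a hypothetical helper, not part of TF-Agents, and it is not the library's implementation.

```python
from typing import List, Optional, Sequence


def cascading_scores(
    chosen_index: Sequence[int],
    chosen_value: Sequence[float],
    num_slots: int,
    non_click_score: Optional[float] = None,
) -> List[List[float]]:
    """Hypothetical helper: returns a [batch_size, num_slots] nested list of scores."""
    if non_click_score is None:
        non_click_score = -1.0  # default score for items before the chosen one
    scores = []
    for index, value in zip(chosen_index, chosen_value):
        row = []
        for slot in range(num_slots):
            if slot < index:
                row.append(non_click_score)  # before the chosen item
            elif slot == index:
                row.append(value)  # the chosen item itself
            else:
                row.append(0.0)  # after the chosen item
        scores.append(row)
    return scores


# Row 0: slot 1 chosen with value 2.5.
# Row 1: chosen_index == num_slots, so no slot is chosen and every item gets -1.
print(cascading_scores([1, 3], [2.5, 0.0], num_slots=3))
# → [[-1.0, 2.5, 0.0], [-1.0, -1.0, -1.0]]
```

Note that when no slot is chosen, the corresponding chosen_value is never read, since no slot index equals num_slots.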

chosen_index The index of the slot chosen, or num_slots if no slot is chosen.
chosen_value The value of the chosen item.
num_slots The number of slots. The output score vector will have shape [batch_size, num_slots].
non_click_score (float) The score value for items lying "before" the clicked item. If not set, -1 is used. It is recommended (but not enforced) to use a negative value.

A tensor of shape [batch_size, num_slots], with scores for every item in the recommendation.