Utilities for policies.
Classes
class BanditPolicyType
: Enumeration of bandit policy types.
class InfoFields
: Strings which can be used in the policy info fields.
class PerArmPolicyInfo
: PerArmPolicyInfo(log_probability, predicted_rewards_mean, multiobjective_scalarized_predicted_rewards_mean, predicted_rewards_optimistic, predicted_rewards_sampled, bandit_policy_type, chosen_arm_features)
class PolicyInfo
: PolicyInfo(log_probability, predicted_rewards_mean, multiobjective_scalarized_predicted_rewards_mean, predicted_rewards_optimistic, predicted_rewards_sampled, bandit_policy_type)
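Both info classes are named tuples over the fields listed in their signatures. The sketch below mirrors that layout with the standard library so the field order is easy to inspect. The class names PolicyInfoSketch and PerArmPolicyInfoSketch are hypothetical, and using () as the default for an unset field is an assumption, not a statement about the library.

```python
import collections

# Field names copied from the PolicyInfo signature above.
_POLICY_INFO_FIELDS = (
    "log_probability",
    "predicted_rewards_mean",
    "multiobjective_scalarized_predicted_rewards_mean",
    "predicted_rewards_optimistic",
    "predicted_rewards_sampled",
    "bandit_policy_type",
)

# Hypothetical stand-in; "()" is assumed to mean "field not populated".
PolicyInfoSketch = collections.namedtuple(
    "PolicyInfoSketch",
    _POLICY_INFO_FIELDS,
    defaults=((),) * len(_POLICY_INFO_FIELDS),
)

# The per-arm variant adds one trailing field for the chosen arm's features.
_PER_ARM_FIELDS = _POLICY_INFO_FIELDS + ("chosen_arm_features",)
PerArmPolicyInfoSketch = collections.namedtuple(
    "PerArmPolicyInfoSketch",
    _PER_ARM_FIELDS,
    defaults=((),) * len(_PER_ARM_FIELDS),
)

info = PolicyInfoSketch(predicted_rewards_mean=[0.1, 0.7, 0.2])
per_arm_info = PerArmPolicyInfoSketch(chosen_arm_features=[1.0, 0.0])
print(info.log_probability)               # ()
print(per_arm_info.chosen_arm_features)   # [1.0, 0.0]
```

The only structural difference between the two is the chosen_arm_features field, which gives per-arm policies a place to record the features of the arm they picked.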
Functions
bandit_policy_uniform_mask(...)
: Set bandit policy type tensor to BanditPolicyType.UNIFORM based on mask.
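A minimal sketch of the masking behavior on plain Python lists, assuming integer policy-type codes. BanditPolicyTypeSketch and its values (UNKNOWN=0, GREEDY=1, UNIFORM=2) are hypothetical stand-ins for BanditPolicyType, and the library function works on tensors rather than lists.

```python
import enum


class BanditPolicyTypeSketch(enum.IntEnum):
    # Hypothetical codes; the real BanditPolicyType values may differ.
    UNKNOWN = 0
    GREEDY = 1
    UNIFORM = 2


def bandit_policy_uniform_mask_sketch(values, mask):
    """Copy of values with every position where mask is True set to UNIFORM."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    uniform = int(BanditPolicyTypeSketch.UNIFORM)
    return [uniform if m else v for v, m in zip(values, mask)]


types = [int(BanditPolicyTypeSketch.GREEDY)] * 4
explored = [False, True, False, True]  # positions where an action was drawn uniformly
print(bandit_policy_uniform_mask_sketch(types, explored))  # [1, 2, 1, 2]
```

Positions outside the mask keep their existing type, so a batch can mix greedy and uniformly sampled entries.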
check_no_mask_with_arm_features(...)
create_bandit_policy_type_tensor_spec(...)
: Create tensor spec for bandit policy type.
create_chosen_arm_features_info_spec(...)
: Creates the chosen arm features info spec from the arm observation spec.
get_model_index(...)
: Returns the model index for a specific arm.
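A minimal sketch, assuming that a policy with per-arm features scores every arm with a single shared model, while a policy without them keeps one model per arm. The function name and both parameters are hypothetical and do not come from this page.

```python
def get_model_index_sketch(arm_index: int, accepts_per_arm_features: bool) -> int:
    """Index of the model that scores arm_index (hypothetical signature)."""
    if arm_index < 0:
        raise ValueError("arm_index must be non-negative")
    # Assumption: one shared model (index 0) in the per-arm setting,
    # otherwise one model per arm, indexed by the arm itself.
    return 0 if accepts_per_arm_features else arm_index


print(get_model_index_sketch(3, accepts_per_arm_features=False))  # 3
print(get_model_index_sketch(3, accepts_per_arm_features=True))   # 0
```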
get_num_actions_from_tensor_spec(...)
: Validates action_spec and returns the number of actions.
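For a bounded, discrete action spec, the actions are the integers from minimum to maximum inclusive, so the count is maximum - minimum + 1. The sketch below assumes the validation requires a scalar spec whose minimum is 0; BoundedSpecSketch and num_actions_sketch are hypothetical stand-ins for a real tensor spec and for this function.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundedSpecSketch:
    """Hypothetical stand-in for a bounded, discrete action spec."""
    shape: Tuple[int, ...]
    minimum: int
    maximum: int


def num_actions_sketch(action_spec: BoundedSpecSketch) -> int:
    # Assumed validation rules: scalar action, indices starting at 0.
    if len(action_spec.shape) != 0:
        raise ValueError(f"expected a scalar action spec, got shape {action_spec.shape}")
    if action_spec.minimum != 0:
        raise ValueError(f"expected minimum 0, got {action_spec.minimum}")
    if action_spec.maximum < action_spec.minimum:
        raise ValueError("maximum must be >= minimum")
    # Actions are the integers minimum..maximum, inclusive.
    return action_spec.maximum - action_spec.minimum + 1


print(num_actions_sketch(BoundedSpecSketch(shape=(), minimum=0, maximum=4)))  # 5
```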
has_bandit_policy_type(...)
: Check if policy info has bandit_policy_type field/tensor.
has_chosen_arm_features(...)
: Check if policy info has chosen_arm_features field/tensor.
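The two checks distinguish "the info structure has this field" from "the field actually holds a value". A minimal sketch on a named tuple, assuming () marks an unpopulated field; InfoSketch, the *_sketch functions, and the require_value parameter are hypothetical.

```python
import collections

# Hypothetical info tuple; an empty tuple "()" marks an unpopulated field.
InfoSketch = collections.namedtuple(
    "InfoSketch",
    ("predicted_rewards_mean", "bandit_policy_type", "chosen_arm_features"),
    defaults=((), (), ()),
)


def _has_field(info, field_name, require_value=False):
    """True if info has field_name; with require_value it must also be populated."""
    if not hasattr(info, field_name):
        return False
    if require_value:
        value = getattr(info, field_name)
        # Assumption: "()" means no tensor was stored in this field.
        return not (isinstance(value, tuple) and len(value) == 0)
    return True


def has_bandit_policy_type_sketch(info, require_value=False):
    return _has_field(info, "bandit_policy_type", require_value)


def has_chosen_arm_features_sketch(info, require_value=False):
    return _has_field(info, "chosen_arm_features", require_value)


info = InfoSketch(bandit_policy_type=[1, 2])
print(has_bandit_policy_type_sketch(info, require_value=True))   # True
print(has_chosen_arm_features_sketch(info))                      # True
print(has_chosen_arm_features_sketch(info, require_value=True))  # False
print(has_bandit_policy_type_sketch(()))                         # False
```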
masked_argmax(...)
: Computes the argmax where the allowed elements are given by a mask.
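One common way to compute a masked argmax is to replace every disallowed entry with negative infinity and then take an ordinary argmax. The sketch below does this for a single row on plain lists; masked_argmax_sketch is a hypothetical name, and the library function operates on tensors.

```python
import math
from typing import Sequence


def masked_argmax_sketch(values: Sequence[float], mask: Sequence[bool]) -> int:
    """Index of the largest value among positions where mask is True."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must allow at least one element")
    # Replace disallowed entries with -inf so they can never win the argmax.
    masked = [v if m else -math.inf for v, m in zip(values, mask)]
    # max() over indices returns the first index attaining the maximum.
    return max(range(len(masked)), key=masked.__getitem__)


print(masked_argmax_sketch([0.9, 0.2, 0.5], [False, True, True]))  # 2
```

The largest raw value (0.9) is masked out, so index 2 wins. Raising on an all-False mask is a choice made for this sketch; a tensor implementation might instead return an arbitrary index for such a row.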
populate_policy_info(...)
: Populates policy info given all needed input.
set_bandit_policy_type(...)
: Sets the InfoFields.BANDIT_POLICY_TYPE on info to bandit_policy_type.
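Because named tuples are immutable, "setting" a field means building a new tuple with that field replaced. A minimal sketch, assuming info is a named tuple with a bandit_policy_type field; InfoSketch and set_bandit_policy_type_sketch are hypothetical.

```python
import collections

# Hypothetical info tuple with "()" as the unset default.
InfoSketch = collections.namedtuple(
    "InfoSketch", ("predicted_rewards_mean", "bandit_policy_type"), defaults=((), ())
)


def set_bandit_policy_type_sketch(info, bandit_policy_type):
    """Return a copy of info with its bandit_policy_type field replaced."""
    if not hasattr(info, "bandit_policy_type"):
        raise ValueError("info has no bandit_policy_type field")
    # _replace builds a new tuple; the original info is left unchanged.
    return info._replace(bandit_policy_type=bandit_policy_type)


info = InfoSketch(predicted_rewards_mean=[0.3, 0.6])
updated = set_bandit_policy_type_sketch(info, [1, 1])
print(updated.bandit_policy_type)  # [1, 1]
print(info.bandit_policy_type)     # ()
```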
Other Members
Name | Value
---|---
absolute_import | Instance of __future__._Feature
division | Instance of __future__._Feature
print_function | Instance of __future__._Feature