
Exposes a numpy API for saved_model policies in Eager mode.

Inherits From: PyTFEagerPolicyBase, PyPolicy


model_path Path to a saved_model generated by the policy_saver.
time_step_spec Optional nested structure of ArraySpecs describing the policy's time_step_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
action_spec Optional nested structure of ArraySpecs describing the policy's action_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
policy_state_spec Optional nested structure of ArraySpecs describing the policy's policy_state_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
info_spec Optional nested structure of ArraySpecs describing the policy's info_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
load_specs_from_pbtxt If True, the specs are loaded from the proto file generated by the policy_saver.
use_tf_function See PyTFEagerPolicyBase.
batch_time_steps See PyTFEagerPolicyBase.
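A minimal sketch of constructing the policy from a directory written by the policy_saver. The module path `tf_agents.policies.py_tf_eager_policy` and the helper name `load_saved_policy` are assumptions not stated on this page. The import is deferred into the function so the snippet can be defined without TF-Agents installed.

```python
def load_saved_policy(model_path, load_specs_from_pbtxt=True):
    """Builds a SavedModelPyTFEagerPolicy from a policy_saver output directory."""
    # Deferred import: assumes TF-Agents is installed and exposes the class
    # under this module path.
    from tf_agents.policies import py_tf_eager_policy

    return py_tf_eager_policy.SavedModelPyTFEagerPolicy(
        model_path,
        load_specs_from_pbtxt=load_specs_from_pbtxt,
    )
```

Passing `load_specs_from_pbtxt=True` here avoids supplying the four optional spec arguments by hand. Whether that suits a given saved model depends on the proto file having been written alongside it.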

action_spec Describes the ArraySpecs of the np.Array returned by action().

action can be a single np.Array, or a nested dict, list or tuple of np.Array.

collect_data_spec Describes the data collected when using this policy with an environment.
info_spec Describes the Arrays emitted as info by action().

policy_state_spec Describes the arrays expected by functions with policy_state as input.
policy_step_spec Describes the output of action().
time_step_spec Describes the TimeStep np.Arrays expected by action(time_step).
trajectory_spec Describes the data collected when using this policy with an environment.




Generates next action given the time_step and policy_state.

time_step A TimeStep tuple corresponding to time_step_spec().
policy_state An optional previous policy_state.
seed Seed to use if action uses sampling (optional).

A PolicyStep named tuple containing:
action A nest of action Arrays matching the action_spec().
state A nest of policy states to be fed into the next call to action.
info Optional side information such as action log probabilities.
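The way the returned state feeds the next call can be sketched with plain Python stand-ins. `PolicyStep` below is a local named tuple that mirrors the three fields described above, and `CountingPolicy` and `run_episode` are hypothetical names for illustration only. None of them are the TF-Agents implementations.

```python
import collections

# Stand-in for the PolicyStep named tuple described above (not the TF-Agents type).
PolicyStep = collections.namedtuple("PolicyStep", ["action", "state", "info"])


class CountingPolicy:
    """Hypothetical toy policy whose state counts how many times action() ran."""

    def get_initial_state(self, batch_size=None):
        return 0

    def action(self, time_step, policy_state=None, seed=None):
        calls = (policy_state or 0) + 1
        # Toy rule: the action is the observation doubled.
        return PolicyStep(action=time_step * 2, state=calls, info={"calls": calls})


def run_episode(policy, observations):
    """Feeds each observation to the policy, threading the state between calls."""
    policy_state = policy.get_initial_state()
    actions = []
    for time_step in observations:
        step = policy.action(time_step, policy_state)
        actions.append(step.action)
        policy_state = step.state  # becomes the input to the next call
    return actions, policy_state
```

The same loop shape should apply to a real policy, with `time_step` being a TimeStep of arrays rather than a bare number.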



Returns an initial state usable by the policy.

batch_size An optional batch size.

An initial policy state.



Returns the metadata of the saved model.



Returns the training global step of the saved model.



Returns the training step of the restored checkpoint.



Allows users to update saved_model variables directly from a checkpoint.

checkpoint_path is a path that was passed to either PolicySaver.save() or PolicySaver.save_checkpoint(). The policy looks for a set of checkpoint files with the file prefix `/variables/variables` under checkpoint_path.

checkpoint_path Path to the checkpoint to restore and use to update this policy.
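A hedged sketch of refreshing an already loaded policy from a newer checkpoint. The helper names `expected_variables_prefix` and `refresh_policy` are hypothetical, and the prefix helper assumes the `/variables/variables` prefix is resolved relative to checkpoint_path. `policy` is any object that provides the `update_from_checkpoint()` and `get_train_step()` methods documented on this page.

```python
import os


def expected_variables_prefix(checkpoint_path):
    """Returns the checkpoint file prefix, assuming it is relative to checkpoint_path."""
    return os.path.join(checkpoint_path, "variables", "variables")


def refresh_policy(policy, checkpoint_path):
    """Updates `policy` in place from a checkpoint and returns its train step."""
    policy.update_from_checkpoint(checkpoint_path)
    # Reports whatever get_train_step() returns after the update.
    return policy.get_train_step()
```

Checking that files with the expected prefix exist before calling `refresh_policy` may give a clearer error than a failed restore would.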

