A tuple that represents a trajectory.
tf_agents.trajectories.trajectory.Trajectory(
    step_type, observation, action, policy_info, next_step_type, reward, discount
)
A Trajectory represents a sequence of aligned time steps. It captures the observation and step_type from the current time step, together with the action computed from that observation and the associated policy_info. The discount, reward, and next_step_type come from the next time step.
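For concreteness, a minimal sketch (not from the original reference; shapes and values are illustrative) that builds a single batched Trajectory by hand:

import tensorflow as tf
from tf_agents.trajectories import time_step as ts
from tf_agents.trajectories import trajectory

# A batch of one transition at the start of an episode (illustrative values).
traj = trajectory.Trajectory(
    step_type=tf.constant([ts.StepType.FIRST]),     # from the current time step
    observation=tf.constant([[0.1, 0.2]]),          # from the current time step
    action=tf.constant([1]),                        # computed from the observation
    policy_info=(),                                 # no auxiliary policy information
    next_step_type=tf.constant([ts.StepType.MID]),  # from the next time step
    reward=tf.constant([1.0]),                      # from the next time step
    discount=tf.constant([1.0]),                    # from the next time step
)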
Attributes

step_type: A StepType.
observation: An array (tensor), or a nested dict, list, or tuple of arrays (tensors), that represents the observation.
action: An array (tensor), or a nested dict, list, or tuple of actions. This represents the action generated according to the observation.
policy_info: An arbitrary nest that contains auxiliary information related to the action. Note that this does not include the policy/RNN state that was used to generate the action.
next_step_type: The StepType of the next time step.
reward: An array (tensor), or a nested dict, list, or tuple of rewards. This represents the reward and/or constraint satisfiability after performing the action in an environment.
discount: A scalar representing the discount factor to multiply with future rewards.
Methods
is_boundary
is_boundary() -> tf_agents.typing.types.Bool
is_first
is_first() -> tf_agents.typing.types.Bool
is_last
is_last() -> tf_agents.typing.types.Bool
is_mid
is_mid() -> tf_agents.typing.types.Bool
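Each helper returns a boolean tensor or array with the trajectory's batch shape, derived from step_type and next_step_type. A hedged sketch, reusing the illustrative traj built above:

print(traj.is_first())     # [ True] -- step_type is StepType.FIRST
print(traj.is_last())      # [False] -- next_step_type is not StepType.LAST
print(traj.is_boundary())  # [False] -- step_type is not StepType.LAST
print(traj.is_mid())       # boolean mask for steps strictly inside the episode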
replace
replace(**kwargs) -> "Trajectory"
Exposes the namedtuple._replace interface.
Usage:
new_trajectory = trajectory.replace(policy_info=())
This returns a new trajectory with an empty policy_info.
Args

**kwargs: Key/value pairs of fields in the trajectory.

Returns

A new Trajectory.
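As a hedged usage sketch (reusing the illustrative traj from above): because replace delegates to namedtuple._replace, it returns a new Trajectory with only the named fields swapped and leaves the original untouched:

# Neither call mutates `traj`; each returns a fresh Trajectory.
no_info = traj.replace(policy_info=())
zero_discount = traj.replace(discount=tf.zeros_like(traj.discount))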