Builds a trajectory from a single-step bandit episode.
tf_agents.bandits.drivers.driver_utils.trajectory_for_bandit(
    initial_step: tf_agents.typing.types.TimeStep,
    action_step: tf_agents.typing.types.PolicyStep,
    final_step: tf_agents.typing.types.TimeStep
) -> tf_agents.typing.types.NestedTensor
Since all episodes consist of a single step, the returned Trajectory has no time dimension. All input and output Tensors/arrays are expected to have shape [batch_size, ...].
Args |
initial_step | A TimeStep returned from environment.step(...). |
action_step | A PolicyStep returned by policy.action(...). |
final_step | A TimeStep returned from environment.step(...). |
Returns |
A Trajectory containing zeros for the discount value and StepType.LAST for both step_type and next_step_type. |
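The documented return value can be illustrated with a minimal, dependency-free sketch. It does not call TF-Agents: `TimeStep`, `PolicyStep`, `Trajectory`, `STEP_TYPE_LAST`, and `sketch_trajectory_for_bandit` below are hypothetical stand-ins that use plain Python lists in place of batched tensors. The zero discount and the `LAST` step types follow the description above; taking the observation from `initial_step`, the action and info from `action_step`, and the reward from `final_step` is an assumption about how the fields are wired, not something this page states.

```python
from typing import Any, List, NamedTuple

# Stand-in for StepType.LAST (assumed here to be the integer 2).
STEP_TYPE_LAST = 2


class TimeStep(NamedTuple):
    """Simplified stand-in for a batched TimeStep (not the TF-Agents class)."""
    step_type: List[int]
    reward: List[float]
    discount: List[float]
    observation: List[Any]


class PolicyStep(NamedTuple):
    """Simplified stand-in for a batched PolicyStep."""
    action: List[Any]
    state: Any = ()
    info: Any = ()


class Trajectory(NamedTuple):
    """Simplified stand-in for a Trajectory with no time dimension."""
    step_type: List[int]
    observation: List[Any]
    action: List[Any]
    policy_info: Any
    next_step_type: List[int]
    reward: List[float]
    discount: List[float]


def sketch_trajectory_for_bandit(initial_step, action_step, final_step):
    """Hypothetical re-creation of the documented behavior on plain lists."""
    batch_size = len(final_step.reward)
    return Trajectory(
        # Both step types are LAST because each episode is a single step.
        step_type=[STEP_TYPE_LAST] * batch_size,
        observation=initial_step.observation,  # assumed source of observation
        action=action_step.action,
        policy_info=action_step.info,
        next_step_type=[STEP_TYPE_LAST] * batch_size,
        reward=final_step.reward,  # assumed source of reward
        # Discount is all zeros: nothing follows the single step.
        discount=[0.0] * batch_size,
    )


# A batch of two single-step episodes; every field has leading size 2.
initial = TimeStep([0, 0], [0.0, 0.0], [1.0, 1.0], [[0.1, 0.2], [0.3, 0.4]])
action = PolicyStep(action=[1, 0])
final = TimeStep([2, 2], [1.0, 0.5], [0.0, 0.0], [[0.5, 0.6], [0.7, 0.8]])

traj = sketch_trajectory_for_bandit(initial, action, final)
```

Every field of `traj` keeps the leading `[batch_size, ...]` layout of the inputs, with no extra time axis. With the real library, the same three arguments would be passed to `trajectory_for_bandit` as batched tensors or arrays.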