Returns a TimeStep with step_type set equal to StepType.MID.
tf_agents.trajectories.transition(
    observation: tf_agents.typing.types.NestedTensorOrArray,
    reward: tf_agents.typing.types.NestedTensorOrArray,
    discount: tf_agents.typing.types.Float = 1.0,
    outer_dims: Optional[types.Shape] = None
) -> tf_agents.trajectories.TimeStep
For TF transitions, the batch size is inferred from the shape of reward.
If discount is a scalar and observation contains Tensors, then discount will be broadcast to match reward.shape.
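The inference rule described above can be illustrated without TensorFlow. The following is a minimal NumPy sketch under stated assumptions, not the TF-Agents implementation: `mid_step_fields` is a hypothetical helper, the integer value 1 for StepType.MID is assumed, and the observation argument is left out because only reward and discount affect the shapes shown.

```python
import numpy as np

MID = 1  # assumed integer value of StepType.MID


def mid_step_fields(reward, discount=1.0):
    """Hypothetical helper mimicking the documented shape inference."""
    reward = np.asarray(reward, dtype=np.float32)
    if reward.ndim > 1:
        # Mirrors the documented constraint that reward has rank 0 or 1.
        raise ValueError("reward must have rank 0 or 1")
    batch_shape = reward.shape  # () for a scalar, (B,) for a batch
    step_type = np.full(batch_shape, MID, dtype=np.int32)
    discount = np.asarray(discount, dtype=np.float32)
    if discount.ndim == 0:
        # A scalar discount is expanded to one entry per batch element.
        discount = np.broadcast_to(discount, batch_shape)
    return step_type, reward, discount


step_type, reward, discount = mid_step_fields([1.0, 0.5, 0.0], discount=0.9)
print(step_type.shape, discount.shape)
```

With a batched reward of length 3, both step_type and discount in this sketch end up with one entry per batch element; with a scalar reward they stay scalar.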
| Args | |
| --- | --- |
| observation | A NumPy array, tensor, or a nested dict, list or tuple of arrays or tensors. |
| reward | A NumPy array, tensor, or a nested dict, list or tuple of arrays or tensors. |
| discount | (optional) A scalar, or 1D NumPy array, or tensor. |
| outer_dims | (optional) If provided, it will be used to determine the batch dimensions. If not, the batch dimensions will be inferred from reward's shape. If reward is a vector but not batched, use (). |
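The effect of outer_dims on a vector-valued reward can be sketched as follows. This is an illustration of the rule stated in the Args description only; `batch_shape_for` is a hypothetical helper and is not part of TF-Agents.

```python
import numpy as np


def batch_shape_for(reward, outer_dims=None):
    """Hypothetical helper: choose the batch shape as the Args text describes."""
    if outer_dims is not None:
        # An explicit outer_dims takes precedence over inference.
        return tuple(outer_dims)
    # Otherwise fall back to the shape of reward.
    return np.asarray(reward).shape


vector_reward = np.array([0.2, 0.8])
inferred = batch_shape_for(vector_reward)
unbatched = batch_shape_for(vector_reward, outer_dims=())
print(inferred, unbatched)
```

Without outer_dims, the two-element reward is read as a batch of two scalar rewards. Passing () marks it as a single, unbatched vector reward.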
| Raises | |
| --- | --- |
| ValueError | If observations are tensors but reward's statically known rank is not 0 or 1. |