Interface for service job manager.
ensure_services(…: tfx.orchestration.experimental.core.pipeline_state.PipelineState) -> Set[…]
Ensures necessary service jobs are started and healthy for the pipeline.
Service jobs are long-running jobs, associated with a node or with the pipeline, that persist across executions (e.g., worker pools, TensorBoard). Service jobs are started before the nodes that depend on them.
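The ordering rule above (services before the nodes that depend on them) can be sketched as a small planning function. This is a hypothetical illustration, not TFX code: `plan_launch`, the `services_for` mapping, and the assumption that nodes arrive in topological order are all invented for the example.

```python
from typing import Dict, List, Set, Tuple


def plan_launch(
    nodes: List[str], services_for: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Returns launch steps in which each service precedes the nodes that need it."""
    started: Set[str] = set()
    plan: List[Tuple[str, str]] = []
    for node in nodes:  # nodes are assumed to be in topological order
        for service in services_for.get(node, []):
            if service not in started:  # a shared service is started only once
                plan.append(("start_service", service))
                started.add(service)
        plan.append(("start_node", node))
    return plan
```

For example, if a trainer needs a worker pool and TensorBoard and an evaluator needs the same worker pool, both services are scheduled before the trainer, and the evaluator reuses the already-started worker pool.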
ensure_services is called periodically in the orchestration loop and is expected to:
- Start any service jobs required by the pipeline nodes.
- Probe job health and handle failures. If a service job fails, the UIDs of the corresponding nodes should be returned.
- Optionally stop service jobs that are no longer needed. Whether a service job is still needed depends on context: in a typical sync pipeline, for example, one may want the TensorBoard job to keep running after the corresponding trainer has completed, while other services such as worker pools may be shut down.
|Returns: List of NodeUids of nodes whose service jobs are in a state of permanent failure.|
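A minimal in-memory sketch of the ensure_services contract, assuming hypothetical stand-ins (`NodeUid`, `FakePipelineState`, `InMemoryServiceJobManager`, a caller-supplied `probe` function, and a `max_restarts` budget) in place of the real TFX types. It starts missing jobs, restarts unhealthy ones a bounded number of times, stops jobs no longer marked as needed, and returns the nodes whose jobs have permanently failed.

```python
import dataclasses
from typing import Callable, Dict, Set


@dataclasses.dataclass(frozen=True)
class NodeUid:
    """Hypothetical stand-in for a node identifier (pipeline id + node id)."""
    pipeline_id: str
    node_id: str


@dataclasses.dataclass
class FakePipelineState:
    """Hypothetical stand-in for PipelineState."""
    pipeline_id: str
    # node_id -> whether that node still needs its service job.
    active_service_nodes: Dict[str, bool]


class InMemoryServiceJobManager:
    """Hypothetical manager whose 'jobs' are just entries in a dict."""

    def __init__(self, probe: Callable[[str], bool], max_restarts: int = 2) -> None:
        self._probe = probe  # returns True if the node's service job is healthy
        self._max_restarts = max_restarts
        self._restarts: Dict[str, int] = {}  # running jobs: node_id -> restart count
        self._failed: Set[str] = set()  # nodes whose jobs failed permanently

    @property
    def running(self) -> Set[str]:
        return set(self._restarts)

    def ensure_services(self, state: FakePipelineState) -> Set[NodeUid]:
        for node_id, needed in state.active_service_nodes.items():
            if node_id in self._failed:
                continue  # permanent failures are not restarted
            if not needed:
                self._restarts.pop(node_id, None)  # stop a job that is no longer needed
                continue
            if node_id not in self._restarts:
                self._restarts[node_id] = 0  # start the job; it is probed on the next call
                continue
            if not self._probe(node_id):
                count = self._restarts[node_id] + 1
                if count > self._max_restarts:
                    del self._restarts[node_id]  # give up: stop and mark as failed
                    self._failed.add(node_id)
                else:
                    self._restarts[node_id] = count  # restart the job
        return {NodeUid(state.pipeline_id, n) for n in self._failed}
```

With `probe=lambda n: n != "trainer"` and `max_restarts=1`, the first call starts both jobs, the second restarts the trainer's job once, and the third reports the trainer node as permanently failed. Declaring failure after a fixed number of restarts is one possible policy chosen for this sketch; the interface itself does not prescribe one.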
tfx.orchestration.experimental.core.pipeline_state.PipelineState) -> None
Stops all service jobs associated with the pipeline.
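The overall lifecycle (ensure_services called once per orchestration tick, then a single call that stops every service job for the pipeline) can be sketched as a toy loop. Everything here is hypothetical: `ToyServiceJobManager`, `run_orchestration_loop`, the `is_healthy` predicate, and the method name `stop_services`, which is assumed because the extracted signature line does not show the method's name.

```python
from typing import Callable, Dict, List, Set


class ToyServiceJobManager:
    """Hypothetical manager tracking one service job per node, keyed by node id."""

    def __init__(self, is_healthy: Callable[[str], bool]) -> None:
        self._is_healthy = is_healthy
        self.jobs: Dict[str, str] = {}  # node_id -> "RUNNING" or "STOPPED"

    def ensure_services(self, node_ids: List[str]) -> Set[str]:
        failed: Set[str] = set()
        for node_id in node_ids:
            self.jobs.setdefault(node_id, "RUNNING")  # start if not yet started
            if not self._is_healthy(node_id):
                failed.add(node_id)  # no retries in this toy version
        return failed

    def stop_services(self) -> None:
        """Stops all service jobs for the pipeline; safe to call more than once."""
        for node_id in self.jobs:
            self.jobs[node_id] = "STOPPED"


def run_orchestration_loop(
    manager: ToyServiceJobManager, node_ids: List[str], max_ticks: int
) -> Set[str]:
    """Calls ensure_services once per tick, then stops all services on exit."""
    failed: Set[str] = set()
    try:
        for _ in range(max_ticks):
            failed = manager.ensure_services(node_ids)
            if failed:
                break  # a real orchestrator would fail or retry the affected nodes
    finally:
        manager.stop_services()
    return failed
```

Placing the teardown in a `finally` clause is a design choice of this sketch: it stops the service jobs whether the loop finishes normally, breaks on a failure, or raises.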