Airflow-specific TFX Component.
```python
tfx.orchestration.airflow.airflow_component.AirflowComponent(
    parent_dag: models.DAG,
    component: tfx.dsl.components.base.base_node.BaseNode,
    component_launcher_class: Type[tfx.orchestration.launcher.base_component_launcher.BaseComponentLauncher],
    pipeline_info: tfx.orchestration.data_types.PipelineInfo,
    enable_cache: bool,
    metadata_connection_config: metadata_store_pb2.ConnectionConfig,
    beam_pipeline_args: List[Text],
    additional_pipeline_args: Dict[Text, Any],
    component_config: tfx.orchestration.config.base_component_config.BaseComponentConfig
)
```
This class wraps a component's run into its own PythonOperator in Airflow.
| Args | |
|---|---|
| `parent_dag` | An `AirflowPipeline` instance serving as the pipeline DAG. |
| `component` | An instance of `base_node.BaseNode` that holds all properties of a logical component. |
| `component_launcher_class` | The class of the launcher used to launch the component. |
| `pipeline_info` | An instance of `data_types.PipelineInfo` that holds pipeline properties. |
| `enable_cache` | Whether caching is enabled for this component run. |
| `metadata_connection_config` | A config proto for the metadata connection. |
| `beam_pipeline_args` | Pipeline arguments for Beam-powered components. |
| `additional_pipeline_args` | Additional pipeline args. |
| `component_config` | Component config used to launch the component. |
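For illustration, a sketch of constructing one of these operators by hand. In normal use the Airflow DAG runner builds an `AirflowComponent` per pipeline node for you; here `my_dag`, `example_gen`, `pipeline_info`, and `metadata_config` are hypothetical objects assumed to exist, and `InProcessComponentLauncher` is assumed as the launcher class.

```python
# Hedged wiring sketch -- not a runnable script. Assumes a TFX install with
# Airflow orchestration support, plus pre-built `my_dag` (models.DAG),
# `example_gen` (a BaseNode), `pipeline_info` (data_types.PipelineInfo),
# and `metadata_config` (metadata_store_pb2.ConnectionConfig).
from tfx.orchestration.airflow.airflow_component import AirflowComponent
from tfx.orchestration.launcher.in_process_component_launcher import (
    InProcessComponentLauncher)

airflow_task = AirflowComponent(
    parent_dag=my_dag,
    component=example_gen,
    component_launcher_class=InProcessComponentLauncher,
    pipeline_info=pipeline_info,
    enable_cache=True,
    metadata_connection_config=metadata_config,
    beam_pipeline_args=['--direct_num_workers=1'],
    additional_pipeline_args={},
    component_config=None,
)
```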
| Attributes | |
|---|---|
| `dag` | Returns the operator's DAG if set; otherwise raises an error. |
| `dag_id` | Returns the DAG id if the operator has one, otherwise an adhoc id plus the owner. |
| `deps` | Returns the list of dependencies for the operator. These differ from execution-context dependencies in that they are specific to tasks and can be extended/overridden by subclasses. |
| `downstream_list` | List of tasks directly downstream. |
| `downstream_task_ids` | List of ids of tasks directly downstream. |
| `log` | |
| `logger` | |
| `priority_weight_total` | Total priority weight for the task. It might include all upstream or downstream tasks, depending on the weight rule. |
| `schedule_interval` | The schedule interval of the DAG always wins over individual tasks, so that tasks within a DAG always line up. The task still needs a `schedule_interval`, as it may not be attached to a DAG. |
| `task_type` | Type of the task. |
| `upstream_list` | List of tasks directly upstream. |
| `upstream_task_ids` | List of ids of tasks directly upstream. |
Methods

add_only_new

`add_only_new(item_set, item)`

Adds only new items to the item set.

clear

`clear(start_date=None, end_date=None, upstream=False, downstream=False, session=None)`

Clears the state of task instances associated with the task, following the parameters specified.

dry_run

`dry_run()`

Performs a dry run of the operator: it only renders the template fields.

execute

`execute(context)`

This is the main method to derive when creating an operator. `context` is the same dictionary used when rendering Jinja templates. Refer to `get_template_context` for more context.

execute_callable

`execute_callable()`

get_direct_relative_ids

`get_direct_relative_ids(upstream=False)`

Gets the ids of the direct relatives of the current task, upstream or downstream.

get_direct_relatives

`get_direct_relatives(upstream=False)`

Gets the direct relatives of the current task, upstream or downstream.
get_extra_links

`get_extra_links(dttm, link_name)`

For an operator, gets the URL that the external links specified in `extra_links` should point to.

Raises `ValueError`: the error message of a `ValueError` is passed through to the frontend, where it shows up as a tooltip on the disabled link.

Parameters: `dttm` is the parsed execution date (a datetime) for the URL being searched for; `link_name` is the name of the link whose URL we are looking for, and should be one of the options specified in `extra_links`.

Returns: a URL.
get_flat_relative_ids

`get_flat_relative_ids(upstream=False, found_descendants=None)`

Gets a flat list of relatives' ids, either upstream or downstream.
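The flat-relatives methods walk the task graph transitively, de-duplicating as they go. A minimal sketch of that traversal over a toy task graph (the `Task` class here is an illustrative stand-in, not Airflow's):

```python
class Task:
    """Toy stand-in for an Airflow task; only tracks relative ids."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_task_ids = set()
        self.downstream_task_ids = set()

def get_flat_relative_ids(tasks, task_id, upstream=False, found=None):
    """Collect all transitive relative ids, depth-first, de-duplicated."""
    found = found if found is not None else set()
    attr = 'upstream_task_ids' if upstream else 'downstream_task_ids'
    for rel_id in getattr(tasks[task_id], attr):
        if rel_id not in found:
            found.add(rel_id)
            get_flat_relative_ids(tasks, rel_id, upstream, found)
    return found

# Build a small chain: a -> b -> c
tasks = {tid: Task(tid) for tid in 'abc'}
tasks['a'].downstream_task_ids.add('b')
tasks['b'].upstream_task_ids.add('a')
tasks['b'].downstream_task_ids.add('c')
tasks['c'].upstream_task_ids.add('b')

print(sorted(get_flat_relative_ids(tasks, 'a')))                  # all downstream of a
print(sorted(get_flat_relative_ids(tasks, 'c', upstream=True)))   # all upstream of c
```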
get_flat_relatives

`get_flat_relatives(upstream=False)`

Gets a flat list of relatives, either upstream or downstream.
get_serialized_fields

`get_serialized_fields()` (class method)

Stringified DAGs and operators contain exactly these fields.

get_task_instances

`get_task_instances(start_date=None, end_date=None, session=None)`

Gets the set of task instances related to this task for a specific date range.

get_template_env

`get_template_env()`

Fetches a Jinja template environment from the DAG, or instantiates an empty environment if there is no DAG.

has_dag

`has_dag()`

Returns True if the operator has been assigned to a DAG.
on_kill

`on_kill()`

Override this method to clean up subprocesses when a task instance gets killed. Any use of the `threading`, `subprocess`, or `multiprocessing` module within an operator needs to be cleaned up, or it will leave ghost processes behind.

post_execute

`post_execute(context, result=None)`

This hook is triggered right after `self.execute()` is called. It is passed the execution context and any results returned by the operator.

pre_execute

`pre_execute(context)`

This hook is triggered right before `self.execute()` is called.

prepare_template

`prepare_template()`

Hook that is triggered after the templated fields get replaced by their content. If you need your operator to alter the content of the file before the template is rendered, it should override this method to do so.
render_template

`render_template(content, context, jinja_env=None, seen_oids=None)`

Renders a templated string. The content can be a collection holding multiple templated strings and will be templated recursively.

Parameters: `content` (Any) is the content to template; only strings can be templated (possibly nested inside a collection). `context` (dict) holds the values to apply to the templated content. `jinja_env` (jinja2.Environment) can be provided to avoid re-creating Jinja environments during recursion. `seen_oids` (set) holds template fields already rendered, to avoid a `RecursionError` on circular dependencies.

Returns: the templated content.
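The recursive-collection behavior is the interesting part: strings get rendered, containers get walked, and everything else passes through. A minimal sketch of that shape, using stdlib `string.Template` with `$`-placeholders in place of Airflow's real Jinja engine:

```python
from string import Template

def render_template(content, context):
    """Recursively render $-style templates in strings, lists, tuples,
    and dicts. Illustrative stand-in for the Jinja-based
    BaseOperator.render_template, not its actual implementation."""
    if isinstance(content, str):
        return Template(content).safe_substitute(context)
    if isinstance(content, (list, tuple)):
        return type(content)(render_template(c, context) for c in content)
    if isinstance(content, dict):
        return {k: render_template(v, context) for k, v in content.items()}
    return content  # non-templatable values pass through untouched

ctx = {'ds': '2021-01-01'}
print(render_template('run --date=$ds', ctx))
print(render_template({'cmd': ['echo', '$ds'], 'retries': 3}, ctx))
```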
render_template_fields

`render_template_fields(context, jinja_env=None)`

Templates all attributes listed in `template_fields`. Note this operation is irreversible.

Parameters: `context` (dict) holds the values to apply to the content; `jinja_env` (jinja2.Environment) is the Jinja environment to use.

resolve_template_files

`resolve_template_files()`
run

`run(start_date=None, end_date=None, ignore_first_depends_on_past=False, ignore_ti_state=False, mark_success=False)`

Runs a set of task instances for a date range.

set_downstream

`set_downstream(task_or_task_list)`

Sets a task or a task list to be directly downstream from the current task.

set_upstream

`set_upstream(task_or_task_list)`

Sets a task or a task list to be directly upstream from the current task.
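These two methods keep both sides of the edge in sync: adding a downstream task also records the current task as that task's upstream. A toy sketch of that bookkeeping (the `Task` class is illustrative, not Airflow's):

```python
class Task:
    """Toy task showing how set_downstream/set_upstream update both ends."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_task_ids = set()
        self.downstream_task_ids = set()

    def set_downstream(self, task_or_task_list):
        tasks = task_or_task_list if isinstance(task_or_task_list, list) else [task_or_task_list]
        for task in tasks:
            self.downstream_task_ids.add(task.task_id)
            task.upstream_task_ids.add(self.task_id)

    def set_upstream(self, task_or_task_list):
        tasks = task_or_task_list if isinstance(task_or_task_list, list) else [task_or_task_list]
        for task in tasks:
            self.upstream_task_ids.add(task.task_id)
            task.downstream_task_ids.add(self.task_id)

extract, transform, load = Task('extract'), Task('transform'), Task('load')
extract.set_downstream(transform)   # extract -> transform
load.set_upstream(transform)        # transform -> load
print(sorted(extract.downstream_task_ids))
print(sorted(load.upstream_task_ids))
```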
xcom_pull

`xcom_pull(context, task_ids=None, dag_id=None, key=XCOM_RETURN_KEY, include_prior_dates=None)`

See `TaskInstance.xcom_pull()`.

xcom_push

`xcom_push(context, key, value, execution_date=None)`

See `TaskInstance.xcom_push()`.
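XCom is Airflow's cross-task message store: one task pushes a keyed value, another pulls it by task id and key (defaulting to the return-value key). A minimal in-memory mimic of that contract, not Airflow's actual backend:

```python
XCOM_RETURN_KEY = 'return_value'

class XComStore:
    """Minimal in-memory stand-in for Airflow's XCom message store."""
    def __init__(self):
        self._data = {}

    def xcom_push(self, task_id, key, value):
        self._data[(task_id, key)] = value

    def xcom_pull(self, task_ids=None, key=XCOM_RETURN_KEY):
        # A list of task ids yields a list of values; a single id yields one value.
        if isinstance(task_ids, list):
            return [self._data.get((tid, key)) for tid in task_ids]
        return self._data.get((task_ids, key))

store = XComStore()
store.xcom_push('producer', XCOM_RETURN_KEY, {'rows': 42})
print(store.xcom_pull(task_ids='producer'))
```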
`__eq__`

`__eq__(other)`

Return self==value.

`__ge__`

`__ge__(other, NotImplemented=NotImplemented)`

Return a >= b. Computed by `@total_ordering` from (not a < b).

`__gt__`

`__gt__(other, NotImplemented=NotImplemented)`

Return a > b. Computed by `@total_ordering` from (not a < b) and (a != b).

`__le__`

`__le__(other, NotImplemented=NotImplemented)`

Return a <= b. Computed by `@total_ordering` from (a < b) or (a == b).

`__lt__`

`__lt__(other)`

Return self<value.

`__ne__`

`__ne__(other)`

Return self!=value.
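As the docstrings above note, the derived comparisons come from `functools.total_ordering`, which fills in `__le__`, `__gt__`, and `__ge__` given only `__eq__` and `__lt__`. A self-contained demonstration on a toy class (`Version` is hypothetical, used purely to show the decorator):

```python
from functools import total_ordering

@total_ordering
class Version:
    """Defines only __eq__ and __lt__; @total_ordering derives the rest,
    which is the same mechanism behind the operator's comparison methods."""
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number

a, b = Version(1), Version(2)
print(a < b)   # defined directly
print(b >= a)  # derived: not (b < a)
print(a > b)   # derived: (not a < b) and (a != b)
```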