tfx.v1.types.standard_artifacts.Examples

Artifact that contains the training data.

Inherits From: Artifact

Training data should be brought into the TFX pipeline using components such as ExampleGen. Data in an Examples artifact is divided into splits, and each split is stored separately. If the default file and payload formats are not used, the formats must be specified through the optional custom properties. See https://www.tensorflow.org/tfx/guide/examplegen#span_version_and_split for an explanation of span, version, and splits.

  • Properties:

    • span: Integer that distinguishes groups of Examples.
    • version: Integer that represents updated data.
    • splits: A list of split names. For example, ["train", "test"].
  • File structure:

    • {uri}/
      • Split-{split_name1}/: Files for split
        • All files that are direct children of this directory are treated as data.
        • File format and payload format are determined by custom properties.
      • Split-{split_name2}/: Another split...
  • Commonly used custom properties of the Examples artifact:

    • file_format: a string that represents the file format. See tfx/components/util/tfxio_utils.py:make_tfxio for available values.
    • payload_format: int (enum) value of the data payload format. See tfx/proto/example_gen.proto:PayloadFormat for available formats.
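
The file structure above can be illustrated with a minimal sketch that uses only the Python standard library. The helper name `list_split_files` and the file name `data-00000.gz` are hypothetical and are not part of TFX; the sketch assumes only the `{uri}/Split-{split_name}/` layout described above.

```python
import os
import tempfile


def list_split_files(uri, split_names):
    """Map each split name to the files directly under {uri}/Split-{name}/.

    Hypothetical helper for illustration; not part of the TFX API.
    """
    files_by_split = {}
    for name in split_names:
        split_dir = os.path.join(uri, f"Split-{name}")
        # Only direct children count as data; subdirectories are skipped.
        files_by_split[name] = sorted(
            entry
            for entry in os.listdir(split_dir)
            if os.path.isfile(os.path.join(split_dir, entry))
        )
    return files_by_split


# Build a throwaway directory tree that mimics an Examples artifact.
with tempfile.TemporaryDirectory() as uri:
    for name in ["train", "test"]:
        split_dir = os.path.join(uri, f"Split-{name}")
        os.makedirs(split_dir)
        # Hypothetical file name, chosen only for this example.
        with open(os.path.join(split_dir, "data-00000.gz"), "w") as f:
            f.write("")
    layout = list_split_files(uri, ["train", "test"])
```

Here `layout` maps each split name to the list of file names found directly in that split's directory.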

splits

Child Classes

class TYPE_ANNOTATION

Methods

path

Path to the artifact URI's split subdirectory.

This method DOES NOT create the directory whose path it returns; the caller must create the directory at the returned path before writing to it.

Args
split: The name of the split, e.g. "train", "validation", "test".

Raises
ValueError: If the split is not in self.splits.

Returns
A path to {self.uri}/Split-{split}.
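
The contract described for `path` can be sketched without TFX installed. The function `split_path` below is a hypothetical stand-in that mirrors only the documented behavior (return `{uri}/Split-{split}`, raise `ValueError` for an unknown split, create nothing on disk); it is not the library's implementation.

```python
import os
import tempfile


def split_path(uri, splits, split):
    """Mimic the documented contract of Examples.path (illustration only)."""
    if split not in splits:
        raise ValueError(f"Split {split!r} not found in {splits!r}")
    # Only the path string is built; nothing is created on disk.
    return os.path.join(uri, f"Split-{split}")


with tempfile.TemporaryDirectory() as uri:
    train_dir = split_path(uri, ["train", "test"], "train")
    existed_before = os.path.isdir(train_dir)

    # The caller is responsible for creating the directory before writing.
    os.makedirs(train_dir, exist_ok=True)
    exists_after = os.path.isdir(train_dir)

    # Asking for a split that is not listed raises ValueError.
    try:
        split_path(uri, ["train", "test"], "validation")
        raised = False
    except ValueError:
        raised = True
```

With a real artifact, the equivalent pattern would be to call `path` with a split name and then create the returned directory before writing any files into it.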

PROPERTIES

{
 'span': PropertyType.INT,
 'split_names': PropertyType.STRING,
 'version': PropertyType.INT
}
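
Note that `split_names` is declared as a STRING property, while `splits` is described above as a list of split names. The sketch below shows one way a list could be round-tripped through a single string. The JSON encoding and the helper names `splits_to_property` and `property_to_splits` are assumptions made for illustration; this page does not state the encoding TFX uses.

```python
import json


def splits_to_property(splits):
    """Encode a list of split names as one string (assumed JSON encoding)."""
    return json.dumps(splits)


def property_to_splits(value):
    """Decode the string back into a list; an empty string means no splits."""
    return json.loads(value) if value else []


encoded = splits_to_property(["train", "test"])
decoded = property_to_splits(encoded)
```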

TYPE_NAME 'Examples'