Abstract base class of all TFXIO API implementations.
Methods
ArrowSchema
@abc.abstractmethod
ArrowSchema() -> pa.Schema
Returns the schema of the RecordBatch produced by self.BeamSource().
May raise an error if the TFMD schema was not provided at construction time.
BeamSource
@abc.abstractmethod
BeamSource(
batch_size: Optional[int] = None
) -> beam.PTransform
Returns a beam PTransform that produces PCollection[pa.RecordBatch].
May NOT raise an error if the TFMD schema was not provided at construction time.
If a TFMD schema was provided at construction time, all the
pa.RecordBatches in the result PCollection must be of the same schema
returned by self.ArrowSchema. If a TFMD schema was not provided, the
pa.RecordBatches might not be of the same schema (they may contain
different numbers of columns).
| Args | |
|---|---|
| batch_size | If not None, the pa.RecordBatch produced will be of the specified size. Otherwise it's automatically tuned by Beam. |
Project
Project(
tensor_names: List[Text]
) -> 'TFXIO'
Projects the dataset represented by this TFXIO.
A Projected TFXIO:
- Only columns needed for given tensor_names are guaranteed to be
produced by self.BeamSource().
- self.TensorAdapterConfig() and self.TensorFlowDataset() are trimmed to
contain only those tensors.
- It retains a reference to the very original TFXIO, so its TensorAdapter
knows about the specs of the tensors that would be produced by the
original TensorAdapter. Also see TensorAdapter.OriginalTensorSpec().
May raise an error if the TFMD schema was not provided at construction time.
| Args | |
|---|---|
| tensor_names | A set of tensor names. |
| Returns | |
|---|---|
| A TFXIO instance that is the same as self except for the properties of a projected TFXIO listed above. | |
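The projection properties described above can be illustrated with a small pure-Python stand-in. This is only a sketch of the idea, not the tfx_bsl implementation: `ToyTFXIO`, `tensor_names()`, and `original_tensor_names()` are hypothetical names invented for the example, and raising `ValueError` for unknown names is an assumption of the toy rather than documented behavior.

```python
from typing import Dict, List, Optional


class ToyTFXIO:
    """Hypothetical stand-in for a TFXIO; not part of tfx_bsl."""

    def __init__(self, tensor_specs: Dict[str, str],
                 original: Optional["ToyTFXIO"] = None):
        self._tensor_specs = dict(tensor_specs)
        # A projection keeps pointing at the very first, unprojected instance.
        self._original = original if original is not None else self

    def tensor_names(self) -> List[str]:
        """Names of the tensors this (possibly projected) instance produces."""
        return sorted(self._tensor_specs)

    def original_tensor_names(self) -> List[str]:
        """Names of the tensors the unprojected instance would produce."""
        return sorted(self._original._tensor_specs)

    def Project(self, tensor_names: List[str]) -> "ToyTFXIO":
        # Assumption of this toy: unknown names are rejected with ValueError.
        missing = set(tensor_names) - set(self._tensor_specs)
        if missing:
            raise ValueError(f"Unknown tensor names: {sorted(missing)}")
        trimmed = {name: self._tensor_specs[name] for name in tensor_names}
        # Pass along the original, not self, so chained projections
        # still refer to the unprojected instance.
        return ToyTFXIO(trimmed, original=self._original)


full = ToyTFXIO({"age": "int64", "name": "string", "score": "float32"})
projected = full.Project(["age", "score"])
twice = projected.Project(["age"])
```

Here `twice` only exposes `age`, yet it can still report all three tensors of `full`, which mirrors the role the retained reference plays for TensorAdapter.OriginalTensorSpec().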
RecordBatches
@abc.abstractmethod
RecordBatches(
options: tfx_bsl.public.tfxio.RecordBatchesOptions
) -> Iterator[pa.RecordBatch]
Returns an iterable of record batches.
This can be used outside of Apache Beam or TensorFlow to access data.
| Args | |
|---|---|
| options | An options object for iterating over record batches. Look at dataset_options.RecordBatchesOptions for more details. |
TensorAdapter
TensorAdapter() -> tfx_bsl.public.tfxio.TensorAdapter
Returns a TensorAdapter that converts pa.RecordBatch to TF inputs.
May raise an error if the TFMD schema was not provided at construction time.
TensorAdapterConfig
TensorAdapterConfig() -> tfx_bsl.public.tfxio.TensorAdapterConfig
Returns the config to initialize a TensorAdapter.
| Returns | |
|---|---|
| A TensorAdapterConfig that is the same as what is used to initialize the TensorAdapter returned by self.TensorAdapter(). | |
TensorFlowDataset
@abc.abstractmethod
TensorFlowDataset(
options: tfx_bsl.public.tfxio.TensorFlowDatasetOptions
) -> tf.data.Dataset
Returns a tf.data.Dataset of TF inputs.
May raise an error if the TFMD schema was not provided at construction time.
| Args | |
|---|---|
| options | An options object for the tf.data.Dataset. Look at dataset_options.TensorFlowDatasetOptions for more details. |
TensorRepresentations
@abc.abstractmethod
TensorRepresentations() -> tfx_bsl.public.tfxio.TensorRepresentations
Returns the TensorRepresentations.
These TensorRepresentations describe the tensors or composite tensors
produced by the TensorAdapter created from self.TensorAdapter() or
the tf.data.Dataset created from self.TensorFlowDataset().
May raise an error if the TFMD schema was not provided at construction time. May raise an error if the tensor representations are invalid.