View source on GitHub
Parses avro records into a dict of tensors.
tfio.experimental.columnar.parse_avro(
serialized, reader_schema, features, avro_names=None, name=None
)
This op parses serialized avro records into a dictionary mapping keys to
Tensor and SparseTensor objects. features is a dict from keys to
VarLenFeature, SparseFeature, RaggedFeature, and FixedLenFeature
objects. Each VarLenFeature and SparseFeature is mapped to a
SparseTensor; each FixedLenFeature is mapped to a Tensor.
Each VarLenFeature maps to a SparseTensor of the specified type
representing a ragged matrix. Its indices are [batch, index] where batch
identifies the example in serialized, and index is the value's index in
the list of values associated with that feature and example.
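The [batch, index] layout can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the op's implementation: the helper name varlen_to_sparse is hypothetical, and the dense_shape of [batch_size, longest_list] is an assumption that mirrors how tf.io.parse_example treats a VarLenFeature.

```python
def varlen_to_sparse(batch):
    """Convert a batch of variable-length value lists to sparse (COO) parts.

    Hypothetical helper for illustration only; it mimics the layout
    described above and is not part of tensorflow_io.
    """
    indices, values = [], []
    for b, row in enumerate(batch):
        for i, v in enumerate(row):
            indices.append([b, i])  # [batch, index]
            values.append(v)
    # Assumed dense shape: number of examples by the longest list in the batch.
    dense_shape = [len(batch), max((len(row) for row in batch), default=0)]
    return indices, values, dense_shape


# Three examples; the second one has no values for this feature.
indices, values, dense_shape = varlen_to_sparse([[1.0, 2.0], [], [3.0]])
print(indices)      # [[0, 0], [0, 1], [2, 0]]
print(values)       # [1.0, 2.0, 3.0]
print(dense_shape)  # [3, 2]
```

Note that the empty second example contributes no entries, so batch index 1 never appears in indices.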
Each SparseFeature maps to a SparseTensor of the specified type
representing a Tensor of dense_shape [batch_size] + SparseFeature.size.
Its values come from the feature in the examples with key value_key.
Each values[i] comes from a position k in the feature of an example at batch
entry batch. This positional information is recorded in indices[i] as
[batch, index_0, index_1, ...] where index_j is the k-th value of
the feature with key SparseFeature.index_key[j] in that example.
In other words, we split the indices (except the first index indicating the
batch entry) of a SparseTensor by dimension into different features of the
avro record. Because a SparseFeature is more complex, prefer a
VarLenFeature whenever possible.
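To make the index splitting concrete, here is a small pure-Python sketch of how the per-dimension index features and the value feature combine into sparse components. The helper sparse_feature_to_coo and the field names row, col, and val are hypothetical and chosen only for illustration; examples are modeled as plain dicts rather than avro records.

```python
def sparse_feature_to_coo(examples, index_keys, value_key, size):
    """Assemble sparse (COO) parts from per-dimension index features.

    Hypothetical helper for illustration only; not part of tensorflow_io.
    """
    indices, values = [], []
    for b, example in enumerate(examples):
        for k, v in enumerate(example[value_key]):
            # index_j is the k-th value of the feature named index_keys[j].
            indices.append([b] + [example[key][k] for key in index_keys])
            values.append(v)
    dense_shape = [len(examples)] + list(size)  # [batch_size] + size
    return indices, values, dense_shape


examples = [
    {"row": [0, 2], "col": [1, 3], "val": [0.5, 0.7]},
    {"row": [1], "col": [0], "val": [0.9]},
]
indices, values, dense_shape = sparse_feature_to_coo(
    examples, index_keys=["row", "col"], value_key="val", size=[3, 4]
)
print(indices)      # [[0, 0, 1], [0, 2, 3], [1, 1, 0]]
print(values)       # [0.5, 0.7, 0.9]
print(dense_shape)  # [2, 3, 4]
```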
Each FixedLenFeature df maps to a Tensor of the specified type (or
tf.float32 if not specified) and shape (serialized.size(),) + df.shape.
FixedLenFeature entries with a default_value are optional. Without a default
value, the op fails if that feature is missing from any example in
serialized.
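The default-value rule can be sketched in plain Python. This is an illustration of the behavior described above under simplifying assumptions, not the op's code: stack_fixed_len is a hypothetical helper, examples are modeled as dicts, and a ValueError stands in for whatever error the real op reports.

```python
def stack_fixed_len(examples, key, shape, default_value=None):
    """Collect one fixed-shape feature from every example in a batch.

    Hypothetical helper for illustration only; not part of tensorflow_io.
    Returns the per-example rows and the resulting dense shape.
    """
    rows = []
    for b, example in enumerate(examples):
        if key in example:
            rows.append(example[key])
        elif default_value is not None:
            rows.append(default_value)  # the feature is optional
        else:
            raise ValueError(
                f"feature {key!r} is missing from example {b} "
                "and no default_value was given"
            )
    # Output shape is (batch_size,) + feature shape.
    return rows, (len(examples),) + tuple(shape)


examples = [{"age": [31]}, {}]  # the second example lacks "age"
rows, out_shape = stack_fixed_len(examples, "age", shape=[1], default_value=[-1])
print(rows)       # [[31], [-1]]
print(out_shape)  # (2, 1)
```

Calling the same helper without default_value raises on the second example, which corresponds to the failure case described above.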
Use this op within dataset.map, in a parser function that calls parse_avro.
It only works on batched serialized input, so batch the dataset before mapping.
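A possible way to wire this together is sketched below. It assumes a tf.data dataset of serialized avro strings is already available (for example from the avro_record_dataset mentioned under reader_schema); the User schema, its age and height fields, and the parse_batches helper are hypothetical and exist only for this example.

```python
import json

# Hypothetical reader schema, used only for illustration.
READER_SCHEMA = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "age", "type": "int"},
        {"name": "height", "type": "float"},
    ],
})


def parse_batches(serialized_dataset, batch_size=32):
    """Batch a dataset of serialized avro strings, then parse each batch.

    `serialized_dataset` is assumed to yield one serialized record per element.
    """
    # Imported lazily so the schema above can be inspected without TensorFlow.
    import tensorflow as tf
    import tensorflow_io as tfio

    features = {
        "age": tf.io.FixedLenFeature([], tf.int32),
        "height": tf.io.FixedLenFeature([], tf.float32),
    }

    def parser_fn(serialized):
        # `serialized` is a batch of strings because batch() runs before map().
        return tfio.experimental.columnar.parse_avro(
            serialized, reader_schema=READER_SCHEMA, features=features
        )

    return serialized_dataset.batch(batch_size).map(parser_fn)
```

Batching before the map call is what satisfies the batched-input requirement; each element of the resulting dataset would be a dict keyed by feature name.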
| Args | |
|---|---|
| serialized | The batched, serialized string tensors. |
| reader_schema | The reader schema. Note, this MUST match the reader schema from the avro_record_dataset. Otherwise, this op will segfault! |
| features | A map of feature names mapped to feature information. |
| avro_names | (Optional.) May contain descriptive names for the corresponding serialized avro parts. These may be useful for debugging purposes, but they have no effect on the output. If not None, avro_names must be the same length as serialized. |
| name | The name of the op. |

| Returns | |
|---|---|
| A map of feature names to tensors. | |