Parses avro records into a dict of tensors.

tfio.experimental.columnar.parse_avro(
serialized, reader_schema, features, avro_names=None, name=None
)

This op parses serialized avro records into a dictionary mapping keys to Tensor and SparseTensor objects. features is a dict from keys to VarLenFeature, SparseFeature, RaggedFeature, and FixedLenFeature objects. Each VarLenFeature and SparseFeature is mapped to a SparseTensor; each FixedLenFeature is mapped to a Tensor.

Each VarLenFeature maps to a SparseTensor of the specified type representing a ragged matrix. Its indices are [batch, index], where batch identifies the example in serialized, and index is the value's index in the list of values associated with that feature and example.

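
The [batch, index] layout described above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the op's implementation: the helper name `varlen_to_sparse` is hypothetical, and the `dense_shape` of `[batch_size, max_len]` is an assumption based on how TensorFlow's parsing ops usually report variable-length features.

```python
def varlen_to_sparse(batch):
    """Flatten a batch of variable-length lists into COO-style components.

    Returns (indices, values, dense_shape), where every entry of `indices`
    is [batch, index] as described for VarLenFeature.
    """
    indices, values = [], []
    for b, row in enumerate(batch):
        for i, value in enumerate(row):
            indices.append([b, i])  # batch entry, position within that entry
            values.append(value)
    # Assumption: the dense shape is [batch_size, longest row length].
    max_len = max((len(row) for row in batch), default=0)
    return indices, values, [len(batch), max_len]


# Three examples; the second one has no values for this feature.
indices, values, dense_shape = varlen_to_sparse([[1, 2, 3], [], [4]])
print(indices)      # [[0, 0], [0, 1], [0, 2], [2, 0]]
print(values)       # [1, 2, 3, 4]
print(dense_shape)  # [3, 3]
```

Note that the empty example contributes no entries at all, which is what makes the result ragged.
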
Each SparseFeature maps to a SparseTensor of the specified type representing a Tensor of dense_shape [batch_size] + SparseFeature.size. Its values come from the feature in the examples with key value_key. A values[i] comes from a position k in the feature of an example at batch entry batch. This positional information is recorded in indices[i] as [batch, index_0, index_1, ...], where index_j is the k-th value of the feature in the example with key SparseFeature.index_key[j]. In other words, we split the indices (except the first index indicating the batch entry) of a SparseTensor by dimension into different features of the avro record. Because of this complexity, a VarLenFeature should be preferred over a SparseFeature whenever possible.

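
To make the index splitting concrete, here is a minimal pure-Python sketch of how one index feature per dimension and one value feature combine into [batch, index_0, index_1, ...] entries. It only illustrates the description above and is not the op's implementation; the helper `sparse_feature_to_coo` and the field names `row`, `col`, and `val` are hypothetical.

```python
def sparse_feature_to_coo(examples, index_keys, value_key):
    """Combine per-dimension index features and a value feature.

    For the k-th value of `value_key` in batch entry b, the index is
    [b, example[index_keys[0]][k], example[index_keys[1]][k], ...].
    """
    indices, values = [], []
    for b, example in enumerate(examples):
        for k, value in enumerate(example[value_key]):
            indices.append([b] + [example[key][k] for key in index_keys])
            values.append(value)
    return indices, values


# Hypothetical records: each stores a 2-D sparse matrix as three parallel lists.
examples = [
    {"row": [0, 2], "col": [1, 0], "val": [0.5, 1.5]},
    {"row": [1], "col": [1], "val": [2.5]},
]
indices, values = sparse_feature_to_coo(examples, ["row", "col"], "val")
print(indices)  # [[0, 0, 1], [0, 2, 0], [1, 1, 1]]
print(values)   # [0.5, 1.5, 2.5]
```

The index and value lists of each record must have the same length, which is part of why this representation is harder to get right than a VarLenFeature.
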
Each FixedLenFeature df maps to a Tensor of the specified type (or tf.float32 if not specified) and shape (serialized.size(),) + df.shape. FixedLenFeature entries with a default_value are optional. With no default value, parsing fails if that Feature is missing from any example in serialized.

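
The default_value rule can be sketched in plain Python for a scalar feature (df.shape of `()`), so the result has one entry per example. This is an illustration only: the helper `dense_scalar_column` is hypothetical, and raising `ValueError` is an assumption standing in for whatever error the real op reports.

```python
_MISSING = object()  # sentinel meaning "no default_value was given"


def dense_scalar_column(examples, key, default=_MISSING):
    """Build a dense column with one entry per example.

    A missing feature is filled from `default` when one is given;
    otherwise the whole parse fails.
    """
    column = []
    for b, example in enumerate(examples):
        if key in example:
            column.append(example[key])
        elif default is not _MISSING:
            column.append(default)
        else:
            raise ValueError(
                f"feature {key!r} is missing from example {b} "
                "and no default_value was given"
            )
    return column


examples = [{"age": 31}, {}, {"age": 47}]
with_default = dense_scalar_column(examples, "age", default=-1)
print(with_default)  # [31, -1, 47]
```

Calling `dense_scalar_column(examples, "age")` without a default raises, because the second example lacks the feature.
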
Use this op inside dataset.map, with a parser function that calls parse_avro. It only works for batched serialized input!

Args | |
---|---|
serialized | The batched, serialized string tensors. |
reader_schema | The reader schema. Note: this MUST match the reader schema from the avro_record_dataset. Otherwise, this op will segfault! |
features | A map from feature names to feature information. |
avro_names | (Optional.) May contain descriptive names for the corresponding serialized avro parts. These may be useful for debugging purposes, but they have no effect on the output. If not None, avro_names must be the same length as serialized. |
name | The name of the op. |
Returns | |
---|---|
A map of feature names to tensors. |