The op extracts fields from a serialized protocol buffers message into tensors.
tf.io.decode_proto(
    bytes,
    message_type,
    field_names,
    output_types,
    descriptor_source='local://',
    message_format='binary',
    sanitize=False,
    name=None
)
The decode_proto op extracts fields from a serialized protocol buffers
message into tensors.  The fields in field_names are decoded and converted
to the corresponding output_types if possible.
A message_type name must be provided to give context for the field names.
The actual message descriptor can be looked up either in the linked-in
descriptor pool or in a file whose name the caller provides through the
descriptor_source attribute.
Each output tensor is a dense tensor, padded to hold the largest number of
repeated elements seen in the input minibatch. (The shape is also padded by
one to prevent zero-sized dimensions.) The actual repeat counts for each
example in the minibatch can be found in the sizes output. When missing
values are not a concern, the output of decode_proto is often fed directly
into tf.squeeze. When using tf.squeeze, always pass the squeeze dimension
explicitly to avoid surprises.
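The relationship between a padded dense output and the sizes output can be modeled in plain Python. The sketch below does not call TensorFlow: the helper name strip_padding and the sample data are hypothetical, and it only mirrors the layout described above for a single repeated field.

```python
from typing import List, Sequence


def strip_padding(dense: Sequence[Sequence[float]],
                  sizes: Sequence[int]) -> List[List[float]]:
    """Keep only the first sizes[i] entries of each padded row.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    if len(dense) != len(sizes):
        raise ValueError("dense and sizes must have the same batch length")
    return [list(row[:count]) for row, count in zip(dense, sizes)]


# Hypothetical minibatch of three messages whose repeated field holds
# 3, 1 and 0 values. Shorter rows are padded with the default value 0.0.
dense_values = [
    [1.0, 2.0, 3.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
repeat_counts = [3, 1, 0]

ragged = strip_padding(dense_values, repeat_counts)
print(ragged)  # [[1.0, 2.0, 3.0], [4.0], []]
```

Without the repeat counts, a padded 0.0 cannot be told apart from a real 0.0 in the message, which is why the sizes output matters whenever fields may be missing.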
For the most part, the mapping between Proto field types and TensorFlow dtypes is straightforward. However, there are a few special cases:
- A proto field that contains a submessage or group can only be converted to DT_STRING (the serialized submessage). This is to reduce the complexity of the API. The resulting string can be used as input to another instance of the decode_proto op.
- TensorFlow lacks support for unsigned integers. The ops represent uint64 types as a DT_INT64 with the same two's-complement bit pattern (the obvious way). Unsigned int32 values can be represented exactly by specifying type DT_INT64, or using two's-complement if the caller specifies DT_INT32 in the output_types attribute.
- map fields are not directly decoded. They are treated as repeated fields of the appropriate entry type. The proto compiler defines entry types for each map field. The type name is the field name, converted to "CamelCase" with "Entry" appended. The tf.train.Features.FeatureEntry message is an example of one of these implicit Entry types.
- enum fields should be read as int32.
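Two of the special cases above are easy to check in plain Python. The following is a minimal sketch, not TensorFlow code: the helper names to_signed, to_unsigned and map_entry_type_name are hypothetical, and the naming rule is implemented only as far as the description above states it.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned integer as a two's-complement signed one."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def to_unsigned(value: int, bits: int) -> int:
    """Recover the unsigned value from its two's-complement bit pattern."""
    return value & ((1 << bits) - 1)


def map_entry_type_name(field_name: str) -> str:
    """Field name converted to CamelCase with "Entry" appended."""
    parts = field_name.split("_")
    return "".join(part[:1].upper() + part[1:] for part in parts) + "Entry"


# The largest uint64 shares its bit pattern with the int64 value -1.
as_int64 = to_signed(2**64 - 1, 64)
round_trip = to_unsigned(as_int64, 64)

# A uint32 above 2**31 - 1 turns negative when read as int32,
# but fits unchanged in an int64.
as_int32 = to_signed(3_000_000_000, 32)

# The map field "feature" of tf.train.Features yields FeatureEntry.
entry_name = map_entry_type_name("feature")

print(as_int64, round_trip == 2**64 - 1, as_int32, entry_name)
```

If the exact unsigned value of a uint64 field is needed downstream, the caller has to reinterpret the int64 result, as to_unsigned does here.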
Both binary and text proto serializations are supported, and can be
chosen using the message_format attribute.
The descriptor_source attribute selects the source of protocol
descriptors to consult when looking up message_type. This may be:
- An empty string or "local://", in which case protocol descriptors are created for C++ (not Python) proto definitions linked to the binary.
- A file, in which case protocol descriptors are created from the file, which is expected to contain a FileDescriptorSet serialized as a string. NOTE: You can build a descriptor_source file using the --descriptor_set_out and --include_imports options to the protocol compiler protoc.
- A "bytes://<bytes>" string, in which case protocol descriptors are created from <bytes>, which is expected to be a FileDescriptorSet serialized as a string.
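The three accepted forms can be told apart by a simple prefix check. The sketch below only mirrors the list above and is not the op's actual parsing logic: the function name classify_descriptor_source, the returned labels, and the sample path are all hypothetical.

```python
def classify_descriptor_source(descriptor_source: str) -> str:
    """Return which of the three documented cases a value falls into.

    Hypothetical helper; the labels are illustrative only.
    """
    if descriptor_source in ("", "local://"):
        return "linked-in descriptor pool"
    if descriptor_source.startswith("bytes://"):
        return "inline FileDescriptorSet"
    return "FileDescriptorSet file"


# The path below is a made-up example of a file written by protoc.
examples = ["", "local://", "bytes://<bytes>", "/tmp/my_protos.desc"]
labels = [classify_descriptor_source(source) for source in examples]

for source, label in zip(examples, labels):
    print(repr(source), "->", label)
```

Anything that is neither empty, "local://", nor prefixed with "bytes://" is treated as a file path in this sketch, which matches the order in which the cases are listed.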
Here is an example:
The internal Summary.Value proto contains a
oneof {float simple_value; Image image; ...}
    >>> import tensorflow as tf
    >>> from google.protobuf import text_format
    >>>
    >>> # A Summary.Value contains: oneof {float simple_value; Image image}
    >>> values = [
    ...     "simple_value: 2.2",
    ...     "simple_value: 1.2",
    ...     "image { height: 128 width: 512 }",
    ...     "image { height: 256 width: 256 }",
    ... ]
    >>> # Parse each text-format message and serialize it to binary.
    >>> values = [
    ...     text_format.Parse(v, tf.compat.v1.Summary.Value()).SerializeToString()
    ...     for v in values
    ... ]
The following can decode both fields from the serialized strings:
    >>> sizes, [simple_value, image] = tf.io.decode_proto(
    ...     values,
    ...     tf.compat.v1.Summary.Value.DESCRIPTOR.full_name,
    ...     field_names=['simple_value', 'image'],
    ...     output_types=[tf.float32, tf.string])
The sizes tensor has the same shape as the input, with an additional axis
across the fields that were decoded. Here the first column of sizes is the
size of the decoded simple_value field:
    >>> print(sizes)
    tf.Tensor(
    [[1 0]
     [1 0]
     [0 1]
     [0 1]], shape=(4, 2), dtype=int32)
The result tensors each have one more index than the input byte-strings.
The valid elements of each result tensor are indicated by
the appropriate column of sizes. The invalid elements are padded with a
default value:
    >>> print(simple_value)
    tf.Tensor(
    [[2.2]
     [1.2]
     [0. ]
     [0. ]], shape=(4, 1), dtype=float32)
Nested protos are extracted as string tensors:
    >>> print(image.dtype)
    <dtype: 'string'>
    >>> print(image.shape.as_list())
    [4, 1]
To convert to a tf.RaggedTensor representation use:
    >>> tf.RaggedTensor.from_tensor(simple_value, lengths=sizes[:, 0]).to_list()
    [[2.2], [1.2], [], []]
| Args | |
|---|---|
| bytes | A Tensor of type string. Tensor of serialized protos with shape batch_shape. |
| message_type | A string. Name of the proto message type to decode. |
| field_names | A list of strings. List of strings containing proto field names. An extension field can be decoded by using its full name, e.g. EXT_PACKAGE.EXT_FIELD_NAME. |
| output_types | A list of tf.DTypes. List of TF types to use for the respective field in field_names. |
| descriptor_source | An optional string. Defaults to "local://". Either the special value local:// or a path to a file containing a serialized FileDescriptorSet. |
| message_format | An optional string. Defaults to "binary". Either binary or text. |
| sanitize | An optional bool. Defaults to False. Whether to sanitize the result or not. |
| name | A name for the operation (optional). |
| Returns | |
|---|---|
| A tuple of Tensor objects (sizes, values). | |
| sizes | A Tensor of type int32. |
| values | A list of Tensor objects of type output_types. |