DecodeProto

public final class DecodeProto

The op extracts fields from a serialized protocol buffers message into tensors.

The `decode_proto` op extracts fields from a serialized protocol buffers message into tensors. The fields in `field_names` are decoded and converted to the corresponding `output_types` if possible.

A `message_type` name must be provided to give context for the field names. The actual message descriptor can be looked up either in the linked-in descriptor pool or a filename provided by the caller using the `descriptor_source` attribute.

Each output tensor is a dense tensor. This means that it is padded to hold the largest number of repeated elements seen in the input minibatch. (The shape is also padded by one to prevent zero-sized dimensions). The actual repeat counts for each example in the minibatch can be found in the `sizes` output. In many cases the output of `decode_proto` is fed immediately into tf.squeeze if missing values are not a concern. When using tf.squeeze, always pass the squeeze dimension explicitly to avoid surprises.

For the most part, the mapping between Proto field types and TensorFlow dtypes is straightforward. However, there are a few special cases:

- A proto field that contains a submessage or group can only be converted to `DT_STRING` (the serialized submessage). This is to reduce the complexity of the API. The resulting string can be used as input to another instance of the decode_proto op.

- TensorFlow lacks support for unsigned integers. The ops represent uint64 types as a `DT_INT64` with the same twos-complement bit pattern (the obvious way). Unsigned int32 values can be represented exactly by specifying type `DT_INT64`, or using twos-complement if the caller specifies `DT_INT32` in the `output_types` attribute.

Both binary and text proto serializations are supported, and can be chosen using the `format` attribute.

The `descriptor_source` attribute selects the source of protocol descriptors to consult when looking up `message_type`. This may be:

- An empty string or "local://", in which case protocol descriptors are created for C++ (not Python) proto definitions linked to the binary.

- A file, in which case protocol descriptors are created from the file, which is expected to contain a `FileDescriptorSet` serialized as a string. NOTE: You can build a `descriptor_source` file using the `--descriptor_set_out` and `--include_imports` options to the protocol compiler `protoc`.

- A "bytes://", in which protocol descriptors are created from ``, which is expected to be a `FileDescriptorSet` serialized as a string.

Nested Classes

class DecodeProto.Options Optional attributes for DecodeProto  

Constants

String OP_NAME The name of this op, as known by TensorFlow core engine

Public Methods

static DecodeProto
create(Scope scope, Operand<TString> bytes, String messageType, List<String> fieldNames, List<Class<? extends TType>> outputTypes, Options... options)
Factory method to create a class wrapping a new DecodeProto operation.
static DecodeProto.Options
descriptorSource(String descriptorSource)
static DecodeProto.Options
messageFormat(String messageFormat)
static DecodeProto.Options
sanitize(Boolean sanitize)
Output<TInt32>
sizes()
Tensor of int32 with shape `[batch_shape, len(field_names)]`.
List<Output<?>>
values()
List of tensors containing values for the corresponding field.

Inherited Methods

Constants

public static final String OP_NAME

The name of this op, as known by TensorFlow core engine

Constant Value: "DecodeProtoV2"

Public Methods

public static DecodeProto create (Scope scope, Operand<TString> bytes, String messageType, List<String> fieldNames, List<Class<? extends TType>> outputTypes, Options... options)

Factory method to create a class wrapping a new DecodeProto operation.

Parameters
scope current scope
bytes Tensor of serialized protos with shape `batch_shape`.
messageType Name of the proto message type to decode.
fieldNames List of strings containing proto field names. An extension field can be decoded by using its full name, e.g. EXT_PACKAGE.EXT_FIELD_NAME.
outputTypes List of TF types to use for the respective field in field_names.
options carries optional attributes values
Returns
  • a new instance of DecodeProto

public static DecodeProto.Options descriptorSource (String descriptorSource)

Parameters
descriptorSource Either the special value `local://` or a path to a file containing a serialized `FileDescriptorSet`.

public static DecodeProto.Options messageFormat (String messageFormat)

Parameters
messageFormat Either `binary` or text.

public static DecodeProto.Options sanitize (Boolean sanitize)

Parameters
sanitize Whether to sanitize the result or not.

public Output<TInt32> sizes ()

Tensor of int32 with shape `[batch_shape, len(field_names)]`. Each entry is the number of values found for the corresponding field. Optional fields may have 0 or 1 values.

public List<Output<?>> values ()

List of tensors containing values for the corresponding field. `values[i]` has datatype `output_types[i]` and shape `[batch_shape, max(sizes[...,i])]`.