Decodes each string in input into a sequence of Unicode code points.
tf.raw_ops.UnicodeDecodeWithOffsets(
    input, input_encoding, errors='replace', replacement_char=65533,
    replace_control_characters=False, Tsplits=tf.dtypes.int64, name=None
)
The character codepoints for all strings are returned using a single vector
char_values, with strings expanded to characters in row-major order.
Similarly, the character start byte offsets are returned using a single vector
char_to_byte_starts, with strings expanded in row-major order.
The row_splits tensor indicates where the codepoints and start offsets for
each input string begin and end within the char_values and
char_to_byte_starts tensors.  In particular, the values for the ith
string (in row-major order) are stored in the slice
[row_splits[i]:row_splits[i+1]]. Thus:
- char_values[row_splits[i]+j] is the Unicode codepoint for the jth character in the ith string (in row-major order).
- char_to_byte_starts[row_splits[i]+j] is the start byte offset for the jth character in the ith string (in row-major order).
- row_splits[i+1] - row_splits[i] is the number of characters in the ith string (in row-major order).
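
The flattened layout described above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, assuming well-formed UTF-8 input; the helper name decode_with_offsets is hypothetical and it emulates only the shape of the three outputs, not the op itself.

```python
def decode_with_offsets(byte_strings):
    """Emulate the flattened output layout for well-formed UTF-8 inputs.

    Hypothetical helper for illustration only; not part of TensorFlow.
    """
    row_splits = [0]
    char_values = []
    char_to_byte_starts = []
    for raw in byte_strings:
        offset = 0  # byte offsets restart at 0 for every input string
        for ch in raw.decode("utf-8"):
            char_values.append(ord(ch))
            char_to_byte_starts.append(offset)
            offset += len(ch.encode("utf-8"))
        # Each string contributes one more boundary to row_splits.
        row_splits.append(len(char_values))
    return row_splits, char_values, char_to_byte_starts


inputs = ["Hi".encode("utf-8"), "\u00e9!".encode("utf-8"), b""]
row_splits, char_values, char_to_byte_starts = decode_with_offsets(inputs)

# Values for string i live in the slice [row_splits[i]:row_splits[i+1]].
i = 1
start, end = row_splits[i], row_splits[i + 1]
print(char_values[start:end])          # [233, 33]
print(char_to_byte_starts[start:end])  # [0, 2]
print(end - start)                     # 2 characters in string i
```

Note that the empty third string still produces an entry in row_splits (a repeated boundary), so row_splits always has one more element than the number of input strings.
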
| Args | |
|---|---|
| input | A Tensor of type string. The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values. |
| input_encoding | A string. Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: "UTF-16", "US ASCII", "UTF-8". |
| errors | An optional string from: "strict", "replace", "ignore". Defaults to "replace". Error handling policy when invalid formatting is found in the input. A value of 'strict' causes the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) causes the operation to replace any invalid formatting in the input with the replacement_char codepoint. A value of 'ignore' causes the operation to skip any invalid formatting in the input and produce no corresponding output character. |
| replacement_char | An optional int. Defaults to 65533. The replacement character codepoint to be used in place of any invalid formatting in the input when errors='replace'. Any valid Unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533). |
| replace_control_characters | An optional bool. Defaults to False. Whether to replace the C0 control characters (00-1F) with the replacement_char. |
| Tsplits | An optional tf.DType from: tf.int32, tf.int64. Defaults to tf.int64. |
| name | A name for the operation (optional). | 
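
The errors, replacement_char, and replace_control_characters arguments can be illustrated by analogy with Python's built-in codecs, which expose similarly named error policies. This is a rough sketch under that assumption, not the op's implementation: the op uses ICU converters, reports failures as an InvalidArgument error rather than UnicodeDecodeError, and may count malformed multi-byte sequences differently. The helper replace_c0_controls is hypothetical.

```python
REPLACEMENT_CHAR = 0xFFFD  # 65533, the default replacement_char

bad = b"a\xffb"  # the byte 0xFF is never valid in UTF-8

# errors='replace': invalid input becomes the replacement codepoint.
replaced = [ord(c) for c in bad.decode("utf-8", errors="replace")]

# errors='ignore': invalid input yields no output character at all.
ignored = [ord(c) for c in bad.decode("utf-8", errors="ignore")]

# errors='strict': invalid input is reported as an error.
try:
    bad.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True


def replace_c0_controls(codepoints, replacement=REPLACEMENT_CHAR):
    """Hypothetical helper: swap C0 controls (0x00-0x1F) for the replacement."""
    return [replacement if cp <= 0x1F else cp for cp in codepoints]


controls_replaced = replace_c0_controls([ord(c) for c in "a\tb"])

print(replaced)           # [97, 65533, 98]
print(ignored)            # [97, 98]
print(strict_failed)      # True
print(controls_replaced)  # [97, 65533, 98]
```
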
| Returns | |
|---|---|
| A tuple of Tensor objects (row_splits, char_values, char_to_byte_starts). | |
| row_splits | A Tensor of type Tsplits. |
| char_values | A Tensor of type int32. |
| char_to_byte_starts | A Tensor of type int64. |