Decodes each string in input into a sequence of Unicode code points.
```python
tf.raw_ops.UnicodeDecode(
    input,
    input_encoding,
    errors='replace',
    replacement_char=65533,
    replace_control_characters=False,
    Tsplits=tf.dtypes.int64,
    name=None
)
```
The character codepoints for all strings are returned using a single vector
char_values, with strings expanded to characters in row-major order.
The row_splits tensor indicates where the codepoints for
each input string begin and end within the char_values tensor.
In particular, the values for the ith
string (in row-major order) are stored in the slice
[row_splits[i]:row_splits[i+1]]. Thus:
- `char_values[row_splits[i]+j]` is the Unicode code point for the `j`th character in the `i`th string (in row-major order).
- `row_splits[i+1] - row_splits[i]` is the number of characters in the `i`th string (in row-major order).
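The two indexing rules can be checked with plain Python lists. This is a minimal sketch: the `row_splits` and `char_values` values below are illustrative, worked out by hand for a hypothetical UTF-8 input `[b"hi", b"\xc3\xa9t\xc3\xa9"]` ("hi" and "été") rather than captured from TensorFlow, and the helper names `codepoints_of` and `length_of` are hypothetical.

```python
# Illustrative outputs, worked out by hand (not captured from TensorFlow), for a
# hypothetical UTF-8 input [b"hi", b"\xc3\xa9t\xc3\xa9"], i.e. "hi" and "été".
row_splits = [0, 2, 5]
char_values = [104, 105, 233, 116, 233]


def codepoints_of(i, row_splits, char_values):
    """Code points of the i-th string: char_values[row_splits[i]:row_splits[i+1]]."""
    return char_values[row_splits[i]:row_splits[i + 1]]


def length_of(i, row_splits):
    """Number of characters in the i-th string."""
    return row_splits[i + 1] - row_splits[i]


for i in range(len(row_splits) - 1):
    chars = codepoints_of(i, row_splits, char_values)
    # Element-wise form: char_values[row_splits[i] + j] is the j-th code point.
    assert chars == [char_values[row_splits[i] + j] for j in range(length_of(i, row_splits))]
    print(i, chars, "".join(chr(cp) for cp in chars))
```

With TensorFlow available, the same pair of tensors can be combined into a ragged tensor with `tf.RaggedTensor.from_row_splits(char_values, row_splits)`.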
| Args | |
|---|---|
| `input` | A `Tensor` of type `string`. The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values. |
| `input_encoding` | A `string`. Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16"`, `"US ASCII"`, `"UTF-8"`. |
| `errors` | An optional `string` from: `"strict"`, `"replace"`, `"ignore"`. Defaults to `"replace"`. Error handling policy when invalid formatting is found in the input. A value of `'strict'` causes the operation to produce an `InvalidArgument` error on any invalid input formatting. A value of `'replace'` (the default) causes the operation to replace any invalid formatting in the input with the `replacement_char` code point. A value of `'ignore'` causes the operation to skip any invalid formatting in the input and produce no corresponding output character. |
| `replacement_char` | An optional `int`. Defaults to `65533`. The replacement character code point used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode code point may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533). |
| `replace_control_characters` | An optional `bool`. Defaults to `False`. Whether to replace the C0 control characters (U+0000 to U+001F) with the `replacement_char`. |
| `Tsplits` | An optional `tf.DType` from: `tf.int32`, `tf.int64`. Defaults to `tf.int64`. The integer type of the returned `row_splits` tensor. |
| `name` | A name for the operation (optional). |
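To make the interaction of `errors`, `replacement_char`, and `replace_control_characters` concrete, here is a pure-Python sketch of the documented behaviour. It is an assumption-laden emulation, not the TensorFlow kernel: it uses Python codecs instead of ICU converters (so encoding names are Python codec names such as `"utf-8"`, and the number of replacement characters emitted for a malformed byte run may differ from ICU), it raises Python's `UnicodeDecodeError` where the op would raise `InvalidArgument`, and the function name `decode_like_unicode_decode` and the error-handler name it registers are hypothetical.

```python
import codecs


def decode_like_unicode_decode(strings, input_encoding, errors="replace",
                               replacement_char=0xFFFD,
                               replace_control_characters=False):
    """Sketch returning (row_splits, char_values) for already-flattened bytes."""
    handler = errors
    if errors == "replace":
        # Hypothetical handler name. Python calls it for each malformed span and
        # resumes decoding at exc.end, emitting replacement_char in its place.
        handler = "unicode_decode_sketch_replace"
        codecs.register_error(handler, lambda exc: (chr(replacement_char), exc.end))
    elif errors not in ("strict", "ignore"):
        raise ValueError(f"unsupported errors policy: {errors!r}")

    row_splits = [0]
    char_values = []
    for s in strings:  # assumed to be in row-major order already
        # 'strict' raises UnicodeDecodeError; 'ignore' silently drops bad bytes.
        text = s.decode(input_encoding, errors=handler)
        for ch in text:
            cp = ord(ch)
            # C0 control characters are U+0000..U+001F.
            if replace_control_characters and cp <= 0x1F:
                cp = replacement_char
            char_values.append(cp)
        row_splits.append(len(char_values))
    return row_splits, char_values


print(decode_like_unicode_decode([b"a\xffb", b"ok"], "utf-8"))
print(decode_like_unicode_decode([b"a\xffb"], "utf-8", errors="ignore"))
```

A custom error handler is used instead of Python's built-in `"replace"` so that a non-default `replacement_char` is substituted only for malformed input, not for genuine U+FFFD characters that were already present in the text.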
| Returns | |
|---|---|
| A tuple of `Tensor` objects (`row_splits`, `char_values`). | |
| `row_splits` | A `Tensor` of type `Tsplits`. |
| `char_values` | A `Tensor` of type `int32`. |
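Because both outputs are flat, the caller must restore the input's shape if it is needed. The sketch below does this for a 2-D input using plain lists, assuming the strings were flattened in row-major order as described above. The function name `regroup` and the example values (hand-worked for a hypothetical input `[[b"a", b"bc"], [b"", b"d"]]`) are illustrative only.

```python
def regroup(row_splits, char_values, shape):
    """Rebuild a nested list of code-point lists for a 2-D input shape."""
    rows, cols = shape
    # One split boundary per string, plus the leading 0.
    assert len(row_splits) == rows * cols + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            i = r * cols + c  # row-major index of input[r][c]
            row.append(char_values[row_splits[i]:row_splits[i + 1]])
        out.append(row)
    return out


# Hand-worked outputs for [[b"a", b"bc"], [b"", b"d"]]: the empty string
# contributes a zero-length slice (row_splits repeats the value 3).
print(regroup([0, 1, 3, 3, 4], [97, 98, 99, 100], (2, 2)))
```

The higher-level `tf.strings.unicode_decode` returns a `RaggedTensor` that keeps the input's leading dimensions, which is usually more convenient than regrouping the raw op's outputs by hand.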