tf.strings.unicode_encode
Encodes each sequence of Unicode code points in input
into a string.
tf.strings.unicode_encode(
input, output_encoding, errors='replace', replacement_char=65533, name=None
)
result[i1...iN]
is the string formed by concatenating the Unicode
codepoints input[1...iN, :]
, encoded using output_encoding
.
Args |
input
|
An N+1 dimensional potentially ragged integer tensor with shape
[D1...DN, num_chars] .
|
output_encoding
|
Unicode encoding that should be used to encode each
codepoint sequence. Can be "UTF-8" , "UTF-16-BE" , or "UTF-32-BE" .
|
errors
|
Specifies the response when an invalid codepoint is encountered
(optional). One of:
'replace' : Replace invalid codepoint with the
replacement_char . (default)
'ignore' : Skip invalid codepoints.
'strict' : Raise an exception for any invalid codepoint.
|
replacement_char
|
The replacement character codepoint to be used in place of
any invalid input when errors='replace' . Any valid unicode codepoint may
be used. The default value is the default unicode replacement character
which is 0xFFFD (U+65533).
|
name
|
A name for the operation (optional).
|
Returns |
A N dimensional string tensor with shape [D1...DN] .
|
Example:
>>> input = [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
>>> unicode_encode(input, 'UTF-8')
['G\xc3\xb6\xc3\xb6dnight', '\xf0\x9f\x98\x8a']
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2020-10-01 UTC.
[null,null,["Last updated 2020-10-01 UTC."],[],[]]