TensorFlow is back at Google I/O on May 14! Register now

text.normalize_utf8

Normalizes each UTF-8 string in the input tensor using the specified rule.

text.normalize_utf8(
    input, normalization_form='NFKC', name=None
)

Used in the notebooks

Used in the tutorials
Neural machine translation with attention

See http://unicode.org/reports/tr15/

Examples:

# input: <string>[num_strings]
normalize_utf8(["株式会社", "ＫＡＤＯＫＡＷＡ"])
# output: <string>[num_strings]
<tf.Tensor: shape=(2,), dtype=string, numpy=
array([b'\xe6\xa0\xaa\xe5\xbc\x8f\xe4\xbc\x9a\xe7\xa4\xbe', b'KADOKAWA'],
      dtype=object)>

Args
`input`	A `Tensor` or `RaggedTensor` of type string. (Must be UTF-8.)
`normalization_form`	One of the following string values ('NFC', 'NFKC', 'NFD', 'NFKD'). Default is 'NFKC'.
`name`	The name for this op (optional).

Returns
A `Tensor` or `RaggedTensor` of type string, with normalized contents.

text.normalize_utf8

Used in the notebooks

Examples:

Args

Returns