Normalizes each UTF-8 string in the input tensor using the specified rule.
text.normalize_utf8(
input, normalization_form='NFKC', name=None
)
Used in the notebooks
See http://unicode.org/reports/tr15/
Examples:
# input: <string>[num_strings]
normalize_utf8(["株式会社", "KADOKAWA"])
# output: <string>[num_strings]
<tf.Tensor: shape=(2,), dtype=string, numpy=
array([b'\xe6\xa0\xaa\xe5\xbc\x8f\xe4\xbc\x9a\xe7\xa4\xbe', b'KADOKAWA'],
dtype=object)>
Args |
input
|
A Tensor or RaggedTensor of type string. (Must be UTF-8.)
|
normalization_form
|
One of the following string values ('NFC', 'NFKC',
'NFD', 'NFKD'). Default is 'NFKC'.
|
name
|
The name for this op (optional).
|
Returns |
A Tensor or RaggedTensor of type string, with normalized contents.
|