Decodes each string into a sequence of code points with start offsets.
tf.strings.unicode_decode_with_offsets(
    input,
    input_encoding,
    errors='replace',
    replacement_char=65533,
    replace_control_characters=False,
    name=None
)
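The errors, replacement_char, and replace_control_characters arguments control how malformed input is handled. A minimal sketch (not part of the official example; the byte string is an illustrative value) of the default errors='replace' behavior:
import tensorflow as tf

# Sketch: with errors='replace' (the default), each invalid byte is decoded
# to replacement_char (U+FFFD = 65533) instead of raising an error.
codepoints, offsets = tf.strings.unicode_decode_with_offsets(
    [b'G\xc3\xb6\xff'],   # 'G', 'ö', then one invalid byte (0xff)
    'UTF-8',
    errors='replace',
    replacement_char=65533)
print(codepoints.to_list())  # [[71, 246, 65533]]
print(offsets.to_list())     # [[0, 1, 3]]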
This op is similar to tf.strings.unicode_decode(...), but it also returns the
start offset for each character in its respective string. This information
can be used to align the characters with the original byte sequence.
Returns a tuple (codepoints, start_offsets) where:
- codepoints[i1...iN, j] is the Unicode codepoint for the jth character in input[i1...iN], when decoded using input_encoding.
- start_offsets[i1...iN, j] is the start byte offset for the jth character in input[i1...iN], when decoded using input_encoding.
| Returns | |
|---|---|
| A tuple of N+1 dimensional tensors (codepoints, start_offsets). The returned tensors are tf.Tensors if input is a scalar, or tf.RaggedTensors otherwise. |
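As a quick illustration of the return types (a hedged sketch, assuming eager execution), a scalar input yields plain 1-D tensors rather than ragged tensors:
# Sketch: decoding a scalar string yields 1-D tf.Tensors, not tf.RaggedTensors.
codepoints, offsets = tf.strings.unicode_decode_with_offsets(u'G\xf6', 'UTF-8')
print(codepoints.numpy())                       # [ 71 246]
print(offsets.numpy())                          # [0 1]
print(isinstance(codepoints, tf.RaggedTensor))  # False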
Example:
input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
result = tf.strings.unicode_decode_with_offsets(input, 'UTF-8')
result[0].to_list()  # codepoints
[[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]
result[1].to_list()  # offsets
[[0, 1, 3, 5, 6, 7, 8, 9, 10], [0]]
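As noted above, the offsets can be used to align each decoded character with its byte range in the original string. The following continues the example (a sketch, assuming eager execution and the input and result values defined above):
# Sketch: use consecutive offsets as byte ranges to slice each character's
# UTF-8 bytes back out of the original string.
first = input[0]                        # b'G\xc3\xb6\xc3\xb6dnight'
starts = result[1][0].numpy().tolist()  # [0, 1, 3, 5, 6, 7, 8, 9, 10]
ends = starts[1:] + [len(first)]
chars = [first[s:e].decode('utf-8') for s, e in zip(starts, ends)]
print(chars)  # ['G', 'ö', 'ö', 'd', 'n', 'i', 'g', 'h', 't']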