Splits a string tensor into bytes.
Inherits From: SplitterWithOffsets, Splitter
text.ByteSplitter()
Methods
split
split(input)
Splits a string tensor into bytes.
The strings are split into bytes, so a character whose encoding spans several bytes (for example, any non-ASCII character in UTF-8) is split into multiple elements.
Example:
ByteSplitter().split("hello")
<tf.Tensor: shape=(5,), dtype=uint8, numpy=array([104, 101, 108, 108, 111],
dtype=uint8)>
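To make the byte-level behavior concrete without depending on TensorFlow, the pure-Python sketch below mimics what split computes: each string is encoded as UTF-8 and replaced by the list of its byte values, and nested inputs gain one extra (ragged) level. The helper name byte_split is hypothetical and is not part of tensorflow_text; nested Python lists stand in for a Tensor or RaggedTensor, and str inputs are assumed to be UTF-8.

```python
def byte_split(value):
    """Hypothetical pure-Python analogue of ByteSplitter().split.

    Each string (str is assumed to be UTF-8 encoded) is replaced by the list
    of its byte values; lists are processed recursively, so the output has
    one more level of nesting than the input.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Iterating over a bytes object yields ints in the range 0..255.
        return list(value)
    # Any other value is treated as a (possibly empty) list of strings/lists.
    return [byte_split(item) for item in value]


print(byte_split("hello"))
# [104, 101, 108, 108, 111]

# "\u00e9" is "é", which UTF-8 encodes as the two bytes 195, 169.
print(byte_split(["hi", "h\u00e9llo"]))
# [[104, 105], [104, 195, 169, 108, 108, 111]]
```

The second call shows why the output dimension is ragged: "hi" yields two bytes while "héllo" yields six, because "é" alone contributes two.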
Args | |
---|---|
input | A RaggedTensor or Tensor of strings with any shape. |
Returns | |
---|---|
A RaggedTensor of bytes (a dense Tensor when the input is a scalar, as in the example above). The returned shape is the shape of the input tensor with an added ragged dimension for the bytes that make up each string. |
split_with_offsets
split_with_offsets(input)
Splits a string tensor into bytes.
The strings are split into bytes, so a character whose encoding spans several bytes (for example, any non-ASCII character in UTF-8) is split into multiple elements.
Example:
splitter = ByteSplitter()
bytes, starts, ends = splitter.split_with_offsets("hello")
print(bytes.numpy(), starts.numpy(), ends.numpy())
[104 101 108 108 111] [0 1 2 3 4] [1 2 3 4 5]
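The offsets in the example are byte positions: each byte's start offset is its index in the encoded string and its end offset is one past it (start inclusive, end exclusive). The pure-Python sketch below reproduces that for a single string and shows that a two-byte character yields two entries. The helper byte_split_with_offsets is hypothetical, is not the library's implementation, handles one string only, and assumes UTF-8 for str input.

```python
def byte_split_with_offsets(s):
    """Hypothetical pure-Python analogue of ByteSplitter().split_with_offsets
    for a single string (str is assumed to be UTF-8 encoded).

    Returns (byte_values, starts, ends), where byte i covers data[starts[i]:ends[i]].
    """
    data = s.encode("utf-8") if isinstance(s, str) else s
    byte_values = list(data)
    starts = list(range(len(data)))
    ends = [start + 1 for start in starts]  # end offsets are exclusive
    return byte_values, starts, ends


values, starts, ends = byte_split_with_offsets("hello")
print(values, starts, ends)
# [104, 101, 108, 108, 111] [0, 1, 2, 3, 4] [1, 2, 3, 4, 5]

# "\u00e9" ("é") occupies two bytes, so it contributes two entries.
values, starts, ends = byte_split_with_offsets("h\u00e9")
print(values, starts, ends)
# [104, 195, 169] [0, 1, 2] [1, 2, 3]

# Offsets index into the encoded bytes, not into Python characters:
# slicing from the start of byte 1 to the end of byte 2 recovers "é".
data = "h\u00e9".encode("utf-8")
print(data[starts[1]:ends[2]].decode("utf-8"))
```

Because the offsets refer to bytes, slicing the original Python str with them would be wrong for non-ASCII text; they must be applied to the encoded bytes.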
Args | |
---|---|
input | A RaggedTensor or Tensor of strings with any shape. |
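Returns | |
---|---|
A tuple (bytes, start_offsets, end_offsets), where bytes holds the byte values, with the shape of the input tensor plus an added ragged dimension for the bytes that make up each string, and start_offsets and end_offsets have the same shape as bytes and give each byte's starting (inclusive) and ending (exclusive) offset within its original string. |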