Splits a string tensor into bytes.
Inherits From: SplitterWithOffsets, Splitter
text.ByteSplitter()
Methods
split
split(
input
)
Splits a string tensor into bytes.
The strings are split into bytes. Thus, a single Unicode character may be split into multiple bytes.
Example:
ByteSplitter().split("hello")
<tf.Tensor: shape=(5,), dtype=uint8, numpy=array([104, 101, 108, 108, 111],
dtype=uint8)>
Args | |
---|---|
input | A RaggedTensor or Tensor of strings with any shape. |
Returns | |
---|---|
A RaggedTensor of bytes. The returned shape is the shape of the input tensor with an added ragged dimension for the bytes that make up each string. |
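To illustrate why one Unicode character can turn into several bytes, here is a plain-Python sketch of the byte view described above. It does not call the library: the helper name `utf8_bytes` is hypothetical, and the sketch assumes the input string is UTF-8 encoded.

```python
from typing import List


def utf8_bytes(text: str) -> List[int]:
    """Return the UTF-8 byte values of `text`, one int per byte.

    Hypothetical helper for illustration only; not part of the library.
    """
    return list(text.encode("utf-8"))


ascii_bytes = utf8_bytes("hello")  # five ASCII characters, five bytes
accented_bytes = utf8_bytes("é")   # one character, two bytes in UTF-8

print(ascii_bytes)     # [104, 101, 108, 108, 111]
print(accented_bytes)  # [195, 169]
```

The first result matches the byte values in the example for split; the second shows a single character occupying two positions in the byte dimension.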
split_by_offsets
split_by_offsets(
input, start_offsets, end_offsets
)
Splits a string tensor into sub-strings.
The strings are split based upon the provided byte offsets.
Example:
splitter = ByteSplitter()
substrings = splitter.split_by_offsets("hello", [0, 4], [4, 5])
print(substrings.numpy())
[b'hell' b'o']
Args | |
---|---|
input | Tensor or RaggedTensor of strings of any shape to split. |
start_offsets | Tensor or RaggedTensor of byte offsets to start splits on (inclusive). Its rank should be one more than the rank of input. |
end_offsets | Tensor or RaggedTensor of byte offsets to end splits on (exclusive). Its rank should be one more than the rank of input. |
Returns | |
---|---|
A RaggedTensor or Tensor of substrings. The returned shape is the shape of the offsets. |
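The inclusive-start, exclusive-end convention is the same one Python uses for slicing, so the offset semantics can be sketched on a single byte string without the library. The function name `slice_by_offsets` is hypothetical and handles only one string, whereas split_by_offsets accepts tensors of any shape.

```python
from typing import List, Sequence


def slice_by_offsets(
    data: bytes, starts: Sequence[int], ends: Sequence[int]
) -> List[bytes]:
    """Return data[start:end] for each (start, end) pair.

    Hypothetical helper for illustration only. `start` is inclusive and
    `end` is exclusive, mirroring the offset convention described above.
    """
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    return [data[s:e] for s, e in zip(starts, ends)]


substrings = slice_by_offsets(b"hello", [0, 4], [4, 5])
print(substrings)  # [b'hell', b'o']
```

The result has one substring per offset pair, which is the single-string analogue of the returned shape following the shape of the offsets.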
split_with_offsets
split_with_offsets(
input
)
Splits a string tensor into bytes.
The strings are split into bytes. Thus, a single Unicode character may be split into multiple bytes.
Example:
splitter = ByteSplitter()
bytes, starts, ends = splitter.split_with_offsets("hello")
print(bytes.numpy(), starts.numpy(), ends.numpy())
[104 101 108 108 111] [0 1 2 3 4] [1 2 3 4 5]
Args | |
---|---|
input | A RaggedTensor or Tensor of strings with any shape. |
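Returns | |
---|---|
A tuple (bytes, start_offsets, end_offsets), as unpacked in the example above. bytes holds the byte values; its shape is the shape of the input tensor with an added ragged dimension for the bytes that make up each string. start_offsets holds the inclusive starting byte offset of each byte, and end_offsets holds the exclusive ending byte offset of each byte. |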
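Because every piece is exactly one byte long, the offsets in the example for split_with_offsets follow a simple pattern: byte i starts at offset i and ends at offset i + 1. The plain-Python sketch below reproduces that pattern for a single byte string; `bytes_with_offsets` is a hypothetical name and the sketch is not the library's implementation.

```python
from typing import List, Tuple


def bytes_with_offsets(data: bytes) -> Tuple[List[int], List[int], List[int]]:
    """Return (byte values, start offsets, end offsets) for one byte string.

    Hypothetical helper for illustration only. Each byte i spans the
    half-open range [i, i + 1).
    """
    values = list(data)
    starts = list(range(len(values)))
    ends = [start + 1 for start in starts]
    return values, starts, ends


values, starts, ends = bytes_with_offsets(b"hello")
print(values)  # [104, 101, 108, 108, 111]
print(starts)  # [0, 1, 2, 3, 4]
print(ends)    # [1, 2, 3, 4, 5]
```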