Split input by delimiters that match a regex pattern.
text.regex_split(
input,
delim_regex_pattern,
keep_delim_regex_pattern='',
name=None
)
regex_split will split input using delimiters that match a regex pattern in delim_regex_pattern. Here is an example:
text_input=["hello there"]
# split by whitespace
regex_split(input=text_input,
delim_regex_pattern="\s")
<tf.RaggedTensor [[b'hello', b'there']]>
By default, delimiters are not included in the split string results.
Delimiters may be included by specifying a regex pattern keep_delim_regex_pattern. For example:
text_input=["hello there"]
# split by whitespace
regex_split(input=text_input,
delim_regex_pattern="\s",
keep_delim_regex_pattern="\s")
<tf.RaggedTensor [[b'hello', b' ', b'there']]>
If there are multiple delimiters in a row, no empty splits are emitted. For example:
text_input=["hello  there"]  # Note the two spaces between the words.
# split by whitespace
regex_split(input=text_input,
delim_regex_pattern="\s")
<tf.RaggedTensor [[b'hello', b'there']]>
See https://github.com/google/re2/wiki/Syntax for the full list of supported expressions.
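The splitting rules described above can be approximated in plain Python to make them easier to reason about. The sketch below is not the library's implementation: the helper name regex_split_sketch is hypothetical, it uses Python's re module rather than RE2 (the two syntaxes differ in places), and it returns nested lists of str instead of a RaggedTensor of byte strings. It assumes a delimiter is kept only when the whole delimiter match also matches keep_delim_regex_pattern.

```python
import re


def regex_split_sketch(strings, delim_regex_pattern, keep_delim_regex_pattern=""):
    """Approximate the documented splitting rules with Python's `re` module.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    delim_re = re.compile(delim_regex_pattern)
    # An empty keep pattern means "keep no delimiters", mirroring the default.
    keep_re = re.compile(keep_delim_regex_pattern) if keep_delim_regex_pattern else None

    result = []
    for s in strings:
        pieces = []
        pos = 0  # start of the text not yet emitted
        for m in delim_re.finditer(s):
            if m.start() == m.end():
                continue  # ignore zero-width matches
            if m.start() > pos:
                # Emit the text between delimiters; adjacent delimiters
                # produce no empty piece because of this check.
                pieces.append(s[pos:m.start()])
            # Assumption: keep a delimiter only if all of it matches keep_re.
            if keep_re is not None and keep_re.fullmatch(m.group()):
                pieces.append(m.group())
            pos = m.end()
        if pos < len(s):
            pieces.append(s[pos:])  # trailing text after the last delimiter
        result.append(pieces)
    return result


print(regex_split_sketch(["hello there"], r"\s"))
print(regex_split_sketch(["hello there"], r"\s", r"\s"))
print(regex_split_sketch(["hello  there"], r"\s"))
```

The three calls mirror the three examples above: a plain whitespace split, a split that keeps the delimiter, and an input with two consecutive spaces that yields no empty piece.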
Returns | |
---|---|
A RaggedTensor of type string containing the split string pieces. |