text.regex_split

Split input by delimiters that match a regex pattern.

regex_split splits each input string at every substring that matches the regex in delim_regex_pattern. For example:

text_input = ["hello there"]
# Split by whitespace. A raw string keeps Python from treating "\s" as an escape.
regex_split(input=text_input,
            delim_regex_pattern=r"\s")
<tf.RaggedTensor [[b'hello', b'there']]>

By default, delimiters are not included in the split results. To keep them, pass a regex pattern as keep_delim_regex_pattern; delimiters that match it are returned as separate pieces. For example:

text_input = ["hello there"]
# Split by whitespace and keep the whitespace delimiters in the result.
regex_split(input=text_input,
            delim_regex_pattern=r"\s",
            keep_delim_regex_pattern=r"\s")
<tf.RaggedTensor [[b'hello', b' ', b'there']]>

Consecutive delimiters do not produce empty splits. For example:

text_input = ["hello  there"]  # Note the two spaces between the words.
# Split by whitespace.
regex_split(input=text_input,
            delim_regex_pattern=r"\s")
<tf.RaggedTensor [[b'hello', b'there']]>
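
The three behaviours above (splitting on matches, optionally keeping delimiters, and dropping empty pieces) can be sketched in plain Python. This is only an illustration built on Python's re module, not the op's implementation. regex_split_sketch is a hypothetical helper. It returns nested lists of str rather than a RaggedTensor of byte strings, Python's re syntax differs from RE2 in places, and it assumes a delimiter is kept only when it fully matches the keep pattern.

```python
import re


def regex_split_sketch(strings, delim_regex_pattern, keep_delim_regex_pattern=None):
    """Emulate the documented splitting rules on a list of Python strings.

    Hypothetical helper for illustration only; not part of any library.
    """
    delim = re.compile(delim_regex_pattern)
    keep = re.compile(keep_delim_regex_pattern) if keep_delim_regex_pattern else None
    rows = []
    for s in strings:
        pieces = []
        start = 0  # index where the current (not yet emitted) piece begins
        for match in delim.finditer(s):
            if match.end() == match.start():
                continue  # ignore zero-width matches
            if match.start() > start:
                # Emit the text before the delimiter only if it is non-empty.
                pieces.append(s[start:match.start()])
            # Assumption: keep the delimiter only on a full match of the keep pattern.
            if keep is not None and keep.fullmatch(match.group()):
                pieces.append(match.group())
            start = match.end()
        if start < len(s):
            pieces.append(s[start:])  # trailing piece after the last delimiter
        rows.append(pieces)
    return rows
```

Under these assumptions, regex_split_sketch(["hello there"], r"\s", r"\s") gives [['hello', ' ', 'there']], which mirrors the documented output apart from the str/bytes difference.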

See https://github.com/google/re2/wiki/Syntax for the full list of supported expressions.

input: A Tensor or RaggedTensor of string input.
delim_regex_pattern: A string containing the regex pattern of a delimiter.
keep_delim_regex_pattern: (optional) Regex pattern of delimiters that should be kept in the result.
name: (optional) Name of the op.

Returns a RaggedTensor of type string containing the split string pieces.
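
The result is ragged because each input string can yield a different number of pieces. As a rough illustration, the sketch below lays out such a result as a flat list of values plus row boundaries, which is the row_splits encoding that tf.RaggedTensor uses. The helper to_values_and_row_splits is hypothetical, and the nested list is a hand-written stand-in for a split result, not output captured from the op.

```python
def to_values_and_row_splits(rows):
    """Flatten nested lists into (values, row_splits).

    Hypothetical helper for illustration only. Row i of the ragged
    result is values[row_splits[i]:row_splits[i + 1]].
    """
    values = []
    row_splits = [0]
    for row in rows:
        values.extend(row)
        row_splits.append(len(values))  # boundary after this row
    return values, row_splits


# Hand-written stand-in: three input strings that split into 2, 3, and 0 pieces.
pieces = [["hello", "there"], ["a", "b", "c"], []]
values, row_splits = to_values_and_row_splits(pieces)
```

An input string with no pieces appears as an empty row, so two consecutive entries in row_splits are equal.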