TensorFlow is back at Google I/O on May 14! Register now

text.boise_tags_to_offsets

Converts the token offsets and BOISE tags into span offsets and span type.

text.boise_tags_to_offsets(
    token_begin_offsets, token_end_offsets, boise_tags
)

In the BOISE scheme there is a set of 5 labels for each type:

(B)egin: meaning the beginning of the span type.
(O)utside: meaning the token is outside of any span type
(I)nside: the token is inside the span
(S)ingleton: the entire span consists of this single token.
(E)nd: this token is the end of the span.

For example, given the following example string and entity:

content = "Who let the dogs out" entity = "dogs" tokens = ["Who", "let", "the", "dogs", "out"] token_begin_offsets = [0, 4, 8, 12, 17] token_end_offsets = [3, 7, 11, 16, 20] span_begin_offsets = [12] span_end_offsets = [16] span_type = ["animal"]

When given the token begin/end offsets and BOISE tags for an input text sequence, this function translates them into entity span begin/end offsets and span types.

Example:

>>> token_begin_offsets = tf.ragged.constant(
...   [[0, 4, 8, 12, 17], [0, 4, 8, 12]])
>>> token_end_offsets = tf.ragged.constant(
...   [[3, 7, 11, 16, 20], [3, 7, 11, 16]])
>>> boise_tags = tf.ragged.constant(
...   [['O', 'B-animal', 'I-animal', 'E-animal', 'O'],
...    ['O', 'O', 'O', 'S-loc']])
>>> (span_begin_offsets, span_end_offsets, span_type) = (
...   tf_text.boise_tags_to_offsets(token_begin_offsets, token_end_offsets,
...     boise_tags))
>>> span_begin_offsets
<tf.RaggedTensor [[4], [12]]>
>>> span_end_offsets
<tf.RaggedTensor [[16], [16]]>
>>> span_type
<tf.RaggedTensor [[b'animal'], [b'loc']]>

Args
`token_begin_offsets`	A `RaggedTensor` or `Tensor` of token begin byte offsets of int32 or int64.
`token_end_offsets`	A `RaggedTensor` or `Tensor` of token end byte offsets of int32 or int64.
`boise_tags`	A `RaggedTensor` of BOISE tag strings in the same dimension as the token begin and end offsets.

Returns
A tuple containing `span_begin_offsets`, `span_end_offsets` and `span_type`. `span_begin_offsets` is a `RaggedTensor` or `Tensor` of span begin byte offsets of int32 or int64. `span_end_offsets` is a `RaggedTensor` or `Tensor` of span end byte offsets of int32 or int64. `span_type` is a `RaggedTensor` or `Tensor` of span type strings.

text.boise_tags_to_offsets

Example:

Args

Returns