tf.nn.ctc_greedy_decoder
Performs greedy (best path) decoding on the logits given in inputs.
tf.nn.ctc_greedy_decoder(
    inputs, sequence_length, merge_repeated=True, blank_index=None
)
Given a tensor as inputs, the blank_index parameter defines the class index of the blank symbol.
For example:
If blank_index is equal to 1:
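import tensorflow as tf

inf = float("inf")
# Logits have shape [max_time=3, batch_size=2, num_classes=3].
logits = tf.constant([[[  0., -inf, -inf],
                       [-2.3, -inf, -0.1]],
                      [[-inf, -0.5, -inf],
                       [-inf, -inf, -0.1]],
                      [[-inf, -inf, -inf],
                       [-0.1, -inf, -2.3]]])
# Valid time steps per batch entry: 2 for the first, 3 for the second.
seq_lens = tf.constant([2, 3])
# Class 1 is treated as the blank label.
outputs = tf.nn.ctc_greedy_decoder(
    logits,
    seq_lens,
    blank_index=1)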
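To make the example easier to follow, the pure-Python sketch below (an illustration, not TensorFlow's implementation) picks the arg-max class at each valid time step of each batch entry. The helper name best_path is hypothetical; the logits and sequence lengths are copied from the example above.

```python
inf = float("inf")

# Same values as the example: shape [max_time=3, batch_size=2, num_classes=3].
logits = [
    [[0.0, -inf, -inf], [-2.3, -inf, -0.1]],
    [[-inf, -0.5, -inf], [-inf, -inf, -0.1]],
    [[-inf, -inf, -inf], [-0.1, -inf, -2.3]],
]
seq_lens = [2, 3]


def best_path(logits, seq_lens):
    """Return, per batch entry, the arg-max class of each valid time step."""
    paths = []
    for b, length in enumerate(seq_lens):
        path = []
        for t in range(length):  # time steps beyond `length` are ignored
            frame = logits[t][b]
            path.append(max(range(len(frame)), key=lambda c: frame[c]))
        paths.append(path)
    return paths


paths = best_path(logits, seq_lens)
print(paths)  # [[0, 1], [2, 2, 0]]
```

With blank_index=1, the second step of the first entry selects the blank, so that step emits nothing. The third time step of the first entry is never inspected because its sequence length is 2.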
Notes:
- Regardless of the value of merge_repeated, if the index selected at a given time step and batch entry corresponds to blank_index, no new element is emitted.
- The default blank_index is (num_classes - 1), unless overridden.
If merge_repeated is True, repeated classes are merged in the output. This means that if consecutive logits' maximum indices are the same, only the first of these is emitted. The sequence A B B * B * B (where '*' is the blank label) becomes
- A B B B if merge_repeated=True.
- A B B B B if merge_repeated=False.
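The merging rule can be sketched in a few lines of plain Python. This is an illustrative reimplementation under the rules stated above, not the library's code; the function name collapse and the use of string labels are assumptions made for readability.

```python
def collapse(labels, blank, merge_repeated=True):
    """Drop blanks and, optionally, merge consecutive repeated labels."""
    out = []
    prev = None
    for label in labels:
        # A label counts as a repeat only if it equals the label of the
        # immediately preceding time step (blank steps included).
        is_repeat = merge_repeated and label == prev
        if label != blank and not is_repeat:
            out.append(label)
        prev = label
    return out


path = ["A", "B", "B", "*", "B", "*", "B"]
merged = collapse(path, blank="*", merge_repeated=True)
unmerged = collapse(path, blank="*", merge_repeated=False)
print(" ".join(merged))    # A B B B
print(" ".join(unmerged))  # A B B B B
```

Because the previous label is updated on blank steps too, a blank between two identical labels keeps them from being merged, which is why the B that follows each '*' is emitted again.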
Args
inputs | 3-D float Tensor sized [max_time, batch_size, num_classes]. The logits.
sequence_length | 1-D int32 vector containing sequence lengths, having size [batch_size].
merge_repeated | Boolean. Default: True.
blank_index | (Optional). Default: num_classes - 1. Defines the class index to use for the blank label. Negative values count back from num_classes, i.e., -1 reproduces the ctc_greedy_decoder behavior of using num_classes - 1 for the blank symbol, which corresponds to the default.
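As a small illustration of how the blank_index description reads, the sketch below maps None and negative values to a non-negative class index. The helper resolve_blank_index is hypothetical and only mirrors the wording above; range validation is deliberately left out.

```python
def resolve_blank_index(blank_index, num_classes):
    """Map None or a negative blank_index to a non-negative class index."""
    if blank_index is None:
        return num_classes - 1  # documented default
    if blank_index < 0:
        return blank_index + num_classes  # negative values start from num_classes
    return blank_index


resolved = [resolve_blank_index(i, num_classes=3) for i in (None, -1, 1)]
print(resolved)  # [2, 2, 1]
```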
Returns
A tuple (decoded, neg_sum_logits) where
decoded | A single-element list. decoded[0] is a SparseTensor containing the decoded outputs such that:
- decoded[0].indices: Indices matrix of shape (total_decoded_outputs, 2). The rows store [batch, time].
- decoded[0].values: Values vector of size (total_decoded_outputs). The vector stores the decoded classes.
- decoded[0].dense_shape: Shape vector of size (2). The shape values are [batch_size, max_decoded_length].
neg_sum_logits | A float matrix (batch_size x 1) containing, for the sequence found, the negative of the sum of the greatest logit at each time frame.
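The layout of the returned values can be sketched in plain Python. The helper names below are hypothetical, and the input rows were derived by hand from the rules on this page for the blank_index=1 example (they were not captured from a TensorFlow run). The sketch also assumes that the second column of indices is the position within the decoded sequence.

```python
def to_sparse_parts(decoded_rows):
    """Build (indices, values, dense_shape) from per-batch decoded label rows."""
    indices, values = [], []
    for b, row in enumerate(decoded_rows):
        for t, label in enumerate(row):
            indices.append([b, t])  # [batch, position in decoded sequence]
            values.append(label)
    max_len = max((len(row) for row in decoded_rows), default=0)
    return indices, values, [len(decoded_rows), max_len]


def neg_sum_of_max(max_logits):
    """Negative sum of the greatest logit per valid time step, as a 1-element row."""
    return [-sum(max_logits)]


# Hand-derived for the example: entry 0 decodes to [0], entry 1 to [2, 0].
decoded_rows = [[0], [2, 0]]
# Greatest logit at each valid time step of each entry.
max_logits = [[0.0, -0.5], [-0.1, -0.1, -0.1]]

indices, values, dense_shape = to_sparse_parts(decoded_rows)
neg_sum_logits = [neg_sum_of_max(row) for row in max_logits]

print(indices)      # [[0, 0], [1, 0], [1, 1]]
print(values)       # [0, 2, 0]
print(dense_shape)  # [2, 2]
print(neg_sum_logits)
```

Under these assumptions, neg_sum_logits comes out as approximately 0.5 for the first entry and 0.3 for the second, one row per batch entry.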
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2023-03-17 UTC.