Computes CTC (Connectionist Temporal Classification) loss.
```python
tf.nn.ctc_loss(
    labels,
    logits,
    label_length,
    logit_length,
    logits_time_major=True,
    unique=None,
    blank_index=None,
    name=None
)
```
This op implements the CTC loss as presented in Graves et al., 2006.
Connectionist temporal classification (CTC) is a type of neural network output and associated scoring function, for training recurrent neural networks (RNNs) such as LSTM networks to tackle sequence problems where the timing is variable. It can be used for tasks like on-line handwriting recognition or recognizing phones in speech audio. CTC refers to the outputs and scoring, and is independent of the underlying neural network structure.
Notes:
- This class performs the softmax operation for you, so `logits` should be e.g. linear projections of outputs by an LSTM.
- Outputs true repeated classes with blanks in between, and can also output repeated classes with no blanks in between that need to be collapsed by the decoder.
- `labels` may be supplied as either a dense, zero-padded `Tensor` with a vector of label sequence lengths OR as a `SparseTensor`.
- On TPU: Only dense padded `labels` are supported.
- On CPU and GPU: Caller may use `SparseTensor` or dense padded `labels`, but calling with a `SparseTensor` will be significantly faster.
- Default blank label is `0` instead of `num_labels - 1` (where `num_labels` is the innermost dimension size of `logits`), unless overridden by `blank_index`.
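To illustrate the collapsing behavior mentioned above, here is a minimal pure-Python sketch (not part of the TF API; `ctc_collapse` is a hypothetical helper) of the rule a CTC decoder applies: merge consecutive repeats, then drop blanks, so a blank between two identical labels keeps them distinct:

```python
def ctc_collapse(path, blank=0):
    # Collapse a frame-level label path the way a CTC decoder does:
    # drop a label if it repeats the previous frame, then drop blanks.
    # Assumes blank label 0, matching this op's default blank_index.
    collapsed = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            collapsed.append(label)
        prev = label
    return collapsed

# [1, 1, 0, 1, 2, 0] collapses to [1, 1, 2]: the blank at index 2
# separates the repeated 1s, so both survive collapsing.
```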
```python
tf.random.set_seed(50)
batch_size = 8
num_labels = 6
max_label_length = 5
num_frames = 12
labels = tf.random.uniform([batch_size, max_label_length],
                           minval=1, maxval=num_labels, dtype=tf.int64)
logits = tf.random.uniform([num_frames, batch_size, num_labels])
label_length = tf.random.uniform([batch_size], minval=2,
                                 maxval=max_label_length, dtype=tf.int64)
label_mask = tf.sequence_mask(label_length, maxlen=max_label_length,
                              dtype=label_length.dtype)
labels *= label_mask
logit_length = [num_frames] * batch_size
with tf.GradientTape() as t:
    t.watch(logits)
    ref_loss = tf.nn.ctc_loss(
        labels=labels,
        logits=logits,
        label_length=label_length,
        logit_length=logit_length,
        blank_index=0)
ref_grad = t.gradient(ref_loss, logits)
```
| Returns | |
|---|---|
| `loss` | A 1-D float `Tensor` of shape `[batch_size]`, containing negative log probabilities. |
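The negative log probability returned for each batch element is the CTC forward-algorithm sum over every frame-level alignment that collapses to the label sequence. The following is a didactic, unbatched pure-Python sketch under assumed names (`ctc_forward_nll` is not a TF function); the real op takes unnormalized logits, is batched, and is computed in log space for numerical stability:

```python
import math

def ctc_forward_nll(probs, labels, blank=0):
    # Didactic single-example CTC forward pass (Graves et al., 2006).
    # `probs` is a [T][num_labels] list of already-softmaxed per-frame
    # distributions; `labels` is the target sequence without blanks.
    ext = [blank]
    for l in labels:
        ext += [l, blank]                  # interleave blanks: length 2L + 1
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]          # start with a blank...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]     # ...or with the first real label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]            # stay on the same symbol
            if s > 0:
                a += alpha[t - 1][s - 1]   # advance by one symbol
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]   # skip the blank between distinct labels
            alpha[t][s] = a * probs[t][ext[s]]
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(total)                # negative log probability
```

For example, with `probs = [[0.6, 0.4], [0.6, 0.4]]` (blank, label 1) and `labels = [1]`, the three alignments `(1,1)`, `(0,1)`, `(1,0)` contribute 0.16 + 0.24 + 0.24 = 0.64, so the loss is `-log(0.64)`.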
| Raises | |
|---|---|
| `ValueError` | Argument `blank_index` must be provided when `labels` is a `SparseTensor`. |
| References | |
|---|---|
| Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: Graves et al., 2006 (pdf) |
| https://en.wikipedia.org/wiki/Connectionist_temporal_classification |