

Computes LCS-based similarity score between the hypotheses and references.


The Rouge-L metric is a score from 0 to 1 indicating how similar two sequences are, based on the length of the longest common subsequence (LCS). In particular, Rouge-L is the weighted harmonic mean (or f-measure) combining the LCS precision (the percentage of the hypothesis sequence covered by the LCS) and the LCS recall (the percentage of the reference sequence covered by the LCS).
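The LCS precision and recall described above can be illustrated with a small pure-Python sketch. It is not the library's implementation; the helper names `lcs_length` and `lcs_precision_recall` are hypothetical and exist only for this illustration, and the LCS is computed with the textbook dynamic-programming recurrence.

```python
def lcs_length(hyp, ref):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] is the LCS length of the hypothesis prefix seen so far and ref[:j].
    prev = [0] * (len(ref) + 1)
    for h in hyp:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            if h == r:
                # Matching tokens extend the LCS of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise keep the better result from dropping one token.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_precision_recall(hyp, ref):
    """Fraction of the hypothesis and of the reference covered by the LCS."""
    lcs = lcs_length(hyp, ref)
    precision = lcs / len(hyp) if hyp else 0.0
    recall = lcs / len(ref) if ref else 0.0
    return precision, recall


# Hypothesis ["a", "b"] against reference ["b"]: the LCS is ["b"].
print(lcs_precision_recall(["a", "b"], ["b"]))  # (0.5, 1.0)
```

The LCS covers one of two hypothesis tokens and the single reference token, so precision is 0.5 and recall is 1.0.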

Source: rouge-a-package-for-automatic-evaluation-of-summaries/

This method returns the F-measure, Precision, and Recall for each (hypothesis, reference) pair.

Alpha weights precision against recall in the harmonic mean: values near 0 make recall more important, and values near 1 make precision more important. Leaving alpha unset implies alpha=0.5, which is the default in the official script. Setting alpha to a negative number triggers a compatibility mode with the tensor2tensor implementation of ROUGE-L.
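To make the role of alpha concrete, here is a minimal sketch of a weighted harmonic mean. The formula F = 1 / (alpha / P + (1 - alpha) / R) is an assumption based on the usual weighted F-measure; this page does not spell it out, although it agrees with the description of alpha above. The function name `weighted_f_measure` is hypothetical, and the negative-alpha compatibility mode is deliberately not modeled.

```python
def weighted_f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall for 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        # Negative alpha (the tensor2tensor compatibility mode) is out of scope here.
        raise ValueError("this sketch only covers alpha in [0, 1]")
    # With no common subsequence both terms are 0; return 0 instead of dividing by zero.
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


p, r = 0.5, 1.0  # assumed precision and recall for one pair
print(weighted_f_measure(p, r, alpha=1))    # 0.5, equal to precision
print(weighted_f_measure(p, r, alpha=0))    # 1.0, equal to recall
print(round(weighted_f_measure(p, r), 3))   # 0.667 with the default alpha=0.5
```

Under this assumed formula, alpha=1 reduces the F-measure to precision and alpha=0 reduces it to recall, which is one way to read "more important" in the description above.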

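import tensorflow as tf
import tensorflow_text as text

# One (hypothesis, reference) pair; the LCS is ["b"].
hypotheses = tf.ragged.constant([["a", "b"]])
references = tf.ragged.constant([["b"]])
# alpha=1 weights the F-measure entirely toward precision.
f, p, r = text.metrics.rouge_l(hypotheses, references, alpha=1)
print("f: %s, p: %s, r: %s" % (f, p, r))
# Output (wrapped for readability):
# f: tf.Tensor([0.5], shape=(1,), dtype=float32),
# p: tf.Tensor([0.5], shape=(1,), dtype=float32),
# r: tf.Tensor([1.], shape=(1,), dtype=float32)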

Args:
hypotheses: A RaggedTensor with shape [N, (hyp_sentence_len)] and integer or string values.
references: A RaggedTensor with shape [N, (ref_sentence_len)] and integer or string values.
alpha: Optional float that weights precision against recall in the F-measure, as described above. Defaults to 0.5 when unset.

Returns:
An (f_measure, p_measure, r_measure) tuple, where each element is a vector of floats with shape [N]. The i-th float in each vector contains the similarity measure of hypotheses[i] and references[i].