Help protect the Great Barrier Reef with TensorFlow on Kaggle

# tf.RaggedTensor

Represents a ragged tensor.

A RaggedTensor is a tensor with one or more ragged dimensions, which are dimensions whose slices may have different lengths. For example, the inner (column) dimension of rt=[[3, 1, 4, 1], [], [5, 9, 2], [6], []] is ragged, since the column slices (rt[0, :], ..., rt[4, :]) have different lengths. Dimensions whose slices all have the same length are called uniform dimensions. The outermost dimension of a RaggedTensor is always uniform, since it consists of a single slice (and so there is no possibility for differing slice lengths).

The total number of dimensions in a RaggedTensor is called its rank, and the number of ragged dimensions in a RaggedTensor is called its ragged-rank. A RaggedTensor's ragged-rank is fixed at graph creation time: it can't depend on the runtime values of Tensors, and can't vary dynamically for different session runs.

### Potentially Ragged Tensors

Many ops support both Tensors and RaggedTensors. The term "potentially ragged tensor" may be used to refer to a tensor that might be either a Tensor or a RaggedTensor. The ragged-rank of a Tensor is zero.

### Documenting RaggedTensor Shapes

When documenting the shape of a RaggedTensor, ragged dimensions can be indicated by enclosing them in parentheses. For example, the shape of a 3-D RaggedTensor that stores the fixed-size word embedding for each word in a sentence, for each sentence in a batch, could be written as [num_sentences, (num_words), embedding_size]. The parentheses around (num_words) indicate that dimension is ragged, and that the length of each element list in that dimension may vary for each item.

### Component Tensors

Internally, a RaggedTensor consists of a concatenated list of values that are partitioned into variable-length rows. In particular, each RaggedTensor consists of:

• A values tensor, which concatenates the variable-length rows into a flattened list. For example, the values tensor for [[3, 1, 4, 1], [], [5, 9, 2], [6], []] is [3, 1, 4, 1, 5, 9, 2, 6].

• A row_splits vector, which indicates how those flattened values are divided into rows. In particular, the values for row rt[i] are stored in the slice rt.values[rt.row_splits[i]:rt.row_splits[i+1]].

#### Example:

print(tf.RaggedTensor.from_row_splits(
      values=[3, 1, 4, 1, 5, 9, 2, 6],
      row_splits=[0, 4, 4, 7, 8, 8]))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>


### Alternative Row-Partitioning Schemes

In addition to row_splits, ragged tensors provide support for four other row-partitioning schemes:

• row_lengths: a vector with shape [nrows], which specifies the length of each row.

• value_rowids and nrows: value_rowids is a vector with shape [nvals], corresponding one-to-one with values, which specifies each value's row index. In particular, the row rt[row] consists of the values rt.values[j] where value_rowids[j]==row. nrows is an integer scalar that specifies the number of rows in the RaggedTensor. (nrows is used to indicate trailing empty rows.)

• row_starts: a vector with shape [nrows], which specifies the start offset of each row. Equivalent to row_splits[:-1].

• row_limits: a vector with shape [nrows], which specifies the stop offset of each row. Equivalent to row_splits[1:].

• uniform_row_length: A scalar tensor, specifying the length of every row. This row-partitioning scheme may only be used if all rows have the same length.

Example: The following ragged tensors are equivalent, and all represent the nested list [[3, 1, 4, 1], [], [5, 9, 2], [6], []].

values = [3, 1, 4, 1, 5, 9, 2, 6]
rt1 = RaggedTensor.from_row_splits(values, row_splits=[0, 4, 4, 7, 8, 8])
rt2 = RaggedTensor.from_row_lengths(values, row_lengths=[4, 0, 3, 1, 0])
rt3 = RaggedTensor.from_value_rowids(
    values, value_rowids=[0, 0, 0, 0, 2, 2, 2, 3], nrows=5)
rt4 = RaggedTensor.from_row_starts(values, row_starts=[0, 4, 4, 7, 8])
rt5 = RaggedTensor.from_row_limits(values, row_limits=[4, 4, 7, 8, 8])


### Multiple Ragged Dimensions

RaggedTensors with multiple ragged dimensions can be defined by using a nested RaggedTensor for the values tensor. Each nested RaggedTensor adds a single ragged dimension.

inner_rt = RaggedTensor.from_row_splits(  # =rt1 from above
    values=[3, 1, 4, 1, 5, 9, 2, 6], row_splits=[0, 4, 4, 7, 8, 8])
outer_rt = RaggedTensor.from_row_splits(
    values=inner_rt, row_splits=[0, 3, 3, 5])
print(outer_rt.to_list())
[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]
print(outer_rt.ragged_rank)
2


The factory function RaggedTensor.from_nested_row_splits may be used to construct a RaggedTensor with multiple ragged dimensions directly, by providing a list of row_splits tensors:

RaggedTensor.from_nested_row_splits(
    flat_values=[3, 1, 4, 1, 5, 9, 2, 6],
    nested_row_splits=([0, 3, 3, 5], [0, 4, 4, 7, 8, 8])).to_list()
[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]


### Uniform Inner Dimensions

RaggedTensors with uniform inner dimensions can be defined by using a multidimensional Tensor for values.

rt = RaggedTensor.from_row_splits(values=tf.ones([5, 3], tf.int32),
                                  row_splits=[0, 2, 5])
print(rt.to_list())
[[[1, 1, 1], [1, 1, 1]],
 [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
print(rt.shape)
(2, None, 3)


### Uniform Outer Dimensions

RaggedTensors with uniform outer dimensions can be defined by using one or more RaggedTensor with a uniform_row_length row-partitioning tensor. For example, a RaggedTensor with shape [2, 2, None] can be constructed with this method from a RaggedTensor values with shape [4, None]:

values = tf.ragged.constant([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]])
print(values.shape)
(4, None)
rt6 = tf.RaggedTensor.from_uniform_row_length(values, 2)
print(rt6)
<tf.RaggedTensor [[[1, 2, 3], [4]], [[5, 6], [7, 8, 9, 10]]]>
print(rt6.shape)
(2, 2, None)


Note that rt6 only contains one ragged dimension (the innermost dimension). In contrast, if from_row_splits is used to construct a similar RaggedTensor, then that RaggedTensor will have two ragged dimensions:

rt7 = tf.RaggedTensor.from_row_splits(values, [0, 2, 4])
print(rt7.shape)
(2, None, None)


Uniform and ragged outer dimensions may be interleaved, meaning that a tensor with any combination of ragged and uniform dimensions may be created. For example, a RaggedTensor t4 with shape [3, None, 4, 8, None, 2] could be constructed as follows:

t0 = tf.zeros([1000, 2])                           # Shape:         [1000, 2]
t1 = RaggedTensor.from_row_lengths(t0, [...])      #           [160, None, 2]
t2 = RaggedTensor.from_uniform_row_length(t1, 8)   #         [20, 8, None, 2]
t3 = RaggedTensor.from_uniform_row_length(t2, 4)   #       [5, 4, 8, None, 2]
t4 = RaggedTensor.from_row_lengths(t3, [...])      # [3, None, 4, 8, None, 2]


values A potentially ragged tensor of any dtype and shape [nvals, ...].
row_splits A 1-D integer tensor with shape [nrows+1].
cached_row_lengths A 1-D integer tensor with shape [nrows]
cached_value_rowids A 1-D integer tensor with shape [nvals].
cached_nrows A 1-D integer scalar tensor.
internal True if the constructor is being called by one of the factory methods. If false, an exception will be raised.
uniform_row_length A scalar tensor.

TypeError If a row partitioning tensor has an inappropriate dtype.
TypeError If exactly one row partitioning argument was not specified.
ValueError If a row partitioning tensor has an inappropriate shape.
ValueError If multiple partitioning arguments are specified.
ValueError If nrows is specified but value_rowids is not None.

dtype The DType of values in this tensor.
flat_values The innermost values tensor for this ragged tensor.

Concretely, if rt.values is a Tensor, then rt.flat_values is rt.values; otherwise, rt.flat_values is rt.values.flat_values.

Conceptually, flat_values is the tensor formed by flattening the outermost dimension and all of the ragged dimensions into a single dimension.

rt.flat_values.shape = [nvals] + rt.shape[rt.ragged_rank + 1:] (where nvals is the number of items in the flattened dimensions).

#### Example:

rt = tf.ragged.constant([[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]])
print(rt.flat_values)
tf.Tensor([3 1 4 1 5 9 2 6], shape=(8,), dtype=int32)


nested_row_splits A tuple containing the row_splits for all ragged dimensions.

rt.nested_row_splits is a tuple containing the row_splits tensors for all ragged dimensions in rt, ordered from outermost to innermost. In particular, rt.nested_row_splits = (rt.row_splits,) + value_splits where:

• value_splits = () if rt.values is a Tensor.
• value_splits = rt.values.nested_row_splits otherwise.

#### Example:

rt = tf.ragged.constant(
    [[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]])
for i, splits in enumerate(rt.nested_row_splits):
  print('Splits for dimension %d: %s' % (i+1, splits.numpy()))
Splits for dimension 1: [0 3]
Splits for dimension 2: [0 3 3 5]
Splits for dimension 3: [0 4 4 7 8 8]


ragged_rank The number of ragged dimensions in this ragged tensor.
row_splits The row-split indices for this ragged tensor's values.

rt.row_splits specifies where the values for each row begin and end in rt.values. In particular, the values for row rt[i] are stored in the slice rt.values[rt.row_splits[i]:rt.row_splits[i+1]].

#### Example:

rt = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
print(rt.row_splits)  # indices of row splits in rt.values
tf.Tensor([0 4 4 7 8 8], shape=(6,), dtype=int64)


shape The statically known shape of this ragged tensor.

tf.ragged.constant([[0], [1, 2]]).shape
TensorShape([2, None])

tf.ragged.constant([[[0, 1]], [[1, 2], [3, 4]]], ragged_rank=1).shape
TensorShape([2, None, 2])


uniform_row_length The length of each row in this ragged tensor, or None if rows are ragged.

rt1 = tf.ragged.constant([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]])
print(rt1.uniform_row_length)  # rows are ragged.
None