Cross Layer in Deep & Cross Network to learn explicit feature interactions.

A layer that efficiently creates explicit, bounded-degree feature interactions. The call method accepts two input tensors. The first input x0 is the base layer that contains the original features (usually the embedding layer); the second input xi is the output of the previous Cross layer in the stack, i.e., the i-th Cross layer. For the first Cross layer in the stack, x0 = xi.

The output is x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi, where .* denotes elementwise multiplication. W can be a full-rank matrix or, to reduce the computational cost, a low-rank matrix U*V. diag_scale increases the diagonal of W to improve training stability (especially in the low-rank case).

  1. R. Wang et al. See Eq. (1) for full-rank and Eq. (2) for low-rank version.
  2. R. Wang et al.
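
The update rule above can be sketched in plain NumPy. This is an illustrative sketch only, not the layer's implementation: the function name cross_full_rank and the random weights are assumptions made for the example, and the same W is reused for both steps purely for brevity (separate Cross layers would each hold their own weights).

```python
import numpy as np

def cross_full_rank(x0, xi, W, bias, diag_scale=0.0):
    """Hypothetical helper: x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi."""
    # x0, xi: (batch_size, input_dim); W: (input_dim, input_dim); bias: (input_dim,)
    projected = xi @ W.T + bias + diag_scale * xi
    return x0 * projected + xi

rng = np.random.default_rng(0)
batch_size, input_dim = 4, 6
x0 = rng.normal(size=(batch_size, input_dim))
W = rng.normal(size=(input_dim, input_dim))
bias = np.zeros(input_dim)

x1 = cross_full_rank(x0, x0, W, bias)  # first layer in the stack: xi = x0
x2 = cross_full_rank(x0, x1, W, bias)  # second step, reusing W only for brevity
print(x1.shape, x2.shape)  # (4, 6) (4, 6)
```

Note that with W and bias set to zero the rule reduces to x_{i+1} = xi, so the trailing "+ xi" term acts as a residual connection.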

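import tensorflow as tf
import tensorflow_recommenders as tfrs

# Assumption: Cross is the layer exposed as tfrs.layers.dcn.Cross.
Cross = tfrs.layers.dcn.Cross

# After the embedding layer in a functional model:
inputs = tf.keras.Input(shape=(None,), name='index', dtype=tf.int64)
# The Embedding layer must be called on the input tensor to produce x0.
x0 = tf.keras.layers.Embedding(input_dim=32, output_dim=6)(inputs)
x1 = Cross()(x0, x0)
x2 = Cross()(x0, x1)
logits = tf.keras.layers.Dense(units=10)(x2)
model = tf.keras.Model(inputs, logits)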

projection_dim: Projection dimension used to reduce the computational cost. Default is None, in which case a full (input_dim by input_dim) matrix W is used. If set, a low-rank matrix W = U*V is used, where U is of size input_dim by projection_dim and V is of size projection_dim by input_dim. projection_dim needs to be smaller than input_dim/2 to improve the model efficiency. In practice, we've observed that projection_dim = input_dim/4 consistently preserved the accuracy of the full-rank version.
diag_scale: A non-negative float used to increase the diagonal of the kernel W by diag_scale, that is, W + diag_scale * I, where I is an identity matrix.
use_bias: Whether to add a bias term for this layer. If set to False, no bias term will be used.
preactivation: Activation applied to the output matrix of the layer, before multiplication with the input. Can be used to control the scale of the layer's outputs and improve stability.
kernel_initializer: Initializer to use on the kernel matrix.
bias_initializer: Initializer to use on the bias vector.
kernel_regularizer: Regularizer to use on the kernel matrix.
bias_regularizer: Regularizer to use on the bias vector.
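
The low-rank and diag_scale arguments can be illustrated with a small NumPy sketch. It is not the layer's code; the sizes and random matrices are assumptions chosen for the example. It shows that applying U and V as two thin matrix products matches the explicit kernel W + diag_scale * I, and that the low-rank form has 2 * input_dim * projection_dim parameters, which is below input_dim * input_dim exactly when projection_dim < input_dim/2.

```python
import numpy as np

input_dim = 64
projection_dim = input_dim // 4  # the input_dim/4 setting mentioned above
diag_scale = 0.1

rng = np.random.default_rng(0)
U = rng.normal(size=(input_dim, projection_dim))
V = rng.normal(size=(projection_dim, input_dim))
x = rng.normal(size=(8, input_dim))  # a batch of 8 example rows

# Low-rank kernel applied as two thin matmuls: W x = U (V x).
low_rank_out = (x @ V.T) @ U.T + diag_scale * x

# Same result from the explicit kernel W + diag_scale * I.
W = U @ V
explicit_out = x @ (W + diag_scale * np.eye(input_dim)).T

full_params = input_dim * input_dim
low_rank_params = 2 * input_dim * projection_dim
print(full_params, low_rank_params)  # 4096 2048
print(np.allclose(low_rank_out, explicit_out))
```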

Input shape: A tuple of two inputs, each of shape (batch_size, input_dim).
Output shape: A single output of shape (batch_size, input_dim).



Computes the feature cross.

x0: The input tensor.
x: Optional second input tensor. If provided, the layer will compute crosses between x0 and x; if not provided, the layer will compute crosses between x0 and itself.

Tensor of crosses.
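
The optional second argument can be mimicked with a small NumPy stand-in. CrossSketch is a hypothetical class written for this illustration, not the real layer; it only mirrors the described behavior that omitting x crosses x0 with itself.

```python
import numpy as np

class CrossSketch:
    """Hypothetical NumPy stand-in with the same (x0, x=None) calling convention."""

    def __init__(self, input_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(input_dim, input_dim))
        self.bias = np.zeros(input_dim)

    def __call__(self, x0, x=None):
        # When x is omitted, cross x0 with itself.
        if x is None:
            x = x0
        return x0 * (x @ self.W.T + self.bias) + x

layer = CrossSketch(input_dim=6)
x0 = np.ones((2, 6))
same = np.allclose(layer(x0), layer(x0, x0))
print(same, layer(x0).shape)  # True (2, 6)
```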