tfp.glm.fit_sparse

Fits a GLM using coordinate-wise FIM-informed proximal gradient descent.

This function uses an L1- and L2-regularized, second-order quasi-Newton method to find maximum-likelihood parameters for the given model and observed data. The second-order approximations use the negative Fisher information in place of the Hessian, that is,

FisherInfo = E_Y[Hessian with respect to model_coefficients of -LogLikelihood(
    Y | model_matrix, current value of model_coefficients)]
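
As a concrete illustration (a sketch, not the library's internal code): for a Bernoulli GLM with the canonical logit link, this expectation does not depend on Y and reduces to X^T diag(p * (1 - p)) X, where p is the vector of predicted means.

import numpy as np

def bernoulli_fisher_info(model_matrix, model_coefficients):
  # Fisher information X^T diag(p * (1 - p)) X of a logit-link Bernoulli GLM,
  # i.e. the quantity used in place of the Hessian. Illustrative only; this is
  # not the implementation used inside fit_sparse.
  linear_response = model_matrix @ model_coefficients
  p = 1. / (1. + np.exp(-linear_response))  # predicted means
  w = p * (1. - p)                          # per-sample response variances
  return model_matrix.T @ (w[:, np.newaxis] * model_matrix)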

For large, sparse data sets, model_matrix should be supplied as a SparseTensor.
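
For example, a design matrix with mostly zero entries can be passed in coordinate form via tf.SparseTensor (a minimal sketch; the indices and values below are made up):

import tensorflow as tf

# A 4 x 3 design matrix with only three nonzero entries, supplied as
# (indices, values, dense_shape) rather than as a dense Tensor.
sparse_model_matrix = tf.SparseTensor(
    indices=[[0, 1], [2, 0], [3, 2]],  # [row, column] of each nonzero entry
    values=[1.5, -2.0, 0.75],
    dense_shape=[4, 3])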

Args

model_matrix: (Batch of) matrix-shaped, float Tensor or SparseTensor where each row represents a sample's features. Has shape [N, n] where N is the number of data samples and n is the number of features per sample.
response: (Batch of) vector-shaped Tensor with the same dtype as model_matrix where each element represents a sample's observed response (to the corresponding row of features).
model: tfp.glm.ExponentialFamily-like instance, which specifies the link function and distribution of the GLM, and thus characterizes the negative log-likelihood which will be minimized. Must have sufficient statistic equal to the response, that is, T(y) = y.
model_coefficients_start: (Batch of) vector-shaped, float Tensor with the same dtype as model_matrix, representing the initial values of the coefficients for the GLM regression. Has shape [n] where model_matrix has shape [N, n].
tolerance: scalar, float Tensor representing the tolerance for each optimization step; see the tolerance argument of fit_sparse_one_step.
l1_regularizer: scalar, float Tensor representing the weight of the L1 regularization term.
l2_regularizer: scalar, float Tensor representing the weight of the L2 regularization term. Default value: None (i.e., no L2 regularization).
maximum_iterations: Python integer specifying the maximum number of iterations of the outer loop of the optimizer (i.e., maximum number of calls to fit_sparse_one_step). After this many iterations of the outer loop, the algorithm will terminate even if the return value model_coefficients has not converged. Default value: 1.
maximum_full_sweeps_per_iteration: Python integer specifying the maximum number of coordinate descent sweeps allowed in each iteration. Default value: 1.
learning_rate: scalar, float Tensor representing a multiplicative factor used to dampen the proximal gradient descent steps. Default value: None (i.e., the factor is conceptually 1).
name: Python string representing the name of the TensorFlow operation. The default name is "fit_sparse".

Returns

model_coefficients: (Batch of) Tensor of the same shape and dtype as model_coefficients_start, representing the computed model coefficients which minimize the regularized negative log-likelihood.
is_converged: scalar, bool Tensor indicating whether the minimization procedure converged across all batches within the specified number of iterations. Here convergence means that an iteration of the inner loop (fit_sparse_one_step) returns True for its is_converged output value.
iter: scalar, int Tensor indicating the actual number of iterations of the outer loop of the optimizer completed (i.e., number of calls to fit_sparse_one_step before achieving convergence).
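
The exact zeros visible in the example output below are a consequence of the proximal treatment of the L1 term: conceptually, each coordinate update passes through a soft-thresholding operator, which maps sufficiently small coefficients to exactly zero. A minimal NumPy sketch of that operator (conceptual only, not the library's internal code):

import numpy as np

def soft_threshold(x, threshold):
  # Proximal operator of threshold * ||x||_1: every coordinate is shrunk
  # toward zero, and coordinates with |x| <= threshold become exactly zero,
  # which is what makes the fitted coefficient vector sparse.
  return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.)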

Example

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# This example builds a graph and evaluates it with a Session; under TF2,
# graph mode has to be enabled explicitly.
tf.compat.v1.disable_eager_execution()

def make_dataset(n, d, link, scale=1., dtype=np.float32):
  # Synthesize a GLM dataset whose true coefficient vector is sparse:
  # roughly half of the coefficients are set exactly to zero.
  model_coefficients = tfd.Uniform(
      low=np.array(-1, dtype), high=np.array(1, dtype)).sample(
          d, seed=42)
  radius = np.sqrt(2.)
  model_coefficients *= radius / tf.linalg.norm(model_coefficients)
  mask = tf.random.shuffle(tf.range(d)) < tf.cast(
      0.5 * tf.cast(d, tf.float32), tf.int32)
  model_coefficients = tf.where(mask, model_coefficients,
                                tf.zeros_like(model_coefficients))
  model_matrix = tfd.Normal(
      loc=np.array(0, dtype), scale=np.array(1, dtype)).sample(
          [n, d], seed=43)
  scale = tf.convert_to_tensor(scale, dtype)
  linear_response = tf.matmul(model_matrix,
                              model_coefficients[..., tf.newaxis])[..., 0]
  if link == 'linear':
    response = tfd.Normal(loc=linear_response, scale=scale).sample(seed=44)
  elif link == 'probit':
    response = tf.cast(
        tfd.Normal(loc=linear_response, scale=scale).sample(seed=44) > 0,
        dtype)
  elif link == 'logit':
    response = tfd.Bernoulli(logits=linear_response).sample(seed=44)
  else:
    raise ValueError('unrecognized true link: {}'.format(link))
  return model_matrix, response, model_coefficients, mask

with tf.compat.v1.Session() as sess:
  x_, y_, model_coefficients_true_, _ = sess.run(make_dataset(
      n=int(1e5), d=100, link='probit'))

  model = tfp.glm.Bernoulli()
  model_coefficients_start = tf.zeros(x_.shape[-1], np.float32)

  model_coefficients, is_converged, num_iter = tfp.glm.fit_sparse(
      model_matrix=tf.convert_to_tensor(x_),
      response=tf.convert_to_tensor(y_),
      model=model,
      model_coefficients_start=model_coefficients_start,
      l1_regularizer=800.,
      l2_regularizer=None,
      maximum_iterations=10,
      maximum_full_sweeps_per_iteration=10,
      tolerance=1e-6,
      learning_rate=None)

  model_coefficients_, is_converged_, num_iter_ = sess.run([
      model_coefficients, is_converged, num_iter])

  print("is_converged:", is_converged_)
  print("    num_iter:", num_iter_)
  print("\nLearned / True")
  print(np.concatenate(
      [[model_coefficients_], [model_coefficients_true_]], axis=0).T)

# ==>
# is_converged: True
#     num_iter: 1
#
# Learned / True
# [[ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.11195257  0.12484948]
#  [ 0.          0.        ]
#  [ 0.05191106  0.06394956]
#  [-0.15090358 -0.15325639]
#  [-0.18187316 -0.18825999]
#  [-0.06140942 -0.07994166]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.14474444  0.15810856]
#  [ 0.          0.        ]
#  [-0.25249591 -0.24260855]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.03888761 -0.06755984]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.0192222  -0.04169233]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.01434913  0.03568212]
#  [-0.11336883 -0.12873614]
#  [ 0.          0.        ]
#  [-0.24496339 -0.24048163]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.04088281  0.06565224]
#  [-0.12784363 -0.13359821]
#  [ 0.05618424  0.07396613]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.         -0.01719233]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.00076072 -0.03607186]
#  [ 0.21801499  0.21146794]
#  [-0.02161094 -0.04031265]
#  [ 0.0918689   0.10487888]
#  [ 0.0106154   0.03233612]
#  [-0.07817317 -0.09725142]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.23725343 -0.24194022]
#  [ 0.          0.        ]
#  [-0.08725718 -0.1048776 ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.02114314 -0.04145789]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.02710908 -0.04590397]
#  [ 0.15293184  0.15415154]
#  [ 0.2114463   0.2088728 ]
#  [-0.10969634 -0.12368613]
#  [ 0.         -0.01505797]
#  [-0.01140458 -0.03234904]
#  [ 0.16051085  0.1680062 ]
#  [ 0.09816848  0.11094204]

References

[1]: Jerome Friedman, Trevor Hastie and Rob Tibshirani. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 2010. https://www.jstatsoft.org/article/view/v033i01/v33i01.pdf

[2]: Guo-Xun Yuan, Chia-Hua Ho and Chih-Jen Lin. An Improved GLMNET for L1-regularized Logistic Regression. Journal of Machine Learning Research, 13, 2012. http://www.jmlr.org/papers/volume13/yuan12a/yuan12a.pdf