# tfp.sts.SparseLinearRegression

Formal representation of a sparse linear regression.

Inherits From: `StructuralTimeSeries`

This model defines a time series given by a sparse linear combination of covariate time series provided in a design matrix:

```
observed_time_series = matmul(design_matrix, weights)
```

This is identical to `tfp.sts.LinearRegression`, except that `SparseLinearRegression` uses a parameterization of a Horseshoe prior [1][2] to encode the assumption that many of the `weights` are zero, i.e., many of the covariate time series are irrelevant. See the Mathematical Details section below for further discussion. The prior parameterization used by `SparseLinearRegression` is more suitable for inference than that obtained by simply passing the equivalent `tfd.Horseshoe` prior to `LinearRegression`; when sparsity is desired, `SparseLinearRegression` will likely yield better results.

This component does not itself include observation noise; it defines a deterministic distribution with mass at the point `matmul(design_matrix, weights)`. In practice, it should be combined with a model that supplies observation noise, such as `tfp.sts.Sum`.
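A minimal NumPy sketch of the quantity this component represents, assuming plain arrays in place of `Tensor`s and a hypothetical Gaussian `noise_scale` standing in for the observation noise that a model such as `tfp.sts.Sum` would supply. It illustrates the arithmetic only and does not call TFP.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_timesteps, num_features = 50, 5

# Covariate time series, one per column of the design matrix.
design_matrix = rng.normal(size=(num_timesteps, num_features))

# Sparse weights: most covariates are irrelevant (weight exactly zero here).
weights = np.array([1.5, 0.0, 0.0, -0.7, 0.0])

# The regression component alone is deterministic: all mass at this point.
regression_mean = design_matrix @ weights  # shape [num_timesteps]

# Observation noise has to come from elsewhere (hypothetical scale).
noise_scale = 0.1
observed_time_series = regression_mean + noise_scale * rng.normal(size=num_timesteps)
```

Only the two covariates with nonzero weights contribute to `regression_mean`; the others are ignored entirely.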

#### Examples

Given `series1`, `series2` as `Tensors` each of shape `[num_timesteps]` representing covariate time series, we create a regression model that conditions on these covariates:

```python
import tensorflow as tf
import tensorflow_probability as tfp

regression = tfp.sts.SparseLinearRegression(
    design_matrix=tf.stack([series1, series2], axis=-1),
    weights_prior_scale=0.1)
```

The `weights_prior_scale` determines the level of sparsity; small scales encourage the weights to be sparse. In some cases, such as when the likelihood is iid Gaussian with known scale, the prior scale can be analytically related to the expected number of nonzero weights [2]; however, this is not the case in general for STS models.
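For the iid Gaussian case, Piironen and Vehtari [2] suggest a global scale of τ0 = p0 / (D − p0) · σ / √n, where p0 is a prior guess for the number of nonzero weights, D the number of features, σ the noise scale, and n the number of observations. The sketch below computes that value with a hypothetical helper `suggested_prior_scale`; it assumes `weights_prior_scale` plays the role of τ0 and, as noted above, is at best a rough starting point for STS models, whose likelihoods are generally not of this form.

```python
import math


def suggested_prior_scale(expected_nonzero, num_features, noise_scale, num_observations):
    """Rule-of-thumb global scale for an iid Gaussian likelihood (hypothetical helper).

    expected_nonzero: prior guess p0 for the number of relevant features.
    num_features: total number of features D (must exceed p0).
    noise_scale: observation noise standard deviation sigma.
    num_observations: number of observations n.
    """
    if not 0 < expected_nonzero < num_features:
        raise ValueError("need 0 < expected_nonzero < num_features")
    return (expected_nonzero / (num_features - expected_nonzero)
            * noise_scale / math.sqrt(num_observations))


# Two relevant covariates out of ten, unit noise, 100 timesteps.
scale = suggested_prior_scale(2, 10, 1.0, 100)
```

Here `scale` is 2/8 · 1/10 = 0.025; expecting more nonzero weights yields a larger, less sparsity-inducing scale.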

If the design matrix has batch dimensions, by default the model will create a matching batch of weights. For example, if `design_matrix.shape == [num_users, num_timesteps, num_features]`, by default the model will fit separate weights for each user, i.e., it will internally represent `weights.shape == [num_users, num_features]`. To share weights across some or all batch dimensions, you can manually specify the batch shape for the weights:

```python
# design_matrix.shape == [num_users, num_timesteps, num_features]
regression = tfp.sts.SparseLinearRegression(
    design_matrix=design_matrix,
    weights_batch_shape=[])  # weights.shape -> [num_features]
```
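The difference between per-user and shared weights is a matter of shapes. This NumPy sketch (plain arrays, not TFP internals) computes the regression mean both ways for a batch of users.

```python
import numpy as np

num_users, num_timesteps, num_features = 3, 20, 4
rng = np.random.default_rng(seed=1)
design_matrix = rng.normal(size=(num_users, num_timesteps, num_features))

# Default: one weight vector per user, weights.shape == [num_users, num_features].
per_user_weights = rng.normal(size=(num_users, num_features))
per_user_mean = np.einsum("utf,uf->ut", design_matrix, per_user_weights)

# weights_batch_shape=[]: a single weight vector shared by every user.
shared_weights = rng.normal(size=(num_features,))
shared_mean = design_matrix @ shared_weights  # broadcasts over the user axis
```

Both means have shape `[num_users, num_timesteps]`; only the number of free weights differs.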

#### Mathematical Details

The basic horseshoe prior [1] is defined as a Cauchy-normal scale mixture:

```
scales[i] ~ HalfCauchy(loc=0, scale=1)
weights[i] ~ Normal(loc=0., scale=scales[i] * global_scale)
```

The Cauchy scale parameters put substantial mass near zero, encouraging weights to be sparse, but their heavy tails allow weights far from zero to be estimated without excessive shrinkage. The horseshoe can be thought of as a continuous relaxation of a traditional 'spike-and-slab' discrete sparsity prior, in which the latent Cauchy scale mixes between 'spike' (`scales[i] ~= 0`) and 'slab' (`scales[i] >> 0`) regimes.
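Both effects can be seen by sampling. This NumPy sketch draws from the basic horseshoe above with `global_scale` fixed to 1 (an assumption for illustration; `SparseLinearRegression` integrates it out) and compares it with plain Normal weights of the same scale.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_draws = 100_000
global_scale = 1.0  # fixed for illustration only

# Horseshoe: scales[i] ~ HalfCauchy(0, 1), weights[i] ~ Normal(0, scales[i] * global_scale).
scales = np.abs(rng.standard_cauchy(size=num_draws))
horseshoe_weights = rng.normal(loc=0.0, scale=scales * global_scale)

# Baseline without the scale mixture: weights[i] ~ Normal(0, global_scale).
normal_weights = rng.normal(loc=0.0, scale=global_scale, size=num_draws)

# 'Spike': fraction of weights shrunk close to zero.
horseshoe_near_zero = np.mean(np.abs(horseshoe_weights) < 0.1)
normal_near_zero = np.mean(np.abs(normal_weights) < 0.1)

# 'Slab': fraction of weights far from zero (heavy tails).
horseshoe_tail = np.mean(np.abs(horseshoe_weights) > 3.0)
normal_tail = np.mean(np.abs(normal_weights) > 3.0)
```

The horseshoe draws place more mass near zero and far more mass in the tails than the Normal draws do.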

Following the recommendations in [2], `SparseLinearRegression` implements a horseshoe with the following adaptations:

• The Cauchy prior on `scales[i]` is represented as an InverseGamma-Normal compound.
• The `global_scale` parameter is integrated out following a `Cauchy(0., scale=weights_prior_scale)` hyperprior, which is also represented as an InverseGamma-Normal compound.
• All compound distributions are implemented using a non-centered parameterization.

The compound, non-centered representation defines the same marginal prior as the original horseshoe (up to integrating out the global scale), but allows samplers to mix more efficiently through the heavy tails. For variational inference, the compound representation implicitly expands the representational power of the variational model.
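The equivalence rests on a standard fact: a Normal whose variance is drawn from InverseGamma(1/2, 1/2) is a Student-t with one degree of freedom, i.e. a Cauchy, so its absolute value is HalfCauchy(0, 1). This NumPy sketch checks that empirically by comparing quantiles of the non-centered compound draws with the exact HalfCauchy(0, 1) quantiles `tan(pi * p / 2)`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_draws = 200_000

# InverseGamma(0.5, 0.5) draws: reciprocal of Gamma(shape=0.5, rate=0.5),
# i.e. Gamma(shape=0.5, scale=2.0) in NumPy's shape/scale convention.
variances = 1.0 / rng.gamma(shape=0.5, scale=2.0, size=num_draws)

# Non-centered draw: a standard HalfNormal, rescaled by sqrt(variance).
noncentered = np.abs(rng.normal(size=num_draws))
compound_scales = noncentered * np.sqrt(variances)

# Direct HalfCauchy(0, 1) draws for comparison.
cauchy_scales = np.abs(rng.standard_cauchy(size=num_draws))

# HalfCauchy(0, 1) has CDF (2 / pi) * arctan(x), so its quantile is tan(pi * p / 2).
probs = np.array([0.25, 0.5, 0.75, 0.9])
exact = np.tan(np.pi * probs / 2)
compound_q = np.quantile(compound_scales, probs)
cauchy_q = np.quantile(cauchy_scales, probs)
```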

Note that we do not yet implement the regularized ('Finnish') horseshoe, proposed in [2] for models with weak likelihoods, because the likelihood in STS models is typically Gaussian, where it's not clear that additional regularization is appropriate. If you need this functionality, please email tfprobability@tensorflow.org.

The full prior parameterization implemented in `SparseLinearRegression` is as follows:

```
# Sample global_scale from Cauchy(0, scale=weights_prior_scale).
global_scale_variance ~ InverseGamma(alpha=0.5, beta=0.5)
global_scale_noncentered ~ HalfNormal(loc=0, scale=1)
global_scale = (global_scale_noncentered *
                sqrt(global_scale_variance) *
                weights_prior_scale)

# Sample local_scales from Cauchy(0, 1).
local_scale_variances[i] ~ InverseGamma(alpha=0.5, beta=0.5)
local_scales_noncentered[i] ~ HalfNormal(loc=0, scale=1)
local_scales[i] = local_scales_noncentered[i] * sqrt(local_scale_variances[i])

weights[i] ~ Normal(loc=0., scale=local_scales[i] * global_scale)
```
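The prior above can be transcribed line by line into NumPy. This is a sketch for intuition, not TFP's implementation; the helper name `sample_sparse_regression_prior` is hypothetical. The final check confirms that the median of `global_scale` matches that of HalfCauchy(0, `weights_prior_scale`), which is `weights_prior_scale` itself.

```python
import numpy as np


def sample_sparse_regression_prior(num_samples, num_features, weights_prior_scale, rng):
    """Draw (global_scale, local_scales, weights) following the pseudocode above."""
    # global_scale ~ HalfCauchy(0, weights_prior_scale) via the compound form.
    global_scale_variance = 1.0 / rng.gamma(shape=0.5, scale=2.0, size=num_samples)
    global_scale_noncentered = np.abs(rng.normal(size=num_samples))
    global_scale = (global_scale_noncentered * np.sqrt(global_scale_variance)
                    * weights_prior_scale)

    # local_scales[i] ~ HalfCauchy(0, 1), one per feature.
    shape = (num_samples, num_features)
    local_scale_variances = 1.0 / rng.gamma(shape=0.5, scale=2.0, size=shape)
    local_scales_noncentered = np.abs(rng.normal(size=shape))
    local_scales = local_scales_noncentered * np.sqrt(local_scale_variances)

    # weights[i] ~ Normal(0, local_scales[i] * global_scale).
    weights = rng.normal(loc=0.0, scale=local_scales * global_scale[:, np.newaxis])
    return global_scale, local_scales, weights


global_scale, local_scales, weights = sample_sparse_regression_prior(
    num_samples=50_000, num_features=5, weights_prior_scale=0.1,
    rng=np.random.default_rng(seed=0))
```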

#### References

[1]: Carvalho, C., Polson, N. and Scott, J. Handling Sparsity via the Horseshoe. AISTATS (2009). http://proceedings.mlr.press/v5/carvalho09a/carvalho09a.pdf

[2]: Piironen, J. and Vehtari, A. Sparsity information and regularization in the horseshoe and other shrinkage priors (2017). https://arxiv.org/abs/1707.01694

Args
`design_matrix` float `Tensor` of shape `concat([batch_shape, [num_timesteps, num_features]])`. This may also optionally be an instance of `tf.linalg.LinearOperator`.
`weights_prior_scale` float `Tensor` defining the scale of the Horseshoe prior on regression weights. Small values encourage the weights to be sparse. The shape must broadcast with `weights_batch_shape`. Default value: `0.1`.
`weights_batch_shape` if `None`, defaults to `design_matrix.batch_shape_tensor()`. Must broadcast with the batch shape of `design_matrix`. Default value: `None`.
`name` the name of this model component. Default value: 'SparseLinearRegression'.

Attributes
`batch_shape` Static batch shape of models represented by this component.
`design_matrix` LinearOperator representing the design matrix.
`latent_size` Python `int` dimensionality of the latent space in this model.
`name` Name of this model component.
`parameters` List of Parameter(name, prior, bijector) namedtuples for this model.
`weights_prior_scale`

## Methods

### `batch_shape_tensor`


Runtime batch shape of models represented by this component.

Returns
`batch_shape` `int` `Tensor` giving the broadcast batch shape of all model parameters. This should match the batch shape of derived state space models, i.e., `self.make_state_space_model(...).batch_shape_tensor()`.

### `joint_log_prob`


Build the joint density `log p(params) + log p(y|params)` as a callable.

Args
`observed_time_series` Observed `Tensor` trajectories of shape `sample_shape + batch_shape + [num_timesteps, 1]` (the trailing `1` dimension is optional if `num_timesteps > 1`), where `batch_shape` should match `self.batch_shape` (the broadcast batch shape of all priors on parameters for this structural time series model). May optionally be an instance of `tfp.sts.MaskedTimeSeries`, which includes a mask `Tensor` to specify timesteps with missing observations.

Returns
`log_joint_fn` A function taking a `Tensor` argument for each model parameter, in canonical order, and returning a `Tensor` log probability of shape `batch_shape`. Note that, unlike `tfd.Distribution` `log_prob` methods, `log_joint_fn` sums over the `sample_shape` from y, so that `sample_shape` does not appear in the output log_prob. This corresponds to viewing multiple samples in `y` as iid observations from a single model, which is typically the desired behavior for parameter inference.
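The reduction over `sample_shape` can be illustrated with a hypothetical iid Gaussian observation model in NumPy. This sketch covers only the `log p(y|params)` term (the prior term is omitted) and assumes a single leading sample axis; it mirrors the shape semantics described above, not how TFP computes the likelihood internally.

```python
import numpy as np


def summed_log_likelihood(y, mean, noise_scale):
    """Hypothetical Gaussian log-likelihood, reduced the way `log_joint_fn` is.

    `y` has shape [num_samples] + batch_shape + [num_timesteps] and `mean`
    has shape batch_shape + [num_timesteps].
    """
    log_probs = (-0.5 * ((y - mean) / noise_scale) ** 2
                 - np.log(noise_scale) - 0.5 * np.log(2.0 * np.pi))
    # Sum over timesteps, then over the iid samples: only batch_shape remains.
    return log_probs.sum(axis=-1).sum(axis=0)


rng = np.random.default_rng(seed=0)
num_samples, batch_size, num_timesteps = 4, 3, 10
mean = np.zeros((batch_size, num_timesteps))
y = rng.normal(size=(num_samples, batch_size, num_timesteps))
log_lik = summed_log_likelihood(y, mean, noise_scale=1.0)  # shape [batch_size]
```

Because the samples are treated as iid observations, the result equals the sum of the per-sample log-likelihoods.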

### `make_state_space_model`


Instantiate this model as a Distribution over specified `num_timesteps`.

Args
`num_timesteps` Python `int` number of timesteps to model.
`param_vals` a list of `Tensor` parameter values in order corresponding to `self.parameters`, or a dict mapping from parameter names to values.
`initial_state_prior` an optional `Distribution` instance overriding the default prior on the model's initial state. This is used in forecasting ("today's prior is yesterday's posterior").
`initial_step` optional `int` specifying the initial timestep to model. This is relevant when the model contains time-varying components, e.g., holidays or seasonality.

Returns
`dist` a `LinearGaussianStateSpaceModel` Distribution object.

### `params_to_weights`


Build regression weights from model parameters.
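Following the non-centered prior in the Mathematical Details section, the mapping from parameters to weights is deterministic. The sketch below assumes five parameters (the two global-scale terms, the two local-scale terms, and standard-Normal `weights_noncentered`); these names and the helper `params_to_weights_sketch` are hypothetical and need not match the method's actual signature. The worked numbers give `global_scale = 0.5 * sqrt(4) * 0.1 = 0.1` and `local_scales = [1, 2, 1]`.

```python
import numpy as np


def params_to_weights_sketch(global_scale_variance, global_scale_noncentered,
                             local_scale_variances, local_scales_noncentered,
                             weights_noncentered, weights_prior_scale):
    """Map non-centered parameters to regression weights (hypothetical names)."""
    global_scale = (global_scale_noncentered * np.sqrt(global_scale_variance)
                    * weights_prior_scale)
    local_scales = local_scales_noncentered * np.sqrt(local_scale_variances)
    # Standard-Normal draws rescaled by the local and global scales.
    return weights_noncentered * local_scales * global_scale


weights = params_to_weights_sketch(
    global_scale_variance=4.0,
    global_scale_noncentered=0.5,
    local_scale_variances=np.array([1.0, 4.0, 0.25]),
    local_scales_noncentered=np.array([1.0, 1.0, 2.0]),
    weights_noncentered=np.array([3.0, -1.0, 0.5]),
    weights_prior_scale=0.1)
# weights == [0.3, -0.2, 0.05]
```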

### `prior_sample`


Sample from the joint prior over model parameters and trajectories.

Args
`num_timesteps` Scalar `int` `Tensor` number of timesteps to model.
`initial_step` Optional scalar `int` `Tensor` specifying the starting timestep. Default value: 0.
`params_sample_shape` Number of possible worlds to sample iid from the parameter prior, or more generally, `Tensor` `int` shape to fill with iid samples. Default value: `[]` (i.e., draw a single sample and don't expand the shape).
`trajectories_sample_shape` For each sampled set of parameters, number of trajectories to sample, or more generally, `Tensor` `int` shape to fill with iid samples. Default value: `[]` (i.e., draw a single sample and don't expand the shape).
`seed` Python `int` random seed.

Returns
`trajectories` `float` `Tensor` of shape `trajectories_sample_shape + params_sample_shape + [num_timesteps, 1]` containing all sampled trajectories.
`param_samples` list of sampled parameter value `Tensor`s, in order corresponding to `self.parameters`, each of shape `params_sample_shape + prior.batch_shape + prior.event_shape`.
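The shape contract of `prior_sample` can be mimicked in NumPy. In this sketch the helper `prior_sample_sketch` is hypothetical, draws weights from the centered horseshoe for brevity, and returns only the weights rather than the full parameter list. Because this component is deterministic given its weights, every trajectory drawn for the same parameters is identical, so the trajectories are simply broadcast to `trajectories_sample_shape + params_sample_shape + [num_timesteps, 1]`.

```python
import numpy as np


def prior_sample_sketch(design_matrix, params_sample_shape,
                        trajectories_sample_shape, weights_prior_scale, rng):
    """Hypothetical helper mimicking the shapes returned by `prior_sample`."""
    _, num_features = design_matrix.shape
    params_sample_shape = tuple(params_sample_shape)
    # Centered horseshoe draws: one weight vector per sampled 'world'.
    global_scale = weights_prior_scale * np.abs(
        rng.standard_cauchy(size=params_sample_shape))
    local_scales = np.abs(
        rng.standard_cauchy(size=params_sample_shape + (num_features,)))
    weights = rng.normal(scale=local_scales * global_scale[..., np.newaxis])
    # Deterministic given the weights: params_sample_shape + [num_timesteps].
    mean = weights @ design_matrix.T
    trajectories = np.broadcast_to(
        mean[..., np.newaxis],
        tuple(trajectories_sample_shape) + mean.shape + (1,))
    return trajectories, weights


rng = np.random.default_rng(seed=0)
design_matrix = rng.normal(size=(20, 5))  # [num_timesteps, num_features]
trajectories, weights = prior_sample_sketch(
    design_matrix, params_sample_shape=[2], trajectories_sample_shape=[3],
    weights_prior_scale=0.1, rng=rng)
```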