Main Class and Method

BayesBridge and Gibbs Sampler

class bayesbridge.BayesBridge(model, prior)

Implement Gibbs sampler for Bayesian bridge sparse regression.

Parameters
  • model (RegressionModel object) –

  • prior (RegressionCoefPrior object) –

gibbs(n_iter, n_burnin=0, thin=1, seed=None, init={'global_scale': 0.1}, params_to_save=('coef', 'global_scale', 'logp'), coef_sampler_type=None, n_status_update=0, options=None, _add_iter_mode=False)

Generate posterior samples under the specified model and prior.

Parameters
  • n_iter (int) – Total number of MCMC iterations i.e. burn-ins + saved posterior draws

  • n_burnin (int) – Number of burn-in samples to be discarded

  • thin (int) – Number of iterations per saved sample, used for “thinning” the MCMC output to reduce its size. In other words, the function saves an MCMC sample every thin iterations, returning floor(n_iter / thin) samples.

  • seed (int) – Seed for random number generator.

  • init (dict of numpy arrays) – Specifies, partially or completely, the initial state of the Markov chain. The partial option allows specifying either the global scale or the regression coefficients. The former is the recommended default, since the global scale parameter is easier to choose, representing the prior expected magnitude of the regression coefficients. (The latter might make sense if a preliminary estimate of the coefficients is available.) Other parameters are then initialized through a combination of heuristics and conditional optimization.

  • coef_sampler_type ({None, 'cholesky', 'cg', 'hmc'}) – Specifies the sampling method used to update the regression coefficients. If None, the method is chosen via a crude heuristic based on the model type as well as the size and sparsity of the design matrix. For linear and logistic models with a large and sparse design matrix, the conjugate gradient sampler (‘cg’) is preferred over the Cholesky decomposition based sampler (‘cholesky’). For other models, only Hamiltonian Monte Carlo (‘hmc’) can be used.

  • params_to_save ({'all', tuple or list of str}) – Specifies which parameters to save during MCMC iterations. By default, the most relevant parameters — regression coefficients, global scale, and posterior log-density — are saved. Use ‘all’ to save all the parameters (but be aware of the extra memory requirement), including the local scale and, depending on the model, the precision (inverse variance) of observations.

  • n_status_update (int) – Number of updates to print on stdout during the sampler run.

  • options (None, dict, SamplerOptions) – An instance of SamplerOptions, or a dict whose key-value pairs are used as inputs to the SamplerOptions constructor.

Returns

  • samples (dict) – Contains MCMC samples of the parameters specified by params_to_save. The last dimension of each array corresponds to MCMC iterations; for example, samples['coef'][:, 0] is the first MCMC sample of the regression coefficients.

  • mcmc_info (dict) – Contains information on the MCMC run and sampler settings, in particular enough to reproduce and resume the sampling process.
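The layout of the returned arrays can be illustrated without running the sampler. The sketch below uses a dummy NumPy array as a stand-in for the samples dict that gibbs returns; all sizes here are made-up placeholders:

```python
import numpy as np

# Stand-in for the output described above; no actual sampler is run.
n_iter, thin = 1000, 2
n_saved = n_iter // thin  # floor(n_iter / thin) saved draws, as noted above

n_coef = 5  # illustrative number of regression coefficients
# The last axis of each saved array indexes MCMC iterations.
samples = {'coef': np.zeros((n_coef, n_saved))}
first_draw = samples['coef'][:, 0]  # first saved draw of the coefficient vector
print(n_saved, first_draw.shape)
```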

Model and Prior

class bayesbridge.RegressionModel(outcome, X, family='linear', add_intercept=None, center_predictor=True)

Prepare input data for BayesBridge, applying pre-processing as needed.

For the Cox model, the observations (rows of X) are reordered so as to speed up the likelihood, gradient, and Hessian evaluations.

Parameters
  • outcome (1-d numpy array, tuple of two 1-d numpy arrays) – n_success or (n_success, n_trial) if family == ‘logit’; if the input is a single array, the outcome is assumed binary. (event_time, censoring_time) if family == ‘cox’.

  • X (numpy array or scipy sparse matrix) –

  • family (str, {'linear', 'logit', 'cox'}) –

  • add_intercept (bool, None) – If None, an intercept is added except when family == ‘cox’.

  • center_predictor (bool) –
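The outcome formats above can be sketched with synthetic data for family == ‘logit’. The sizes, density, and parameter values below are arbitrary, and the commented-out constructor call only indicates where these inputs would go:

```python
import numpy as np
from scipy import sparse

# Synthetic inputs in the documented formats; all sizes are made up.
rng = np.random.default_rng(0)
n_obs, n_pred = 100, 10

# Sparse design matrix, as supported by the X parameter.
X = sparse.random(n_obs, n_pred, density=0.1, random_state=0, format='csr')
n_trial = rng.integers(1, 20, size=n_obs)  # number of trials per observation
n_success = rng.binomial(n_trial, 0.3)     # number of successes per observation

outcome = (n_success, n_trial)  # tuple form; a single binary array also works
# model = bayesbridge.RegressionModel(outcome, X, family='logit')
print(X.shape, n_success.shape)
```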

class bayesbridge.RegressionCoefPrior(bridge_exponent=0.5, n_fixed_effect=0, sd_for_intercept=inf, sd_for_fixed_effect=inf, regularizing_slab_size=inf, global_scale_prior_hyper_param=None, _global_scale_parametrization='coef_magnitude')

Encapsulate prior information for BayesBridge.

Parameters
  • bridge_exponent (float < 2) – Exponent of the bridge prior on the regression coefficients. For example, the value of 2 (albeit unsupported) would correspond to a Gaussian prior, and the value of 1 to a double-exponential prior as in the Bayesian Lasso.

  • n_fixed_effect (int) – Number of predictors — other than the intercept and placed in the first columns of the design matrix — whose coefficients are estimated with Gaussian priors of pre-specified standard deviation(s).

  • sd_for_intercept (float) – Standard deviation of Gaussian prior on the intercept. Inf corresponds to an uninformative flat prior.

  • sd_for_fixed_effect (float, numpy array) – Standard deviation(s) of Gaussian prior(s) on fixed effects. If an array, the length must be the same as n_fixed_effect. Inf corresponds to an uninformative flat prior.

  • regularizing_slab_size (float) – Standard deviation of the Gaussian tail-regularizer on the bridge prior. Used to impose soft prior constraints on a range of regression coefficients in case the data provide limited information (e.g. when complete separation occurs). One may, for example, set the slab size by first choosing a value that the regression coefficients are very unlikely to exceed in magnitude and then dividing that value by 1.96.

  • global_scale_prior_hyper_param (dict, None) – Should contain the pair of keys ‘log10_mean’ and ‘log10_sd’, specifying the prior mean and standard deviation of log10(global_scale). If None, the default reference prior for a scale parameter is used.

  • _global_scale_parametrization (str, {'raw', 'coef_magnitude'}) – If ‘coef_magnitude’, scale the local and global scales so that the global scale parameter coincides with the prior expected magnitude of the regression coefficients.
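The slab-size heuristic and the hyper-parameter dict described above can be sketched as follows. The coefficient bound of 3.0 and the log10 values are made-up placeholders, and the commented-out call only indicates how they would be passed to RegressionCoefPrior:

```python
# Pick a magnitude the coefficients are very unlikely to exceed
# (3.0 here is a made-up placeholder), then divide by 1.96 as suggested above.
coef_bound = 3.0
slab_size = coef_bound / 1.96

# Prior on log10(global_scale), with the keys documented above;
# the numeric values are illustrative only.
gscale_hyper_param = {'log10_mean': -2.0, 'log10_sd': 1.0}

# prior = bayesbridge.RegressionCoefPrior(
#     bridge_exponent=0.25,
#     regularizing_slab_size=slab_size,
#     global_scale_prior_hyper_param=gscale_hyper_param,
# )
print(round(slab_size, 3))
```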