Demo¶
[1]:
import numpy as np
import scipy as sp
import scipy.sparse
import matplotlib.pyplot as plt
%matplotlib inline
[2]:
from bayesbridge import BayesBridge, RegressionModel, RegressionCoefPrior
from simulate_data import simulate_design, simulate_outcome
from util import mcmc_summarizer
BayesBridge supports both dense (numpy array) and sparse (scipy sparse matrix) design matrices.¶
[3]:
n_obs, n_pred = 10 ** 4, 10 ** 3
X = simulate_design(
n_obs, n_pred,
binary_frac=.9,
binary_pred_freq=.2,
shuffle_columns=True,
format_='sparse',
seed=111
)
[4]:
beta_true = np.zeros(n_pred)
beta_true[:5] = 1.5
beta_true[5:10] = 1.
beta_true[10:15] = .5
n_trial = np.ones(X.shape[0]) # Binary outcome.
y = simulate_outcome(
X, beta_true, intercept=0.,
n_trial=n_trial, model='logit', seed=1
)
Specify regression model and prior.¶
BayesBridge uses a prior \(\pi(\beta_j \, | \, \tau) \propto \tau^{-1} \exp\left( - \, \left| \tau^{-1} \beta_j \right|^\alpha \right)\) for \(0 < \alpha \leq 1\). The default bridge exponent is \(\alpha = 1 / 2\).
[5]:
model = RegressionModel(
y, X, family='logit',
add_intercept=True, center_predictor=True,
# Do *not* manually add intercept to or center X.
)
prior = RegressionCoefPrior(
bridge_exponent=.5,
n_fixed_effect=0,
# Number of coefficients with Gaussian priors of pre-specified sd.
sd_for_intercept=float('inf'),
# Set it to float('inf') for a flat prior.
sd_for_fixed_effect=1.,
regularizing_slab_size=2.,
# Weakly constrain the magnitude of coefficients under bridge prior.
)
bridge = BayesBridge(model, prior)
Run the Gibbs sampler.¶
[6]:
samples, mcmc_info = bridge.gibbs(
n_iter=250, n_burnin=0, thin=1,
init={'global_scale': .01},
coef_sampler_type='cg',
seed=111
)
Check convergence by looking at the traceplot for posterior log-density.
[7]:
plt.figure(figsize=(10, 4))
plt.rcParams['font.size'] = 20
plt.plot(samples['logp'])
plt.xlabel('MCMC iteration')
plt.ylabel('Posterior log density')
plt.show()
Restart MCMC from the last iteration with ‘gibbs_additional_iter()’.¶
[8]:
samples, mcmc_info = bridge.gibbs_additional_iter(
mcmc_info, n_add_iter=250
)
Add more samples (while keeping the previous ones) with ‘merge=True’.
[9]:
samples, mcmc_info = bridge.gibbs_additional_iter(
mcmc_info, n_add_iter=750, merge=True, prev_samples=samples
)
coef_samples = samples['coef'][1:, :] # Extract all but the intercept
Check mixing of regression coefficients and their posterior marginals.¶
Typically the convergence is quick and mixing of the regression coefficients is adequate.
[10]:
plt.figure(figsize=(12, 5))
plt.rcParams['font.size'] = 20
plt.plot(coef_samples[[0, 5, 10, 15], :].T)
plt.xlabel('MCMC iteration')
plt.ylabel(r'$\beta_j$', rotation=0, labelpad=10)
plt.show()
[11]:
plt.figure(figsize=(14, 5))
plt.rcParams['font.size'] = 20
n_coef_to_plot = 25
mcmc_summarizer.plot_conf_interval(
coef_samples, conf_level=.95,
n_coef_to_plot=n_coef_to_plot, marker_scale=1.4
);
plt.plot(
beta_true[:n_coef_to_plot], '--', color='tab:orange',
label='True value'
)
plt.xlabel(r'Coefficient index $j$')
plt.ylabel(r'$\beta_j$', rotation=0, labelpad=10)
plt.xticks([0, 5, 10, 15, 20])
plt.legend(frameon=False)
plt.show()