Generalised Linear Mixed Models¶
Introduction¶
A GLMM can be described in two parts. The first part consists of the latent variable
\[\mathbf z = \mathrm X\boldsymbol\beta + \mathrm G\mathbf u + \boldsymbol\epsilon,\]
where \(\mathbf u \sim \mathcal N(\mathbf 0, v_0\mathrm I_k)\) is a vector of random effects and the \(\epsilon_i\) are iid Normal random variables, each with variance \(v_1\). The second part connects the latent variable to the observed one:
\[y_i \,|\, z_i \sim \text{ExpFam}(\mu_i = g(z_i)),\]
where \(g(\cdot)\) is a link function [3] and \(\text{ExpFam}(\cdot)\) is an exponential-family distribution [4] with mean \(\mu_i\). The marginal likelihood is thus given by
\[p(\mathbf y) = \int \prod_{i=1}^n \text{ExpFam}(y_i \,|\, \mu_i = g(z_i))\, \mathcal N(\mathbf z \,|\, \mathrm X\boldsymbol\beta,\; v_0\mathrm G\mathrm G^{\intercal} + v_1\mathrm I)\, \mathrm d\mathbf z.\]
We use \(\mathrm K\) to refer to \(\mathrm G\mathrm G^{\intercal}\) when appropriate.
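The generative process described above can be sketched in plain NumPy (a minimal illustration, not part of the library; the logistic inverse link, the dimensions, and the variances are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.RandomState(0)
n, p, k = 50, 2, 10        # samples, covariates, random effects
v0, v1 = 0.5, 0.1          # random-effect and noise variances

X = rng.randn(n, p)                 # covariates
G = rng.randn(n, k)                 # random-effect design matrix
beta = rng.randn(p)                 # fixed-effect sizes

u = np.sqrt(v0) * rng.randn(k)      # u ~ N(0, v0 I_k)
eps = np.sqrt(v1) * rng.randn(n)    # eps_i iid N(0, v1)
z = X @ beta + G @ u + eps          # latent variable

mu = 1.0 / (1.0 + np.exp(-z))       # mean after a logistic inverse link
y = rng.binomial(1, mu)             # Bernoulli outcomes
```

Here \(\mathrm G\mathrm G^{\intercal}\) plays the role of the covariance \(\mathrm K\) mentioned above.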
The module glimix_core.glmm implements two algorithms for parameter fitting via maximum likelihood [1]:
- an Expectation Propagation approximation [2] when the likelihood is not normally distributed (refer to GLMMExpFam);
- a closed-form solution otherwise (refer to GLMMNormal).
Exponential family likelihood¶
- class glimix_core.glmm.GLMMExpFam(y, lik, X, QS=None, n_int=1000, rtol=1.49e-05, atol=1.49e-08)[source]¶
Generalised Linear Gaussian Processes implementation.
It implements inference over GLMMs via the Expectation Propagation [Min01] algorithm. It currently supports the "Bernoulli", "Probit", "Binomial", and "Poisson" likelihoods. (For the heterogeneous Normal likelihood, please refer to glimix_core.glmm.GLMMNormal for closed-form inference.)
- Parameters
y (array_like) – Outcome variable.
lik (tuple) – Likelihood definition. The first item is one of the following likelihood names: "Bernoulli", "Binomial", "Normal", and "Poisson". For Binomial, the second item is an array of outcomes.
X (array_like) – Covariates.
QS (tuple) – Economic eigendecomposition.
Example
>>> from numpy import dot, sqrt, zeros
>>> from numpy.random import RandomState
>>>
>>> from numpy_sugar.linalg import economic_qs
>>>
>>> from glimix_core.glmm import GLMMExpFam
>>>
>>> random = RandomState(0)
>>> nsamples = 10
>>>
>>> X = random.randn(nsamples, 2)
>>> G = random.randn(nsamples, 100)
>>> K = dot(G, G.T)
>>> ntrials = random.randint(1, 100, nsamples)
>>> z = dot(G, random.randn(100)) / sqrt(100)
>>>
>>> successes = zeros(len(ntrials), int)
>>> for i in range(len(ntrials)):
...     successes[i] = sum(z[i] + 0.2 * random.randn(ntrials[i]) > 0)
>>>
>>> QS = economic_qs(K)
>>>
>>> glmm = GLMMExpFam(successes, ('binomial', ntrials), X, QS)
>>> print('Before: %.2f' % glmm.lml())
Before: -16.40
>>> glmm.fit(verbose=False)
>>> print('After: %.2f' % glmm.lml())
After: -13.43
- property beta¶
Fixed-effect sizes.
- Returns
\(\boldsymbol\beta\).
- Return type
- copy()¶
Create a copy of this object.
- covariance()[source]¶
Covariance of the prior.
- Returns
\(v_0 \mathrm K + v_1 \mathrm I\).
- Return type
- property delta¶
Get or set the ratio of variance between K and I.
- Returns
\(\delta\).
- Return type
- fit(verbose=True, factr=100000.0, pgtol=1e-07)[source]¶
Maximise the marginal likelihood.
- Parameters
verbose (bool) – True for progress output; False otherwise. Defaults to True.
factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.
pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol, where pg_i is the i-th component of the projected gradient.
Notes
Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.
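Since factr is relative to machine precision, its absolute meaning is easy to compute. The snippet below (plain NumPy, independent of glimix_core) shows the relative tolerance implied by the default factr=100000.0:

```python
import numpy as np

eps = np.finfo(float).eps    # machine precision, about 2.22e-16
factr = 100000.0             # the default used by fit()
rel_tol = factr * eps        # stopping tolerance on the relative change in f

# the default therefore stops once the relative improvement drops below ~2.2e-11
print("relative tolerance: %.3e" % rel_tol)
```

Smaller factr values demand a tighter fit; the scipy documentation suggests 1e12 for low accuracy, 1e7 for moderate accuracy, and 10.0 for extremely high accuracy.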
- fix(var_name)[source]¶
Prevent a variable from being adjusted.
- Parameters
var_name (str) – Variable name.
- gradient()[source]¶
Gradient of the log of the marginal likelihood.
- Returns
Map from variable names to their gradient values.
- Return type
- mean()¶
Mean of the prior.
- Returns
\(\mathrm X\boldsymbol\beta\).
- Return type
- property name¶
Name of this function.
Heterogeneous Normal likelihood¶
- class glimix_core.glmm.GLMMNormal(eta, tau, X, QS=None)[source]¶
LMM with heterogeneous Normal likelihoods.
We model
\[\tilde{\boldsymbol\mu} \sim \mathcal N(\mathrm X\boldsymbol\beta, v_0 \mathrm K + v_1 \mathrm I + \tilde{\Sigma}),\]
where \(\tilde{\boldsymbol\mu}\) and \(\tilde{\Sigma}\) are typically given by an EP approximation to a non-Normal likelihood (please refer to glimix_core.glmm.GLMMExpFam). If that is the case, \(\tilde{\Sigma}\) is a diagonal matrix with non-negative values. Those EP parameters are given to this class via their natural forms: eta and tau.
- Parameters
eta (array_like) – \([\tilde{\mu}_0\tilde{\sigma}^{-2}_0 \quad \tilde{\mu}_1\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\mu}_n\tilde{\sigma}^{-2}_n]\).
tau (array_like) – \([\tilde{\sigma}^{-2}_0\quad\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\sigma}^{-2}_n]\).
X (array_like) – Covariates.
QS (tuple) – Economic eigendecomposition of \(\mathrm K\).
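The natural forms eta and tau relate to EP site means \(\tilde{\mu}_i\) and variances \(\tilde{\sigma}^2_i\) as \(\eta_i = \tilde{\mu}_i\tilde{\sigma}^{-2}_i\) and \(\tau_i = \tilde{\sigma}^{-2}_i\). A plain-NumPy sketch of that conversion (the site values here are made up for illustration):

```python
import numpy as np

# hypothetical EP site means and variances (illustrative values only)
mu_site = np.array([0.5, -1.2, 0.3])
sig2_site = np.array([0.8, 1.5, 0.4])

tau = 1.0 / sig2_site    # tau_i = sigma_i^{-2}
eta = mu_site * tau      # eta_i = mu_i * sigma_i^{-2}

# the site moments are recoverable from the natural forms
mu_back = eta / tau
sig2_back = 1.0 / tau
```

Arrays like these, together with the covariates X and the economic eigendecomposition QS of \(\mathrm K\), are what GLMMNormal(eta, tau, X, QS) takes.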
- property beta¶
Fixed-effect sizes.
- Returns
\(\boldsymbol\beta\).
- Return type
- copy()¶
Create a copy of this object.
- covariance()¶
Covariance of the prior.
- Returns
\(v_0 \mathrm K + v_1 \mathrm I\).
- Return type
- property delta¶
Get or set the ratio of variance between K and I.
- Returns
\(\delta\).
- Return type
- fit(verbose=True, factr=100000.0, pgtol=1e-07)¶
Maximise the marginal likelihood.
- Parameters
verbose (bool) – True for progress output; False otherwise. Defaults to True.
factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.
pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol, where pg_i is the i-th component of the projected gradient.
Notes
Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.
- fix(var_name)[source]¶
Prevent a variable from being adjusted.
- Parameters
var_name (str) – Variable name.
- get_fast_scanner()[source]¶
Return a glimix_core.lmm.FastScanner instance for the current delta.
- mean()¶
Mean of the prior.
- Returns
\(\mathrm X\boldsymbol\beta\).
- Return type
- property name¶
Name of this function.
References
- 1
Maximum likelihood estimation. (2017, September 9). In Wikipedia, The Free Encyclopedia. Retrieved 14:48, September 22, 2017, from https://en.wikipedia.org/w/index.php?title=Maximum_likelihood_estimation&oldid=799704694
- 2
Minka, T. P. (2001, August). Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 362-369). Morgan Kaufmann Publishers Inc.
- 3
Wikipedia contributors. (2018, June 21). Generalized linear model. In Wikipedia, The Free Encyclopedia. Retrieved 09:45, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Generalized_linear_model&oldid=846910075#Link_function
- 4
Wikipedia contributors. (2018, June 29). Exponential family. In Wikipedia, The Free Encyclopedia. Retrieved 09:48, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=848114709