Generalised Linear Mixed Models

Introduction

A GLMM can be described in two parts. The first part consists of the latent variable

\[\mathbf z = \mathrm X\boldsymbol\beta + \mathrm G\mathbf u + \boldsymbol\epsilon,\]

where \(\mathbf u \sim \mathcal N(\mathbf 0, v_0\mathrm I_k)\) is a vector of random effects and the components of \(\boldsymbol\epsilon\) are iid Normal random variables, each with variance \(v_1\). The second part connects the latent variable to the observed one:

\[y_i ~|~ z_i \sim \text{ExpFam}(\mu_i = g(z_i)),\]

where \(g(\cdot)\) maps the latent value to the mean (the inverse of a link function [3]) and \(\text{ExpFam}(\cdot)\) is an exponential-family distribution [4] with mean \(\mu_i\). The marginal likelihood is thus given by

\[p(\mathbf y) = \int \prod_i \text{ExpFam}(y_i ~|~ \mu_i = g(z_i))\, \mathcal N(\mathbf z ~|~ \mathrm X\boldsymbol\beta, v_0\mathrm G\mathrm G^{\intercal} + v_1\mathrm I_n)\, \mathrm d\mathbf z.\]

We use \(\mathrm K\) to refer to \(\mathrm G\mathrm G^{\intercal}\) when appropriate. The module glimix_core.glmm implements two algorithms for parameter fitting via Maximum Likelihood [1]: an Expectation Propagation approximation [2] when the likelihood is not Normal, in which case the above integral has no closed form (refer to GLMMExpFam); and a closed-form solution otherwise (refer to GLMMNormal).
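
To make the two parts concrete, here is a minimal simulation of the generative model above. It is illustrative only, not library code, and the logistic function standing in for \(g(\cdot)\) in the Bernoulli case is an assumption:

>>> from numpy import dot, exp, sqrt
>>> from numpy.random import RandomState
>>>
>>> random = RandomState(1)
>>> n, k = 10, 5
>>> v0, v1 = 0.5, 0.1
>>> X = random.randn(n, 2)
>>> G = random.randn(n, k)
>>> beta = random.randn(2)
>>> u = sqrt(v0) * random.randn(k)         # u ~ N(0, v0 * I_k)
>>> eps = sqrt(v1) * random.randn(n)       # eps_i iid N(0, v1)
>>> z = dot(X, beta) + dot(G, u) + eps     # first part: latent variable
>>> mu = 1 / (1 + exp(-z))                 # assumed g: logistic function
>>> y = (random.rand(n) < mu).astype(int)  # second part: Bernoulli(mu)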

Exponential family likelihood

class glimix_core.glmm.GLMMExpFam(y, lik, X, QS=None, n_int=1000, rtol=1.49e-05, atol=1.49e-08)[source]

Generalised Linear Gaussian Processes implementation.

It implements inference over GLMMs via the Expectation Propagation [2] algorithm. It currently supports the "Bernoulli", "Probit", "Binomial", and "Poisson" likelihoods. (For the heterogeneous Normal likelihood, please refer to glimix_core.glmm.GLMMNormal for closed-form inference.)

Parameters:
  • y (array_like) – Outcome variable.
  • lik (tuple) – Likelihood definition. The first item is one of the following likelihood names: "Bernoulli", "Probit", "Binomial", and "Poisson". For "Binomial", the second item is an array with the number of trials for each outcome.
  • X (array_like) – Covariates.
  • QS (tuple) – Economic eigendecomposition.

Example

>>> from numpy import dot, sqrt, zeros
>>> from numpy.random import RandomState
>>>
>>> from numpy_sugar.linalg import economic_qs
>>>
>>> from glimix_core.glmm import GLMMExpFam
>>>
>>> random = RandomState(0)
>>> nsamples = 10
>>>
>>> X = random.randn(nsamples, 2)               # covariates
>>> G = random.randn(nsamples, 100)             # 100 random effects
>>> K = dot(G, G.T)                             # K = GG^T from the introduction
>>> ntrials = random.randint(1, 100, nsamples)  # Binomial trials per sample
>>> z = dot(G, random.randn(100)) / sqrt(100)   # latent values
>>>
>>> # simulate the number of successes per sample from the latent values
>>> successes = zeros(len(ntrials), int)
>>> for i in range(len(ntrials)):
...    successes[i] = sum(z[i] + 0.2 * random.randn(ntrials[i]) > 0)
>>>
>>> QS = economic_qs(K)                         # economic eigendecomposition of K
>>>
>>> glmm = GLMMExpFam(successes, ('binomial', ntrials), X, QS)
>>> print('Before: %.2f' % glmm.lml())
Before: -16.40
>>> glmm.fit(verbose=False)
>>> print('After: %.2f' % glmm.lml())
After: -13.43
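
Once fitted, the estimates can be read back through the attributes and methods documented below; a short continuation of the example (the values depend on the run, so none are printed here):

>>> beta_hat = glmm.beta               # fixed-effect sizes
>>> v0_hat, v1_hat = glmm.v0, glmm.v1  # variances of the K and I components
>>> cov = glmm.covariance()            # v0 * K + v1 * I at the fitted values
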
beta

Fixed-effect sizes.

Returns:\(\boldsymbol\beta\).
Return type:numpy.ndarray
copy()

Create a copy of this object.

covariance()[source]

Covariance of the prior.

Returns:\(v_0 \mathrm K + v_1 \mathrm I\).
Return type:numpy.ndarray
delta

Get or set the ratio of variance between K and I.

Returns:\(\delta\).
Return type:float
fit(verbose=True, factr=100000.0, pgtol=1e-07)[source]

Maximise the marginal likelihood.

Parameters:
  • verbose (bool) – True for progress output; False otherwise. Defaults to True.
  • factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.
  • pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol where pg_i is the i-th component of the projected gradient.

Notes

Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.

fix(var_name)[source]

Prevent a variable from being adjusted.

Parameters:var_name (str) – Variable name.
gradient()[source]

Gradient of the log of the marginal likelihood.

Returns:Map from variables to their gradient values.
Return type:dict
isfixed(var_name)

Return whether a variable is fixed.

Parameters:var_name (str) – Variable name.
lml()

Log of the marginal likelihood.

Returns:\(\log p(\mathbf y)\)
Return type:float
mean()

Mean of the prior.

Returns:\(\mathrm X\boldsymbol\beta\).
Return type:numpy.ndarray
posteriori_covariance()

Covariance of the estimated posterior.

posteriori_mean()

Mean of the estimated posterior.

This is also the maximum a posteriori estimate of the latent variable.

scale

Get or set the overall variance.

Returns:\(s\).
Return type:float
unfix(var_name)[source]

Allow a variable to be adjusted.

Parameters:var_name (str) – Variable name.
v0

First variance.

Returns:\(v_0 = s (1 - \delta)\)
Return type:float
v1

Second variance.

Returns:\(v_1 = s \delta\)
Return type:float
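
The two variances determine the hyperparameters above: from \(v_0 = s(1-\delta)\) and \(v_1 = s\delta\) it follows that \(s = v_0 + v_1\) and \(\delta = v_1/(v_0 + v_1)\). A common workflow is to pin a parameter during optimisation via fix(); the sketch below continues the earlier example, and the variable-name string 'delta' is an assumption here (check fix() for the exact accepted names):

>>> glmm.delta = 0.5         # set the variance ratio by hand
>>> glmm.fix('delta')        # assumed name; keep delta fixed while fitting
>>> glmm.fit(verbose=False)  # optimise only the remaining parameters
>>> glmm.unfix('delta')      # allow delta to be adjusted again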

Heterogeneous Normal likelihood

class glimix_core.glmm.GLMMNormal(eta, tau, X, QS=None)[source]

LMM with heterogeneous Normal likelihoods.

We model

\[\tilde{\boldsymbol\mu} \sim \mathcal N(\mathrm X\boldsymbol\beta, v_0 \mathrm K + v_1 \mathrm I + \tilde{\Sigma}),\]

where \(\tilde{\boldsymbol\mu}\) and \(\tilde{\Sigma}\) are typically given by an EP approximation to a non-Normal likelihood (please refer to glimix_core.glmm.GLMMExpFam). In that case, \(\tilde{\Sigma}\) is a diagonal matrix with non-negative values. Those EP parameters are passed to this class in their natural forms: eta and tau.

Parameters:
  • eta (array_like) – \([\tilde{\mu}_0\tilde{\sigma}^{-2}_0 \quad \tilde{\mu}_1\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\mu}_n\tilde{\sigma}^{-2}_n]\).
  • tau (array_like) – \([\tilde{\sigma}^{-2}_0\quad\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\sigma}^{-2}_n]\).
  • X (array_like) – Covariates.
  • QS (tuple) – Economic eigendecomposition of \(\mathrm K\).
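
Given the definitions of eta and tau above, the site parameters are recovered as \(\tilde{\mu}_i = \eta_i/\tau_i\) and \(\tilde{\sigma}^2_i = 1/\tau_i\).

Example

A minimal construction sketch, with assumed synthetic values for eta and tau (illustrative only):

>>> from numpy import dot
>>> from numpy.random import RandomState
>>> from numpy_sugar.linalg import economic_qs
>>> from glimix_core.glmm import GLMMNormal
>>>
>>> random = RandomState(0)
>>> nsamples = 10
>>> X = random.randn(nsamples, 2)
>>> G = random.randn(nsamples, 50)
>>> QS = economic_qs(dot(G, G.T))       # economic eigendecomposition of K
>>> tau = random.rand(nsamples) + 0.5   # site precisions, kept positive
>>> eta = random.randn(nsamples) * tau  # site means times precisions
>>>
>>> glmm = GLMMNormal(eta, tau, X, QS)
>>> glmm.fit(verbose=False)
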
beta

Fixed-effect sizes.

Returns:\(\boldsymbol\beta\).
Return type:numpy.ndarray
copy()

Create a copy of this object.

covariance()

Covariance of the prior.

Returns:\(v_0 \mathrm K + v_1 \mathrm I\).
Return type:numpy.ndarray
delta

Get or set the ratio of variance between K and I.

Returns:\(\delta\).
Return type:float
fit(verbose=True, factr=100000.0, pgtol=1e-07)

Maximise the marginal likelihood.

Parameters:
  • verbose (bool) – True for progress output; False otherwise. Defaults to True.
  • factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.
  • pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol where pg_i is the i-th component of the projected gradient.

Notes

Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.

fix(var_name)[source]

Prevent a variable from being adjusted.

Parameters:var_name (str) – Variable name.
get_fast_scanner()[source]

Return glimix_core.lmm.FastScanner for the current delta.

gradient()[source]

Gradient of the log of the marginal likelihood.

Returns:Map from variables to their gradient values.
Return type:dict
isfixed(var_name)

Return whether a variable is fixed.

Parameters:var_name (str) – Variable name.
lml()

Log of the marginal likelihood.

Returns:\(\log p(\mathbf y)\)
Return type:float
mean()

Mean of the prior.

Returns:\(\mathrm X\boldsymbol\beta\).
Return type:numpy.ndarray
posteriori_covariance()

Covariance of the estimated posterior.

posteriori_mean()

Mean of the estimated posterior.

This is also the maximum a posteriori estimate of the latent variable.

scale

Get or set the overall variance.

Returns:\(s\).
Return type:float
unfix(var_name)[source]

Allow a variable to be adjusted.

Parameters:var_name (str) – Variable name.
v0

First variance.

Returns:\(v_0 = s (1 - \delta)\)
Return type:float
v1

Second variance.

Returns:\(v_1 = s \delta\)
Return type:float

References

[1] Maximum likelihood estimation. (2017, September 9). In Wikipedia, The Free Encyclopedia. Retrieved 14:48, September 22, 2017, from https://en.wikipedia.org/w/index.php?title=Maximum_likelihood_estimation&oldid=799704694
[2] Minka, T. P. (2001, August). Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 362-369). Morgan Kaufmann Publishers Inc.
[3] Wikipedia contributors. (2018, June 21). Generalized linear model. In Wikipedia, The Free Encyclopedia. Retrieved 09:45, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Generalized_linear_model&oldid=846910075#Link_function
[4] Wikipedia contributors. (2018, June 29). Exponential family. In Wikipedia, The Free Encyclopedia. Retrieved 09:48, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=848114709