Generalised Linear Mixed Models

Introduction

A GLMM can be described in two parts. The first part consists of the latent variable

\[\mathbf z = \mathrm X\boldsymbol\beta + \mathrm G\mathbf u + \boldsymbol\epsilon,\]

where \(\mathbf u \sim \mathcal N(\mathbf 0, v_0\mathrm I_k)\) is a vector of random effects and the \(\epsilon_i\) are iid Normal random variables, each with variance \(v_1\). The second part connects the latent variable to the observed one:

\[y_i ~|~ z_i \sim \text{ExpFam}(\mu_i = g(z_i)),\]

where \(g(\cdot)\) is a link function [3] and \(\text{ExpFam}(\cdot)\) is an exponential-family distribution [4] with mean \(\mu_i\). The marginal likelihood is thus given by

\[p(\mathbf y) = \int \prod_i \text{ExpFam}(y_i ~|~ \mu_i = g(z_i)) \mathcal N(\mathbf z ~|~ \mathrm X\boldsymbol\beta, v_0\mathrm G\mathrm G^{\intercal} + v_1\mathrm I_n) \mathrm d\mathbf z.\]

We use \(\mathrm K\) to refer to \(\mathrm G\mathrm G^{\intercal}\) when appropriate. The module glimix_core.glmm implements two algorithms for fitting the parameters via maximum likelihood [1]: an Expectation Propagation approximation [2] when the likelihood is not Normal (refer to GLMMExpFam), and a closed-form solution otherwise (refer to GLMMNormal).
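
The two-part construction above can be simulated directly with NumPy. In the sketch below all dimensions and parameter values are made up for illustration, and the logistic \(g(\cdot)\) with Bernoulli outcomes is just one exponential-family choice, not something prescribed by the module:

```python
import numpy as np

rng = np.random.RandomState(0)
n, p, k = 50, 2, 10            # samples, covariates, random effects
v0, v1 = 1.0, 0.5              # variances of u and of epsilon

X = rng.randn(n, p)            # covariates
beta = np.array([0.5, -0.3])   # fixed-effect sizes
G = rng.randn(n, k)            # random-effect design matrix

# First part: the latent variable z = X*beta + G*u + epsilon.
u = np.sqrt(v0) * rng.randn(k)
eps = np.sqrt(v1) * rng.randn(n)
z = X @ beta + G @ u + eps

# Marginally, z ~ N(X*beta, v0*K + v1*I) with K = G*G^T.
K = G @ G.T
cov = v0 * K + v1 * np.eye(n)

# Second part: an observation layer; Bernoulli outcomes through a
# logistic g(.) are one illustrative exponential-family choice.
mu = 1.0 / (1.0 + np.exp(-z))
y = (rng.rand(n) < mu).astype(int)
```

Integrating the observation layer over the Normal density of \(\mathbf z\) gives the marginal likelihood above, which in general has no closed form and motivates the EP approximation.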

Exponential family likelihood

class glimix_core.glmm.GLMMExpFam(y, lik, X, QS=None, n_int=1000, rtol=1.49e-05, atol=1.49e-08)[source]

Generalised Linear Gaussian Processes implementation.

It implements inference over GLMMs via the Expectation Propagation [Min01] algorithm. It currently supports the "Bernoulli", "Probit", "Binomial", and "Poisson" likelihoods. (For the heterogeneous Normal likelihood, refer to glimix_core.glmm.GLMMNormal for closed-form inference.)

Parameters
  • y (array_like) – Outcome variable.

  • lik (tuple) – Likelihood definition. The first item is one of the following likelihood names: "Bernoulli", "Binomial", "Normal", and "Poisson". For Binomial, the second item is an array with the number of trials for each sample.

  • X (array_like) – Covariates.

  • QS (tuple) – Economic eigendecomposition of \(\mathrm K\).

Example

>>> from numpy import dot, sqrt, zeros
>>> from numpy.random import RandomState
>>>
>>> from numpy_sugar.linalg import economic_qs
>>>
>>> from glimix_core.glmm import GLMMExpFam
>>>
>>> random = RandomState(0)
>>> nsamples = 10
>>>
>>> X = random.randn(nsamples, 2)
>>> G = random.randn(nsamples, 100)
>>> K = dot(G, G.T)
>>> ntrials = random.randint(1, 100, nsamples)
>>> z = dot(G, random.randn(100)) / sqrt(100)
>>>
>>> successes = zeros(len(ntrials), int)
>>> for i in range(len(ntrials)):
...    successes[i] = sum(z[i] + 0.2 * random.randn(ntrials[i]) > 0)
>>>
>>> QS = economic_qs(K)
>>>
>>> glmm = GLMMExpFam(successes, ('binomial', ntrials), X, QS)
>>> print('Before: %.2f' % glmm.lml())
Before: -16.40
>>> glmm.fit(verbose=False)
>>> print('After: %.2f' % glmm.lml())
After: -13.43

property beta

Fixed-effect sizes.

Returns

\(\boldsymbol\beta\).

Return type

numpy.ndarray

copy()

Create a copy of this object.

covariance()[source]

Covariance of the prior.

Returns

\(v_0 \mathrm K + v_1 \mathrm I\).

Return type

numpy.ndarray

property delta

Get or set the ratio of variance between K and I.

Returns

\(\delta\).

Return type

float

fit(verbose=True, factr=100000.0, pgtol=1e-07)[source]

Maximise the marginal likelihood.

Parameters
  • verbose (bool) – True for progress output; False otherwise. Defaults to True.

  • factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.

  • pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol where pg_i is the i-th component of the projected gradient.

Notes

Please, refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.
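
To see concretely what factr and pgtol control, the toy run below minimises a simple quadratic (not the GLMM marginal likelihood) with the same SciPy routine; tighter values demand a more precise solution before the iteration stops:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):
    # Convex quadratic with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def grad(x):
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])

# Smaller factr/pgtol -> stricter stopping criteria.
x, fval, info = fmin_l_bfgs_b(f, np.zeros(2), fprime=grad,
                              factr=1e5, pgtol=1e-7)
print(x, info["warnflag"])  # warnflag 0 means convergence
```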

fix(var_name)[source]

Prevent a variable from being adjusted.

Parameters

var_name (str) – Variable name.

gradient()[source]

Gradient of the log of the marginal likelihood.

Returns

Map from variable names to their gradient values.

Return type

dict

lml()

Log of the marginal likelihood.

Returns

\(\log p(\mathbf y)\)

Return type

float

mean()

Mean of the prior.

Returns

\(\mathrm X\boldsymbol\beta\).

Return type

numpy.ndarray

property name

Name of this function.

posteriori_covariance()[source]

Covariance of the estimated posterior.

posteriori_mean()[source]

Mean of the estimated posterior.

This is also the maximum a posteriori estimate of the latent variable.

property scale

Get or set the overall variance.

Returns

\(s\).

Return type

float

unfix(var_name)[source]

Allow a variable to be adjusted.

Parameters

var_name (str) – Variable name.

property v0

First variance.

Returns

\(v_0 = s (1 - \delta)\)

Return type

float

property v1

Second variance.

Returns

\(v_1 = s \delta\)

Return type

float
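
The two variances are a reparameterisation of scale and delta, so they always sum to the overall variance \(s\); a quick numerical check (plain Python, with example values, independent of the class):

```python
s, delta = 2.5, 0.3       # example values for scale and delta
v0 = s * (1.0 - delta)    # variance attached to K
v1 = s * delta            # variance attached to I

# The pair (v0, v1) is just a reparameterisation of (s, delta).
assert abs((v0 + v1) - s) < 1e-12
```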

Heterogeneous Normal likelihood

class glimix_core.glmm.GLMMNormal(eta, tau, X, QS=None)[source]

LMM with heterogeneous Normal likelihoods.

We model

\[\tilde{\boldsymbol\mu} \sim \mathcal N(\mathrm X\boldsymbol\beta, v_0 \mathrm K + v_1 \mathrm I + \tilde{\Sigma}),\]

where \(\tilde{\boldsymbol\mu}\) and \(\tilde{\Sigma}\) are typically given by an EP approximation to a non-Normal likelihood (refer to glimix_core.glmm.GLMMExpFam). In that case, \(\tilde{\Sigma}\) is a diagonal matrix with non-negative values. The EP parameters are passed to this class in their natural forms: eta and tau.

Parameters
  • eta (array_like) – \([\tilde{\mu}_0\tilde{\sigma}^{-2}_0 \quad \tilde{\mu}_1\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\mu}_n\tilde{\sigma}^{-2}_n]\).

  • tau (array_like) – \([\tilde{\sigma}^{-2}_0\quad\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\sigma}^{-2}_n]\).

  • X (array_like) – Covariates.

  • QS (tuple) – Economic eigendecomposition of \(\mathrm K\).
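
The natural parameters follow directly from site means and variances; the sketch below (NumPy only, with made-up placeholder values for the EP sites, not outputs of this module) shows the conversion in both directions:

```python
import numpy as np

# Hypothetical EP site means and variances.
mu_tilde = np.array([0.1, -0.4, 0.7])
sigma2_tilde = np.array([0.5, 1.2, 0.9])

tau = 1.0 / sigma2_tilde    # tilde-sigma^{-2}
eta = mu_tilde * tau        # tilde-mu * tilde-sigma^{-2}

# The site means are recovered as eta / tau.
assert np.allclose(eta / tau, mu_tilde)
```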

property beta

Fixed-effect sizes.

Returns

\(\boldsymbol\beta\).

Return type

numpy.ndarray

copy()

Create a copy of this object.

covariance()

Covariance of the prior.

Returns

\(v_0 \mathrm K + v_1 \mathrm I\).

Return type

numpy.ndarray

property delta

Get or set the ratio of variance between K and I.

Returns

\(\delta\).

Return type

float

fit(verbose=True, factr=100000.0, pgtol=1e-07)

Maximise the marginal likelihood.

Parameters
  • verbose (bool) – True for progress output; False otherwise. Defaults to True.

  • factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.

  • pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol where pg_i is the i-th component of the projected gradient.

Notes

Please, refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.

fix(var_name)[source]

Prevent a variable from being adjusted.

Parameters

var_name (str) – Variable name.

get_fast_scanner()[source]

Return glimix_core.lmm.FastScanner for the current delta.

lml()

Log of the marginal likelihood.

Returns

\(\log p(\mathbf y)\)

Return type

float

mean()

Mean of the prior.

Returns

\(\mathrm X\boldsymbol\beta\).

Return type

numpy.ndarray

property name

Name of this function.

property scale

Get or set the overall variance.

Returns

\(s\).

Return type

float

unfix(var_name)[source]

Allow a variable to be adjusted.

Parameters

var_name (str) – Variable name.

property v0

First variance.

Returns

\(v_0 = s (1 - \delta)\)

Return type

float

property v1

Second variance.

Returns

\(v_1 = s \delta\)

Return type

float

References

[1] Maximum likelihood estimation. (2017, September 9). In Wikipedia, The Free Encyclopedia. Retrieved 14:48, September 22, 2017, from https://en.wikipedia.org/w/index.php?title=Maximum_likelihood_estimation&oldid=799704694

[2] Minka, T. P. (2001, August). Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 362-369). Morgan Kaufmann Publishers Inc.

[3] Wikipedia contributors. (2018, June 21). Generalized linear model. In Wikipedia, The Free Encyclopedia. Retrieved 09:45, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Generalized_linear_model&oldid=846910075#Link_function

[4] Wikipedia contributors. (2018, June 29). Exponential family. In Wikipedia, The Free Encyclopedia. Retrieved 09:48, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=848114709