# Generalised Linear Mixed Models

## Introduction

A GLMM can be described in two parts. The first part consists of the latent variable

$\mathbf z = \mathrm X\boldsymbol\beta + \mathrm G\mathbf u + \boldsymbol\epsilon,$

where $$\mathbf u \sim \mathcal N(\mathbf 0, v_0\mathrm I_k)$$ is a vector of random effects and the $$\epsilon_i$$ are iid Normal random variables, each with variance $$v_1$$. The second part connects the latent variable to the observed one:

$y_i ~|~ z_i \sim \text{ExpFam}(\mu_i = g(z_i)),$

where $$g(\cdot)$$ is a link function and $$\text{ExpFam}(\cdot)$$ is an exponential-family distribution with mean $$\mu_i$$. The marginal likelihood is thus given by

$p(\mathbf y) = \int \prod_i \text{ExpFam}(y_i ~|~ \mu_i = g(z_i)) \mathcal N(\mathbf z ~|~ \mathrm X\boldsymbol\beta, v_0\mathrm G\mathrm G^{\intercal} + v_1\mathrm I_n) \mathrm d\mathbf z.$

We use $$\mathrm K$$ to refer to $$\mathrm G\mathrm G^{\intercal}$$ when appropriate. The module glimix_core.glmm implements two algorithms for parameter fitting via maximum likelihood: an Expectation Propagation approximation when the likelihood is not Normal (refer to GLMMExpFam), and a closed-form solution otherwise (refer to GLMMNormal).
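The two-part model above can be simulated directly. The following NumPy sketch is illustrative only (it is not part of the glimix_core API): it draws the latent variable $$\mathbf z = \mathrm X\boldsymbol\beta + \mathrm G\mathbf u + \boldsymbol\epsilon$$, pushes it through a logistic mean function as one possible choice of $$g$$, and forms the marginal covariance $$v_0\mathrm K + v_1\mathrm I_n$$; all dimensions and parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 2, 10

X = rng.standard_normal((n, p))   # covariates
G = rng.standard_normal((n, k))   # random-effect design
beta = np.array([0.5, -0.3])      # fixed-effect sizes
v0, v1 = 0.8, 0.2                 # variance components

u = rng.normal(0.0, np.sqrt(v0), k)    # u ~ N(0, v0 * I_k)
eps = rng.normal(0.0, np.sqrt(v1), n)  # iid noise with variance v1
z = X @ beta + G @ u + eps             # latent variable

# Bernoulli outcomes through a logistic mean function (one choice of g).
mu = 1.0 / (1.0 + np.exp(-z))
y = rng.binomial(1, mu)

# Marginal covariance of z: v0 * K + v1 * I, with K = G @ G.T.
K = G @ G.T
cov = v0 * K + v1 * np.eye(n)
```

Integrating the product of the exponential-family terms against this Normal density over $$\mathbf z$$ is exactly the marginal likelihood written above.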

## Exponential family likelihood

class glimix_core.glmm.GLMMExpFam(y, lik, X, QS=None, n_int=1000, rtol=1.49e-05, atol=1.49e-08)[source]

Generalised Linear Gaussian Processes implementation.

It implements inference over GLMMs via the Expectation Propagation [Min01] algorithm. It currently supports the "Bernoulli", "Probit", "Binomial", and "Poisson" likelihoods. (For the heterogeneous Normal likelihood, please refer to glimix_core.glmm.GLMMNormal, which provides closed-form inference.)

Parameters:

- y (array_like) – Outcome variable.
- lik (tuple) – Likelihood definition. The first item is one of the following likelihood names: "Bernoulli", "Binomial", "Normal", and "Poisson". For "Binomial", the second item is an array with the number of trials (as in the example below).
- X (array_like) – Covariates.
- QS (tuple) – Economic eigendecomposition.

Example

>>> from numpy import dot, sqrt, zeros
>>> from numpy.random import RandomState
>>>
>>> from numpy_sugar.linalg import economic_qs
>>>
>>> from glimix_core.glmm import GLMMExpFam
>>>
>>> random = RandomState(0)
>>> nsamples = 10
>>>
>>> X = random.randn(nsamples, 2)
>>> G = random.randn(nsamples, 100)
>>> K = dot(G, G.T)
>>> ntrials = random.randint(1, 100, nsamples)
>>> z = dot(G, random.randn(100)) / sqrt(100)
>>>
>>> successes = zeros(len(ntrials), int)
>>> for i in range(len(ntrials)):
...    successes[i] = sum(z[i] + 0.2 * random.randn(ntrials[i]) > 0)
>>>
>>> QS = economic_qs(K)
>>>
>>> glmm = GLMMExpFam(successes, ('binomial', ntrials), X, QS)
>>> print('Before: %.2f' % glmm.lml())
Before: -16.40
>>> glmm.fit(verbose=False)
>>> print('After: %.2f' % glmm.lml())
After: -13.43

beta

Fixed-effect sizes.

Returns: $$\boldsymbol\beta$$ (numpy.ndarray).
copy()

Create a copy of this object.

covariance()[source]

Covariance of the prior.

Returns: $$v_0 \mathrm K + v_1 \mathrm I$$ (numpy.ndarray).
delta

Get or set the ratio of variance between K and I.

Returns: $$\delta$$ (float).
fit(verbose=True, factr=100000.0, pgtol=1e-07)[source]

Maximise the marginal likelihood.

Parameters:

- verbose (bool) – True for progress output; False otherwise. Defaults to True.
- factr (float, optional) – The iteration stops when (f^k - f^{k+1}) / max{|f^k|, |f^{k+1}|, 1} <= factr * eps, where eps is the machine precision.
- pgtol (float, optional) – The iteration stops when max{|proj g_i| : i = 1, ..., n} <= pgtol, where proj g_i is the i-th component of the projected gradient.

Notes

Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.
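Since fit() delegates its stopping criteria to SciPy's L-BFGS-B routine, the meaning of factr and pgtol can be seen on a standalone toy objective (unrelated to GLMMs; the quadratic below is made up for illustration):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Toy objective f(x) = sum((x - 1)^2), returned together with its gradient,
# as fmin_l_bfgs_b expects when no separate fprime is given.
def objective(x):
    return float(np.sum((x - 1.0) ** 2)), 2.0 * (x - 1.0)

x0 = np.zeros(3)
# factr controls the relative decrease of f between iterations;
# pgtol bounds the largest component of the projected gradient.
xopt, fval, info = fmin_l_bfgs_b(objective, x0, factr=1e5, pgtol=1e-7)
```

Tightening factr (smaller values) or pgtol makes the optimiser run longer before declaring convergence; the defaults used by fit() mirror the values shown here.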

fix(var_name)[source]

Prevent a variable from being adjusted.

Parameters: var_name (str) – Variable name.
gradient()[source]

Gradient of the log of the marginal likelihood.

Returns: Map from variable names to their gradient values (dict).
isfixed(var_name)

Return whether a variable is fixed.

Parameters: var_name (str) – Variable name.
lml()

Log of the marginal likelihood.

Returns: $$\log p(\mathbf y)$$ (float).
mean()

Mean of the prior.

Returns: $$\mathrm X\boldsymbol\beta$$ (numpy.ndarray).
posteriori_covariance()

Covariance of the estimated posterior.

posteriori_mean()

Mean of the estimated posterior.

This is also the maximum a posteriori estimate of the latent variable.

scale

Get or set the overall variance.

Returns: $$s$$ (float).
unfix(var_name)[source]

Allow a previously fixed variable to be adjusted again.

Parameters: var_name (str) – Variable name.
v0

First variance.

Returns: $$v_0 = s (1 - \delta)$$ (float).
v1

Second variance.

Returns: $$v_1 = s \delta$$ (float).
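Since $$v_0 = s(1 - \delta)$$ and $$v_1 = s\delta$$, the pair $$(s, \delta)$$ and the pair $$(v_0, v_1)$$ determine each other: $$s = v_0 + v_1$$ and $$\delta = v_1 / (v_0 + v_1)$$. A quick numeric check of this reparameterisation (the variance values are arbitrary):

```python
# Recover the overall variance s and the ratio delta from v0 and v1,
# then verify the round trip back to v0 and v1.
v0, v1 = 0.75, 0.25
s = v0 + v1              # overall variance: s = v0 + v1
delta = v1 / (v0 + v1)   # variance ratio: delta = v1 / s
assert abs(s * (1 - delta) - v0) < 1e-12
assert abs(s * delta - v1) < 1e-12
```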

## Heterogeneous Normal likelihood

class glimix_core.glmm.GLMMNormal(eta, tau, X, QS=None)[source]

LMM with heterogeneous Normal likelihoods.

We model

$\tilde{\boldsymbol\mu} \sim \mathcal N(\mathrm X\boldsymbol\beta, v_0 \mathrm K + v_1 \mathrm I + \tilde{\Sigma}),$

where $$\tilde{\boldsymbol\mu}$$ and $$\tilde{\Sigma}$$ are typically given by an EP approximation to a non-Normal likelihood (please refer to glimix_core.glmm.GLMMExpFam). If that is the case, $$\tilde{\Sigma}$$ is a diagonal matrix with non-negative values. Those EP parameters are given to this class in their natural forms: eta and tau.

Parameters:

- eta (array_like) – $$[\tilde{\mu}_0\tilde{\sigma}^{-2}_0 \quad \tilde{\mu}_1\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\mu}_n\tilde{\sigma}^{-2}_n]$$.
- tau (array_like) – $$[\tilde{\sigma}^{-2}_0\quad\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\sigma}^{-2}_n]$$.
- X (array_like) – Covariates.
- QS (tuple) – Economic eigendecomposition of $$\mathrm K$$.

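The natural parameters relate to the site means and variances by $$\tau_i = \tilde{\sigma}^{-2}_i$$ and $$\eta_i = \tilde{\mu}_i\tilde{\sigma}^{-2}_i$$, so each direction of the conversion is elementwise. A small sketch with made-up site values (not taken from any actual EP run):

```python
import numpy as np

# Hypothetical site means and variances from an EP approximation.
mu_tilde = np.array([0.1, -0.4, 0.7])
sigma2_tilde = np.array([0.5, 1.2, 0.8])

tau = 1.0 / sigma2_tilde        # precisions: tau_i = 1 / sigma2_i
eta = mu_tilde / sigma2_tilde   # precision-weighted means: eta_i = mu_i * tau_i

# Recover the moment parameters from the natural ones.
assert np.allclose(eta / tau, mu_tilde)
assert np.allclose(1.0 / tau, sigma2_tilde)
```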
beta

Fixed-effect sizes.

Returns: $$\boldsymbol\beta$$ (numpy.ndarray).
copy()

Create a copy of this object.

covariance()

Covariance of the prior.

Returns: $$v_0 \mathrm K + v_1 \mathrm I$$ (numpy.ndarray).
delta

Get or set the ratio of variance between K and I.

Returns: $$\delta$$ (float).
fit(verbose=True, factr=100000.0, pgtol=1e-07)

Maximise the marginal likelihood.

Parameters:

- verbose (bool) – True for progress output; False otherwise. Defaults to True.
- factr (float, optional) – The iteration stops when (f^k - f^{k+1}) / max{|f^k|, |f^{k+1}|, 1} <= factr * eps, where eps is the machine precision.
- pgtol (float, optional) – The iteration stops when max{|proj g_i| : i = 1, ..., n} <= pgtol, where proj g_i is the i-th component of the projected gradient.

Notes

Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.

fix(var_name)[source]

Prevent a variable from being adjusted.

Parameters: var_name (str) – Variable name.
get_fast_scanner()[source]

Return glimix_core.lmm.FastScanner for the current delta.

gradient()[source]

Evaluate the gradient at the given point.

Parameters: args (tuple) – Point at which the gradient is evaluated. The length of this tuple is defined by the user.

Returns: Map from variable names to their gradient values (dict).
isfixed(var_name)

Return whether a variable is fixed.

Parameters: var_name (str) – Variable name.
lml()

Log of the marginal likelihood.

Returns: $$\log p(\mathbf y)$$ (float).
mean()

Mean of the prior.

Returns: $$\mathrm X\boldsymbol\beta$$ (numpy.ndarray).
posteriori_covariance()

Covariance of the estimated posterior.

posteriori_mean()

Mean of the estimated posterior.

This is also the maximum a posteriori estimate of the latent variable.

scale

Get or set the overall variance.

Returns: $$s$$ (float).
unfix(var_name)[source]

Allow a previously fixed variable to be adjusted again.

Parameters: var_name (str) – Variable name.
v0

First variance.

Returns: $$v_0 = s (1 - \delta)$$ (float).
v1

Second variance.

Returns: $$v_1 = s \delta$$ (float).

References

  [Min01] Minka, T. P. (2001). Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 362–369). Morgan Kaufmann Publishers Inc.
  Wikipedia contributors. (2017, September 9). Maximum likelihood estimation. In Wikipedia, The Free Encyclopedia. Retrieved September 22, 2017, from https://en.wikipedia.org/w/index.php?title=Maximum_likelihood_estimation&oldid=799704694
  Wikipedia contributors. (2018, June 21). Generalized linear model. In Wikipedia, The Free Encyclopedia. Retrieved August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Generalized_linear_model&oldid=846910075#Link_function
  Wikipedia contributors. (2018, June 29). Exponential family. In Wikipedia, The Free Encyclopedia. Retrieved August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=848114709