Generalised Linear Mixed Models¶
Introduction¶
A GLMM can be described in two parts. The first part consists of the latent variable
\[\mathbf z = \mathrm X\boldsymbol\beta + \mathrm G\mathbf u + \boldsymbol\epsilon,\]
where \(\mathbf u \sim \mathcal N(\mathbf 0, v_0\mathrm I_k)\) is a vector of random effects and the \(\epsilon_i\) are iid Normal random variables, each with variance \(v_1\). The second part connects the latent variable to the observed one:
\[y_i \,|\, z_i \sim \text{ExpFam}(\mu_i = g(z_i)),\]
where \(g(\cdot)\) is a link function [3] and \(\text{ExpFam}(\cdot)\) is an exponential-family distribution [4] with mean \(\mu_i\). The marginal likelihood is thus given by
\[p(\mathbf y) = \int \prod_{i=1}^n \text{ExpFam}(y_i \,|\, \mu_i = g(z_i))\, \mathcal N(\mathbf z \,|\, \mathrm X\boldsymbol\beta,\; v_0\mathrm G\mathrm G^{\intercal} + v_1\mathrm I)\, \mathrm d\mathbf z.\]
We use \(\mathrm K\) to refer to \(\mathrm G\mathrm G^{\intercal}\) when appropriate.
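The generative process described above can be sketched in plain NumPy (a minimal illustration, not part of the library; the logistic inverse link, the dimensions, and the variances are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.RandomState(0)
n, p, k = 50, 2, 10        # samples, covariates, random effects
v0, v1 = 0.5, 0.1          # random-effect and noise variances

X = rng.randn(n, p)                 # covariates
G = rng.randn(n, k)                 # random-effect design matrix
beta = rng.randn(p)                 # fixed-effect sizes

u = np.sqrt(v0) * rng.randn(k)      # u ~ N(0, v0 I_k)
eps = np.sqrt(v1) * rng.randn(n)    # eps_i iid N(0, v1)
z = X @ beta + G @ u + eps          # latent variable

mu = 1.0 / (1.0 + np.exp(-z))       # mean after a logistic inverse link
y = rng.binomial(1, mu)             # Bernoulli outcomes
```

Here \(\mathrm G\mathrm G^{\intercal}\) plays the role of the covariance \(\mathrm K\) mentioned above.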
The module glimix_core.glmm implements two algorithms for parameter fitting via maximum likelihood [1]:
- an Expectation Propagation approximation [2] when the likelihood is not normally distributed (refer to GLMMExpFam);
- a closed-form solution otherwise (refer to GLMMNormal).
Exponential family likelihood¶
- class glimix_core.glmm.GLMMExpFam(y, lik, X, QS=None, n_int=1000, rtol=1.49e-05, atol=1.49e-08)[source]¶
Generalised Linear Gaussian Processes implementation.
It implements inference over GLMMs via the Expectation Propagation [Min01] algorithm. It currently supports the "Bernoulli", "Probit", "Binomial", and "Poisson" likelihoods. (For the heterogeneous Normal likelihood, please refer to glimix_core.glmm.GLMMNormal for closed-form inference.)
- Parameters
y (array_like) – Outcome variable.
lik (tuple) – Likelihood definition. The first item is one of the following likelihood names: "Bernoulli", "Binomial", "Normal", and "Poisson". For Binomial, the second item is an array of outcomes.
X (array_like) – Covariates.
QS (tuple) – Economic eigendecomposition.
Example
>>> from numpy import dot, sqrt, zeros
>>> from numpy.random import RandomState
>>>
>>> from numpy_sugar.linalg import economic_qs
>>>
>>> from glimix_core.glmm import GLMMExpFam
>>>
>>> random = RandomState(0)
>>> nsamples = 10
>>>
>>> X = random.randn(nsamples, 2)
>>> G = random.randn(nsamples, 100)
>>> K = dot(G, G.T)
>>> ntrials = random.randint(1, 100, nsamples)
>>> z = dot(G, random.randn(100)) / sqrt(100)
>>>
>>> successes = zeros(len(ntrials), int)
>>> for i in range(len(ntrials)):
...     successes[i] = sum(z[i] + 0.2 * random.randn(ntrials[i]) > 0)
>>>
>>> QS = economic_qs(K)
>>>
>>> glmm = GLMMExpFam(successes, ('binomial', ntrials), X, QS)
>>> print('Before: %.2f' % glmm.lml())
Before: -16.40
>>> glmm.fit(verbose=False)
>>> print('After: %.2f' % glmm.lml())
After: -13.43
- property beta¶
Fixed-effect sizes.
- Returns
\(\boldsymbol\beta\).
- Return type
- copy()¶
Create a copy of this object.
- covariance()[source]¶
Covariance of the prior.
- Returns
\(v_0 \mathrm K + v_1 \mathrm I\).
- Return type
- property delta¶
Get or set the ratio of variance between K and I.
- Returns
\(\delta\).
- Return type
- fit(verbose=True, factr=100000.0, pgtol=1e-07)[source]¶
Maximise the marginal likelihood.
- Parameters
verbose (bool) – True for progress output; False otherwise. Defaults to True.
factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.
pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol, where pg_i is the i-th component of the projected gradient.
Notes
Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.
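Since factr is relative to machine precision, its absolute meaning is easy to compute. The snippet below (plain NumPy, independent of glimix_core) shows the relative tolerance implied by the default factr=100000.0:

```python
import numpy as np

eps = np.finfo(float).eps    # machine precision, about 2.22e-16
factr = 100000.0             # the default used by fit()
rel_tol = factr * eps        # stopping tolerance on the relative change in f

# the default therefore stops once the relative improvement drops below ~2.2e-11
print("relative tolerance: %.3e" % rel_tol)
```

Smaller factr values demand a tighter fit; the scipy documentation suggests 1e12 for low accuracy, 1e7 for moderate accuracy, and 10.0 for extremely high accuracy.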
- fix(var_name)[source]¶
Prevent a variable from being adjusted.
- Parameters
var_name (str) – Variable name.
- gradient()[source]¶
Gradient of the log of the marginal likelihood.
- Returns
Map from variable names to their gradient values.
- Return type
- mean()¶
Mean of the prior.
- Returns
\(\mathrm X\boldsymbol\beta\).
- Return type
- property name¶
Name of this function.
Heterogeneous Normal likelihood¶
- class glimix_core.glmm.GLMMNormal(eta, tau, X, QS=None)[source]¶
LMM with heterogeneous Normal likelihoods.
We model
\[\tilde{\boldsymbol\mu} \sim \mathcal N(\mathrm X\boldsymbol\beta, v_0 \mathrm K + v_1 \mathrm I + \tilde{\Sigma}),\]
where \(\tilde{\boldsymbol\mu}\) and \(\tilde{\Sigma}\) are typically given by an EP approximation to a non-Normal likelihood (please refer to glimix_core.glmm.GLMMExpFam). If that is the case, \(\tilde{\Sigma}\) is a diagonal matrix with non-negative values. Those EP parameters are given to this class via their natural forms: eta and tau.
- Parameters
eta (array_like) – \([\tilde{\mu}_0\tilde{\sigma}^{-2}_0 \quad \tilde{\mu}_1\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\mu}_n\tilde{\sigma}^{-2}_n]\).
tau (array_like) – \([\tilde{\sigma}^{-2}_0\quad\tilde{\sigma}^{-2}_1 \quad\cdots\quad \tilde{\sigma}^{-2}_n]\).
X (array_like) – Covariates.
QS (tuple) – Economic eigendecomposition of \(\mathrm K\).
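The natural forms eta and tau relate to EP site means \(\tilde{\mu}_i\) and variances \(\tilde{\sigma}^2_i\) as \(\eta_i = \tilde{\mu}_i\tilde{\sigma}^{-2}_i\) and \(\tau_i = \tilde{\sigma}^{-2}_i\). A plain-NumPy sketch of that conversion (the site values here are made up for illustration):

```python
import numpy as np

# hypothetical EP site means and variances (illustrative values only)
mu_site = np.array([0.5, -1.2, 0.3])
sig2_site = np.array([0.8, 1.5, 0.4])

tau = 1.0 / sig2_site    # tau_i = sigma_i^{-2}
eta = mu_site * tau      # eta_i = mu_i * sigma_i^{-2}

# the site moments are recoverable from the natural forms
mu_back = eta / tau
sig2_back = 1.0 / tau
```

Arrays like these, together with the covariates X and the economic eigendecomposition QS of \(\mathrm K\), are what GLMMNormal(eta, tau, X, QS) takes.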
- property beta¶
Fixed-effect sizes.
- Returns
\(\boldsymbol\beta\).
- Return type
- copy()¶
Create a copy of this object.
- covariance()¶
Covariance of the prior.
- Returns
\(v_0 \mathrm K + v_1 \mathrm I\).
- Return type
- property delta¶
Get or set the ratio of variance between K and I.
- Returns
\(\delta\).
- Return type
- fit(verbose=True, factr=100000.0, pgtol=1e-07)¶
Maximise the marginal likelihood.
- Parameters
verbose (bool) – True for progress output; False otherwise. Defaults to True.
factr (float, optional) – The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision.
pgtol (float, optional) – The iteration will stop when max{|proj g_i | i = 1, ..., n} <= pgtol, where pg_i is the i-th component of the projected gradient.
Notes
Please refer to scipy.optimize.fmin_l_bfgs_b() for further information about factr and pgtol.
- fix(var_name)[source]¶
Prevent a variable from being adjusted.
- Parameters
var_name (str) – Variable name.
- get_fast_scanner()[source]¶
Return a glimix_core.lmm.FastScanner instance for the current delta.
- mean()¶
Mean of the prior.
- Returns
\(\mathrm X\boldsymbol\beta\).
- Return type
- property name¶
Name of this function.
References
- 1
Maximum likelihood estimation. (2017, September 9). In Wikipedia, The Free Encyclopedia. Retrieved 14:48, September 22, 2017, from https://en.wikipedia.org/w/index.php?title=Maximum_likelihood_estimation&oldid=799704694
- 2
Minka, T. P. (2001, August). Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 362-369). Morgan Kaufmann Publishers Inc.
- 3
Wikipedia contributors. (2018, June 21). Generalized linear model. In Wikipedia, The Free Encyclopedia. Retrieved 09:45, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Generalized_linear_model&oldid=846910075#Link_function
- 4
Wikipedia contributors. (2018, June 29). Exponential family. In Wikipedia, The Free Encyclopedia. Retrieved 09:48, August 6, 2018, from https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=848114709