Linear Mixed Models¶

Linear mixed models (LMMs) are a generalisation of linear models 1 to allow the outcome to be described as a summation of both fixed and random effects 2. LMM inference is implemented by the glimix_core.lmm module and is described here.

Introduction¶

A LMM can be described as

𝐲 = X𝜷 + G𝐮 + 𝛜,

where 𝐮 ∼ 𝓝(𝟎, v₀I) is a vector of random effects and 𝛜 are iid Normal random variables with zero-mean and variance v₁ each. The outcome-vector is thus distributed according to

𝐲 ∼ 𝓝(X𝜷, v₀GGᵀ + v₁I).

The LMM class provides a FastLMM 3 implementation to perform inference over the variance parameters v₀ and v₁ and over the vector 𝜷 of fixed-effect sizes. Let K = GGᵀ and observe that K can be any symmetric positive definite matrix. An instance of this class is created by providing the outcome 𝐲, the covariates X, and the covariance K via its economic eigendecomposition. Here is an example:

>>> from numpy import ones
>>> from numpy_sugar.linalg import economic_qs_linear
>>>
>>> from glimix_core.lmm import LMM
>>>
>>> G = [[1, 2], [3, -1], [1.1, 0.5], [0.5, -0.4]]
>>> QS = economic_qs_linear(G)
>>> X = ones((4, 1))
>>> y = [-1, 2, 0.3, 0.5]
>>> lmm = LMM(y, X, QS)
>>> lmm.fit(verbose=False)
>>> lmm.lml()  
-2.2726234086180557

The method LMM.fit() is called to optimise the marginal likelihood over the fixed-effect sizes 𝜷 and over the variances v₀ and v₁. The resulting values for the above inference are:

>>> lmm.beta  
array([0.06646503])
>>> lmm.v0  
0.33736446158226896
>>> lmm.v1  
0.012503600451739165

We also provide FastScanner, an approximated LMM implementation for performing an even faster inference across several (millions, for example) covariates independently. More detail about this approach is given in the Association scan section.

Multi-Trait¶

This package also provides a variant of LMM that models multiple outcomes (or traits) of the same set of samples 4. Let p be the number of traits. The outcome matrix Y is the concatenation of p vectors:

Y = [𝐲₀ 𝐲₁ ... 𝐲ₚ].

The mean definition will involve three matrices:

M = (A ⊗ X) vec(B),

where vec(·) stacks the columns of the input matrix into a single-column matrix. B is a c×p matrix of effect sizes for which c is the number of covariates. A is a p×p design matrix that determines the covariance between the traits over the mean vector. X is a n×p design matrix of covariates.

The covariance matrix is defined by

K = C₀ ⊗ GGᵀ + C₁ ⊗ I.

C₀ and C₁ are p×p symmetric matrices whose values will be optimized. GGᵀ gives the covariance between samples, while (C₀ ⊗ GGᵀ) gives the covariance between samples when traits are taken into account.

The outcome, mean, and covariance-matrix together define the distribution

vec(Y) ~ N((A ⊗ X) vec(B), K = C₀ ⊗ GGᵀ + C₁ ⊗ I).

The parameters of the multi-trait LMM to be fit via maximum likelihood are the matrices B, C₀, and C₁.

>>> from numpy.random import RandomState
>>> from glimix_core.lmm import Kron2Sum
>>>
>>> random = RandomState(0)
>>> n = 15
>>> p = 2
>>> c = 3
>>> Y = random.randn(n, p)
>>> A = random.randn(p, p)
>>> A = A.dot(A.T)
>>> X = random.randn(n, c)
>>> G = random.randn(n, 4)
>>>
>>> mlmm = Kron2Sum(Y, A, X, G, restricted=False)
>>> mlmm.fit(verbose=False)
>>> mlmm.lml()  
-42.08309530985927

We also provide KronFastScanner for performing an even faster inference across several (millions, for example) covariates independently. Please, follow the next section for details.

Association scan¶

Let X be a n×c matrix, Mⱼ a n×mⱼ matrix for the j-th candidate set, and 𝐲 an array of outcome:

𝐲 ∼ 𝓝(X𝜷ⱼ + Mⱼ𝛂ⱼ, sⱼ(v₀GGᵀ + v₁I))

The parameters 𝜷ⱼ, 𝛂ⱼ, and sⱼ are fit via maximum likelihood, while the remaining parameters v₀ and v₁ are held fixed. The v₀ and v₁ values are first found by applying LMM.

>>> scanner = lmm.get_fast_scanner()
>>> M = [[1.5, 0.1], [-0.2, 0.4], [0.0, 1.0], [-3.4, 0.6]]
>>> r = scanner.scan(M)
>>> r["lml"]
-0.7322976913217882
>>> r["effsizes0"]
array([-0.42323051])
>>> r["effsizes1"]
array([-0.05913491,  0.37079162])
>>> r["scale"]
0.4629376687687552

For the null case (i.e., when there is not candidate set Mⱼ), the log of the marginal likelihood and the values of 𝜷 and s can be found as follows.

>>> scanner.null_lml()  
-2.272623408618055
>>> scanner.null_beta  
array([0.06646503])
>>> scanner.null_scale  
1.0

We also provide a fast scanner for the multi-trait case, KronFastScanner. Its model is given by

vec(Y) ∼ 𝓝(vec(Y) | (A ⊗ X)vec(𝚩ⱼ) + (Aⱼ ⊗ Xⱼ)vec(𝚨ⱼ), sⱼ(C₀ ⊗ GGᵀ + C₁ ⊗ I)).

As before, the parameters C₀ and C₁ are set to the values found by Kron2Sum. A candidate set is defined by providing the matrices Aⱼ and Xⱼ. The parameters 𝚩ⱼ, 𝚨ⱼ, and sⱼ are found via maximum likelihood.

>>> mscanner = mlmm.get_fast_scanner()
>>> A = random.randn(2, n)
>>> X = random.randn(n, 3)
>>> r = mscanner.scan(A, X)
>>> r["lml"]
-42.74668875515792
>>> r["effsizes0"][0, 0]
0.0027782236846561684
>>> r["effsizes1"][0, 0]
-0.012738296905015198
>>> r["scale"]
1.0452327204999015

API¶

`FastScanner`(y, X, QS, v)	Approximated fast inference over several covariates.
`Kron2Sum`(Y, A, X, G[, rank, restricted])	LMM for multi-traits fitted via maximum likelihood.
`KronFastScanner`(Y, A, X, G, terms)	Approximated fast inference over several covariates.
`LMM`(y, X[, QS, restricted])	Fast Linear Mixed Models inference via maximum likelihood.

References

1: Wikipedia contributors. (2018, May 22). Linear model. In Wikipedia, The Free Encyclopedia. Retrieved 16:00, August 5, 2018, from https://en.wikipedia.org/w/index.php?title=Linear_model&oldid=842479751.
2: Introduction to linear mixed models. UCLA: Institute for Digital Research and Education. Retrieved from August 5, 2018, from https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-linear-mixed-models/.
3: Lippert, Christoph, Listgarten, Jennifer, Liu, Ying, Kadie, Carl M, Davidson, Robert I & Heckerman, David (2011). FaST linear mixed models for genome-wide association studies. Nature methods, 8, 833-835.
4: Casale, F. P., Horta, D., Rakitsch, B., & Stegle, O. (2017). Joint genetic analysis using variant sets reveals polygenic gene-context interactions. PLoS genetics, 13(4), e1006693.