
Abstract

A new approach to represent P-splines as a mixed model is presented. The corresponding matrices are sparse, which allows the new approach to find the optimal values of the penalty parameters in a computationally efficient manner. Whereas the new mixed model P-splines formulation is similar to the original P-splines, a key difference is that the fixed effects are modelled explicitly, and extra constraints are added to the random part of the model. An important feature ensuring that the entire computation is fast is a sparse implementation of the Automated Differentiation of the Cholesky algorithm. It is shown by means of two examples that the new approach is fast compared to existing methods. The methodology has been implemented in the R-package LMMsolver available on CRAN (https://CRAN.R-project.org/package=LMMsolver).

Keywords

Automated differentiation, B-splines, Cholesky, REML

1 Introduction

Penalized regression using B-splines (known as P-splines; Eilers and Marx, 1996, 2021) is computationally efficient, because of the local character of B-splines. The corresponding linear equations are sparse and can be solved quickly. However, the main problem is to find the optimal value for the tuning or penalty parameter. A good way to approach this problem is to use mixed models and restricted maximum likelihood (REML; Patterson and Thompson, 1971). Several methods have been proposed to transform the original P-spline model to a mixed model (Eilers, 1999; Currie and Durbán, 2002; Lee and Durbán, 2011). A disadvantage of existing transformations to mixed models is that the local character of the B-splines is lost, which reduces computational efficiency.

The proposed transformations have in common that they start from the P-splines formulation and decompose it into a fixed and a random term to obtain a mixed model. The same idea can be used for tensor product P-splines (Currie et al., 2006; Lee and Durbán, 2011; Lee et al., 2013; Rodríguez-Álvarez et al., 2015, 2018; Wood et al., 2013; Wood, 2017). There are many different ways to represent tensor product P-splines as a mixed model; Piepho et al. (2022) give an overview of the different two-dimensional P-splines models.

Here, a new and efficient approach is outlined. In Boer et al. (2020) it was shown that the linear variance (LV) mixed model (Williams, 1986) is equivalent to P-splines with first-degree B-splines and first-order differences. This idea will be generalized here, to show that P-splines with $k$th-order difference penalties are equivalent to a class of mixed models. The extension to tensor product P-splines is relatively straightforward.

Another property of the LV model is that it is computationally attractive, as the precision matrix is sparse. It will be shown that the P-splines model is equivalent to a class of mixed models with a sparse precision matrix. In this article we will show that the sparse mixed model P-splines are computationally efficient.

The layout of the article is as follows. In Section 2 we will discuss in detail the equivalence between the linear variance mixed model and a special case of P-splines. Section 3 outlines a brief overview of B-splines, and gives some results that will be used to derive the connection between mixed models and P-splines. In Section 4 it will be shown how the idea presented in Section 2 can be generalized to P-splines. Section 5 shows how this can be further extended to mixed model tensor product P-splines. In Section 6 an implementation of the sparse mixed model in the LMMsolver R-package is presented, and its computational efficiency is demonstrated by means of two examples. The article concludes with a brief discussion in Section 7. The R-scripts to reproduce the figures are available in the Supplementary Material.

2 Linear variance model and Whittaker smoother

The Whittaker smoother (Whittaker, 1923) and the Linear Variance (LV) mixed model introduced by Williams (1986) are equivalent, as shown by Boer et al. (2020). Here we will use a new approach to show that the two models are equivalent, with the aim of developing a computationally efficient method to estimate the optimal penalty parameters using mixed models that can be further extended to P-splines. The Whittaker smoother is a special case of P-splines (Frasso and Eilers, 2015), and originally third-order differences were used by Whittaker (1923). Here we will use a simpler variant of Whittaker's proposal, using first-order differences.

Suppose there are $n$ observations at equidistant points $i = 1,2, \dots, n$ . Let $y_{i}$ be the observation at point $i$ . The LV mixed model is given by

y_{i} = μ + u_{i} + e_{i}, i = 1, \dots, n,

(2.1)

where $μ$ is the intercept, the parameters $e_{i}$ represent the residual error and are assumed to be independent and identically distributed, $e_{i} \sim N (0, σ_{e}^{2})$ , where $σ_{e}^{2}$ is the residual variance. The covariances between $u_{i}$ and $u_{j}$ are given by

cov (u_{i}, u_{j}) = α (n - 1 - | i - j |) σ_{u}^{2},

(2.2)

where $α$ is a scaling factor, and $σ_{u}^{2}$ is the variance corresponding to the LV model. Whereas Williams (1986) used $α = 1$ , here we will use $α = \frac{1}{2}$ , to simplify the notation to show the equivalence with the Whittaker smoother. As can be seen from equation (2.2) the covariance decreases linearly with distance between two observation points $i$ and $j$ , and the covariance between first and last observations is equal to zero.

In this article we will formulate mixed models in terms of penalty and precision matrix (the inverse of the covariance matrix), which is a more modern representation (Lindgren et al., 2022). The parameter $λ = σ_{e}^{2} / σ_{u}^{2}$ is called the tuning or penalty parameter (Eilers and Marx, 1996; Hastie et al., 2009). From equations (7) and (A2) in Boer et al. (2020) it follows that the objective function to be minimized for equation (2.1) with respect to parameters $μ$ and $u_{i}$ is given by

S_{1} (μ, u_{1}, \dots, u_{n}) = \sum_{i = 1}^{n} {(y_{i} - (μ + u_{i}))}^{2} + λ \sum_{i = 2}^{n} {(Δ u_{i})}^{2} + λ {(\sum_{i = 1}^{n} c_{i} u_{i})}^{2},

(2.3)

where $Δ u_{i} = u_{i} - u_{i - 1}$ is the first-order difference, $c_{1} = c_{n} = 1 / \sqrt{n - 1}$ and $c_{i} = 0$ for $i = 2, \dots, n - 1$ .
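As a small numerical check of the step from the covariance in equation (2.2) to the penalty in equation (2.3), the following sketch builds the LV covariance matrix (with $α = \frac{1}{2}$ and $σ_{u}^{2} = 1$) and the candidate precision matrix $D' D + c c'$, and verifies in exact rational arithmetic that they are each other's inverse. The function names and the choice $n = 6$ are illustrative assumptions and are not taken from LMMsolver.

```python
from fractions import Fraction

def lv_covariance(n):
    """Covariance of equation (2.2) with alpha = 1/2 and sigma_u^2 = 1."""
    return [[Fraction(n - 1 - abs(i - j), 2) for j in range(n)] for i in range(n)]

def lv_precision(n):
    """D'D + c c' for first-order differences D and c_1 = c_n = 1/sqrt(n - 1)."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        # the squared difference (u_i - u_{i-1})^2 contributes a 2 x 2 block
        P[i][i] += 1
        P[i - 1][i - 1] += 1
        P[i][i - 1] -= 1
        P[i - 1][i] -= 1
    for i in (0, n - 1):
        for j in (0, n - 1):
            # c c' only touches the four corner entries, each equal to 1/(n - 1)
            P[i][j] += Fraction(1, n - 1)
    return P

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

n = 6
K = lv_covariance(n)
P = lv_precision(n)
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
is_inverse = matmul(K, P) == identity
nonzeros = sum(1 for row in P for v in row if v != 0)
print(is_inverse, nonzeros)  # prints: True 18
```

For $n = 6$ the precision matrix has 18 non-zero entries out of 36 (a tridiagonal band plus two corner entries), whereas the covariance matrix is dense apart from its two zero corners.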

To make the connection between the LV model and the Whittaker smoother, we define the following invertible linear transformation:

\begin{array}{rl} a_{i} & = μ + u_{i}, \quad i = 1,2, \dots, n, \\ η & = \sum_{i = 1}^{n} c_{i} u_{i}, \quad \sum_{i = 1}^{n} c_{i} \neq 0. \end{array}

(2.4)

Using equation (2.4) and

Δ a_{i} = Δ (μ + u_{i}) = Δ u_{i}

(2.5)

we obtain

S_{2} (a_{1}, \dots, a_{n}, η) = \sum_{i = 1}^{n} {(y_{i} - a_{i})}^{2} + λ \sum_{i = 2}^{n} {(Δ a_{i})}^{2} + λ η^{2} .

(2.6)

Equation (2.6) can be decomposed as $S_{2} (a_{1}, \dots, a_{n}, η) = S_{3} (a_{1}, \dots, a_{n}) + S_{4} (η)$ where

S_{3} (a_{1}, \dots, a_{n}) = \sum_{i = 1}^{n} {(y_{i} - a_{i})}^{2} + λ \sum_{i = 2}^{n} {(Δ a_{i})}^{2}

(2.7)

and $S_{4} (η) = λ η^{2}$ . To find the minimum of $S_{2}$ the functions $S_{3}$ and $S_{4}$ can be minimized separately. The objective function $S_{4}$ has trivial solution $\hat{η} = 0$ . Equation (2.7) is the objective function to be minimized for the Whittaker smoother with first-order differences. This shows that equations (2.3) and (2.7) are equivalent. The LV model is not the only mixed model equivalent to the Whittaker smoother: other values of $c_{i}$ can also be chosen, provided that $\sum_{i} c_{i} \neq 0$ . An example is the Random Walk (RW) mixed model, with $c_{1} = 1$ and $c_{i} = 0$ for $i = 2, \dots, n$ , cf. equation (A3) in Boer et al. (2020).
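The equivalence of equations (2.3) and (2.7) can also be checked numerically. The sketch below minimizes both objective functions by solving their normal equations exactly with rational arithmetic, and compares the fitted values $μ + u_{i}$ with the Whittaker solution $a_{i}$. It uses the Random Walk choice $c_{1} = 1$, $c_{i} = 0$ otherwise, so that all entries stay rational; the data, the value of $λ$ and all function names are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def diff_penalty(n):
    """D'D for first-order differences."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        P[i][i] += 1
        P[i - 1][i - 1] += 1
        P[i][i - 1] -= 1
        P[i - 1][i] -= 1
    return P

def whittaker(y, lam):
    """Minimize equation (2.7): solve (I + lam D'D) a = y."""
    n = len(y)
    P = diff_penalty(n)
    A = [[int(i == j) + lam * P[i][j] for j in range(n)] for i in range(n)]
    return solve(A, y)

def mixed_model_fit(y, lam, c):
    """Minimize equation (2.3) over (mu, u); return the fit mu + u_i and eta."""
    n = len(y)
    P = diff_penalty(n)
    # normal equations: [1 | I]'[1 | I] + lam * blockdiag(0, D'D + c c')
    A = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    A[0][0] = Fraction(n)
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = Fraction(1)
        for j in range(n):
            A[i + 1][j + 1] = int(i == j) + lam * (P[i][j] + c[i] * c[j])
    sol = solve(A, [sum(y)] + list(y))
    mu, u = sol[0], sol[1:]
    eta = sum(ci * ui for ci, ui in zip(c, u))
    return [mu + ui for ui in u], eta

y = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]   # hypothetical data
lam = Fraction(5, 2)                                   # hypothetical penalty
c = [Fraction(1)] + [Fraction(0)] * (len(y) - 1)       # Random Walk choice of c
fitted, eta = mixed_model_fit(y, lam, c)
print(fitted == whittaker(y, lam), eta == 0)
```

Both comparisons are exact because the minimizers are unique and Fractions avoid rounding; with floating-point arithmetic the two fits would agree only up to rounding error.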

The LV and the RW mixed models have a sparse precision matrix and they can be solved in a computationally efficient way (Boer et al., 2020). This approach can be further extended to show that there are mixed models, with sparse precision matrices, that are equivalent to P-splines. The method can be generalized to tensor product P-splines, as shown in Section 5.

3 B-splines

In this section we will give a brief overview of B-splines; for more details see for example De Boor (1978) and Lyche et al. (2018). Let the domain $[x_{min}, x_{max}]$ , with $x_{min} < x_{max}$ and $x_{min}, x_{max} \in ℝ$ , be divided in $n_{seg}$ intervals, each of length $h = (x_{max} - x_{min}) / n_{seg}$ . The knots for the B-spline basis are given by

κ_{j} = x_{min} - (p + 1) h + j h, j = 1, \dots, n_{seg} + 2 p + 1,

(3.1)

where $p$ is the degree of the B-spline functions, denoted by $B_{j, p} (x)$ . The B-spline functions have local support, that is, $B_{j, p} (x) = 0$ if $x \notin [κ_{j}, κ_{j + p + 1})$ . The number of B-splines is equal to $q = n_{seg} + p$ . Let $g (x)$ be a linear combination of the B-splines

g (x) = \sum_{j = 1}^{q} u_{j} B_{j, p} (x), x \in [x_{min}, x_{max}].

(3.2)
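To make the knot sequence (3.1) and the local support concrete, the following sketch evaluates all $q$ B-splines at a point with the Cox-de Boor recursion. The recursion is the standard textbook construction and is an assumption here, since the article does not state how the basis is evaluated; the function names and the setting ($n_{seg} = 4$, $p = 3$ on $[0,1]$) are illustrative.

```python
from fractions import Fraction

def knots(xmin, xmax, nseg, p):
    """Equidistant knots of equation (3.1); kappa[j - 1] holds kappa_j."""
    h = (Fraction(xmax) - Fraction(xmin)) / nseg
    return [Fraction(xmin) - (p + 1) * h + j * h for j in range(1, nseg + 2 * p + 2)]

def bspline_basis(x, kappa, p):
    """Values of all q = nseg + p B-splines of degree p at x (Cox-de Boor recursion)."""
    # degree 0: indicator functions of the half-open knot intervals
    b = [Fraction(int(kappa[j] <= x < kappa[j + 1])) for j in range(len(kappa) - 1)]
    for d in range(1, p + 1):
        b = [(x - kappa[j]) / (kappa[j + d] - kappa[j]) * b[j]
             + (kappa[j + d + 1] - x) / (kappa[j + d + 1] - kappa[j + 1]) * b[j + 1]
             for j in range(len(kappa) - d - 1)]
    return b

xmin, xmax, nseg, p = Fraction(0), Fraction(1), 4, 3
kappa = knots(xmin, xmax, nseg, p)
for x in (Fraction(0), Fraction(1, 3), Fraction(1)):
    b = bspline_basis(x, kappa, p)
    # number of basis functions, their sum, and how many are non-zero at x
    print(len(b), sum(b), sum(1 for v in b if v != 0))
```

At every point of the domain at most $p + 1$ of the $q$ basis functions are non-zero and they sum to one, which is what makes the design matrix $B$ sparse.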

The $k$th-order derivative of $g (x)$ can be expressed as follows (De Boor, 1978; Eilers and Marx, 1996)

\frac{d^{k} g (x)}{d x^{k}} = \frac{1}{h^{k}} \sum_{j = k + 1}^{q} Δ^{k} u_{j} B_{j, p - k} (x), k \leq p, x \in [x_{min}, x_{max}],

(3.3)

where $Δ^{k} u_{j}$ is the $k$th-order difference, recursively defined as $Δ^{k} u_{j} = Δ^{k - 1} (Δ u_{j})$ .

Let $f (x)$ be a $(k - 1)$ -degree polynomial with coefficients $β_{0}, \dots, β_{k - 1}$ . The representation of $f (x)$ in terms of B-splines is given by

f (x) = \sum_{i = 0}^{k - 1} β_{i} \sum_{j = 1}^{q} ξ_{i, j, p} B_{j, p} (x), k - 1 \leq p, x \in [x_{min}, x_{max}].

(3.4)

where $ξ_{i, j, p}$ are constants. Analytic expressions for $ξ_{i, j, p}$ can be found in Lyche et al. (2018). Taking the $k$ -th order derivative of equation (3.4) and using equation (3.3) gives

Δ^{k} ξ_{i, j, p} = 0 for i < k,

(3.5)

a result that will be used to establish, in Sections 4 and 5, the equivalence of a class of mixed models with P-splines.
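Equation (3.5) can be illustrated for $k = 2$. The constant function has coefficients $ξ_{0, j, p} = 1$, and for the linear function $f (x) = x$ the sketch below assumes the Greville abscissae $ξ_{1, j, p} = (κ_{j + 1} + \dots + κ_{j + p}) / p$ (Marsden's identity); the article itself refers to Lyche et al. (2018) for the analytic expressions, so this specific formula is an assumption of the example. The code checks that these coefficients reproduce $x$ exactly and that their second-order differences vanish. All function names are hypothetical.

```python
from fractions import Fraction

def knots(xmin, xmax, nseg, p):
    """Equidistant knots of equation (3.1); kappa[j - 1] holds kappa_j."""
    h = (Fraction(xmax) - Fraction(xmin)) / nseg
    return [Fraction(xmin) - (p + 1) * h + j * h for j in range(1, nseg + 2 * p + 2)]

def bspline_basis(x, kappa, p):
    """Values of all B-splines of degree p at x (Cox-de Boor recursion)."""
    b = [Fraction(int(kappa[j] <= x < kappa[j + 1])) for j in range(len(kappa) - 1)]
    for d in range(1, p + 1):
        b = [(x - kappa[j]) / (kappa[j + d] - kappa[j]) * b[j]
             + (kappa[j + d + 1] - x) / (kappa[j + d + 1] - kappa[j + 1]) * b[j + 1]
             for j in range(len(kappa) - d - 1)]
    return b

def greville(kappa, p):
    """Assumed coefficients of f(x) = x: averages of the p interior knots of each B-spline."""
    q = len(kappa) - p - 1
    return [sum(kappa[j + 1:j + p + 1]) / p for j in range(q)]

def differences(v, k):
    """k-th order differences of the sequence v."""
    for _ in range(k):
        v = [b - a for a, b in zip(v, v[1:])]
    return v

nseg, p, k = 5, 3, 2
kappa = knots(0, 1, nseg, p)
xi0 = [Fraction(1)] * (nseg + p)       # coefficients of the constant function
xi1 = greville(kappa, p)               # coefficients of the linear function
points = [Fraction(i, 7) for i in range(8)]
reproduces_x = all(
    sum(c * b for c, b in zip(xi1, bspline_basis(x, kappa, p))) == x for x in points
)
differences_vanish = all(v == 0 for xi in (xi0, xi1) for v in differences(xi, k))
print(reproduces_x, differences_vanish)
```

With equidistant knots the Greville abscissae are linear in $j$, which is why their second-order differences are zero; in matrix notation this is the statement $D G = 0$ used in Section 4.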

The idea of representing polynomials in terms of B-splines can be extended a bit further, and this idea will be used to derive our main result in Section 4. Suppose we are interested in the interpolating $(k - 1)$ -degree polynomial (Press et al., 2002, p. 123) defined by $k$ pairs $(x_{*, i}, y_{*, i})$ . For $k = 1$ we will use $x_{*,1} = (x_{min} + x_{max}) / 2$ . For $k > 1$ the points $x_{*, i}$ are taken equidistant on the interval $[x_{min}, x_{max}]$ :

x_{*, i} = x_{min} + (i - 1) \frac{x_{max} - x_{min}}{k - 1}, i = 1, \dots, k .

(3.6)

Let $B_{*}$ be a $k \times q$ matrix, with $(i, j)$ th entry $B_{j, p} (x_{*, i})$ , and let $G$ be a $q \times k$ matrix with $(j, i)$ th entry $ξ_{i, j, p}$ . The $k \times k$ matrix $M = B_{*} G$ is a Vandermonde matrix

M = B_{*} G = (\begin{matrix} 1 & x_{*,1} & x_{*,1}^{2} & \dots & x_{*,1}^{k - 1} \\ 1 & x_{*,2} & x_{*,2}^{2} & \dots & x_{*,2}^{k - 1} \\ \dots & \dots & \dots & \dots & \dots \\ 1 & x_{*, k} & x_{*, k}^{2} & \dots & x_{*, k}^{k - 1} \end{matrix})

with determinant (Harville, 1998, section 13.6)

| M | = | B_{*} G | = \prod_{1 \leq i < j \leq k} (x_{*, j} - x_{*, i}) \neq 0.

(3.7)

The determinant is unequal to zero, since the points $x_{*, i}$ are distinct.
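As a brief check of equation (3.7), the sketch below builds the Vandermonde matrix for the equidistant points of equation (3.6) and compares its determinant, computed by exact Gaussian elimination, with the product formula. The helper names and the interval $[0, 1]$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def interpolation_points(xmin, xmax, k):
    """Points x_{*,i}: the midpoint for k = 1, otherwise equation (3.6)."""
    xmin, xmax = Fraction(xmin), Fraction(xmax)
    if k == 1:
        return [(xmin + xmax) / 2]
    return [xmin + (i - 1) * (xmax - xmin) / (k - 1) for i in range(1, k + 1)]

def vandermonde(points):
    return [[x ** j for j in range(len(points))] for x in points]

def det(A):
    """Determinant by Gaussian elimination on Fractions."""
    M = [[Fraction(v) for v in row] for row in A]
    n, d = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d                      # a row swap changes the sign
        d *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return d

def product_formula(points):
    """Right-hand side of equation (3.7)."""
    return prod((xj - xi for xi, xj in combinations(points, 2)), start=Fraction(1))

for k in range(1, 6):
    pts = interpolation_points(0, 1, k)
    print(k, det(vandermonde(pts)), product_formula(pts))
```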

4 A new formulation for P-splines

Using the B-spline properties described in the previous section, we can generalize the results from Section 2 to P-splines. First we introduce some notation. Let $y = (y_{1}, y_{2}, \dots, y_{n})^{'}$ be the response variable, depending on the variable $x = (x_{1}, x_{2}, \dots, x_{n})^{'}$ with $x_{i} \in [x_{min}, x_{max}]$ for $i = 1, \dots, n$ . Let $B$ be an $n \times q$ matrix, with $(i, j)$th entry $B_{j, p} (x_{i})$ . The matrix $X$ is defined by $X = B G = [1 | x^{1} | \dots | x^{k - 1}]$ .

In the next step we will define a model equivalent with P-splines, following the same steps as in Section 2. To make this connection more clear, we will use the same subscripts for the objective functions to be minimized, cf. equations (2.3), (2.6), and (2.7). The objective function $S_{1}$ to be minimized is given by

S_{1} (β, u) = \| y - B (G β + u) \|^{2} + λ u' D' D u + λ u' B_{*}' B_{*} u,

(4.1)

where $β = (β_{0}, \dots, β_{k - 1})^{'}$ , $u = (u_{1}, \dots, u_{q})^{'}$ , and $D$ is the $(q - k) \times q$ matrix of $k$th-order differences. Next, we define the linear transformation

(\begin{matrix} a \\ η \end{matrix}) = (\begin{matrix} I_{q} & G \\ B_{*} & 0 \end{matrix}) (\begin{matrix} u \\ β \end{matrix}) = (\begin{matrix} u + G β \\ B_{*} u \end{matrix}),

(4.2)

with $a = (a_{1}, \dots, a_{q})^{'}$ and $η = (η_{1}, \dots, η_{k})^{'}$ . This transformation is invertible since

\left| \begin{matrix} I_{q} & G \\ B_{*} & 0 \end{matrix} \right| = | 0 - B_{*} I_{q} G | = (- 1)^{k} | B_{*} G | = (- 1)^{k} | M | \neq 0,

where we use the properties of block matrices in the first step (Harville, 1998, section 13.3), and equation (3.7) in the second and third steps. For the differences we have

D a = D (G β + u) = D u,

(4.3)

using $D G = 0$ , which is equation (3.5) in matrix notation.

By substitution of equation (4.2) into equation (4.1) and using equation (4.3) we obtain

S_{2} (a, η) = \| y - B a \|^{2} + λ a' D^{'} D a + λ η' η .

(4.4)

Equation (4.4) can be decomposed as $S_{2} (a, η) = S_{3} (a) + S_{4} (η)$ , where

S_{3} (a) = \| y - B a \|^{2} + λ a' D^{'} D a

(4.5)

is the objective function to be minimized for P-splines (Eilers and Marx, 1996), and $S_{4} (η) = λ η' η$ achieves its minimum at $\hat{η} = 0$ . It follows that equations (4.1) and (4.5) are equivalent. The advantage of the new formulation is that we now have a computationally efficient formulation as a mixed model. This will be explained in more detail in Section 6.
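The equivalence of equations (4.1) and (4.5) can be verified on a small example. The sketch below uses cubic B-splines with second-order differences ($p = 3$, $k = 2$), builds $B$, $G$, $B_{*}$ and $D$, solves the normal equations of both objective functions exactly, and compares $G \hat{β} + \hat{u}$ with the P-spline coefficients $\hat{a}$. The data, the value of $λ$, the dense exact solver and the use of Greville abscissae for the linear column of $G$ are assumptions of this illustration; it is not the sparse algorithm of LMMsolver.

```python
from fractions import Fraction

def knots(xmin, xmax, nseg, p):
    h = (Fraction(xmax) - Fraction(xmin)) / nseg
    return [Fraction(xmin) - (p + 1) * h + j * h for j in range(1, nseg + 2 * p + 2)]

def basis_row(x, kappa, p):
    """All q B-splines of degree p at x (Cox-de Boor recursion)."""
    b = [Fraction(int(kappa[j] <= x < kappa[j + 1])) for j in range(len(kappa) - 1)]
    for d in range(1, p + 1):
        b = [(x - kappa[j]) / (kappa[j + d] - kappa[j]) * b[j]
             + (kappa[j + d + 1] - x) / (kappa[j + d + 1] - kappa[j + 1]) * b[j + 1]
             for j in range(len(kappa) - d - 1)]
    return b

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def solve(A, b):
    """Exact Gauss-Jordan elimination on Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def diff_matrix(q, k):
    """(q - k) x q matrix of k-th order differences."""
    D = [[Fraction(int(i == j)) for j in range(q)] for i in range(q)]
    for _ in range(k):
        D = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(D, D[1:])]
    return D

# Illustrative setting; every value below is an assumption of this example.
p, k, nseg, lam = 3, 2, 3, Fraction(7, 4)
q = nseg + p
x = [Fraction(i, 6) for i in range(7)]
y = [Fraction(v) for v in (2, 0, 3, 1, 4, 1, 5)]
kappa = knots(0, 1, nseg, p)
B = [basis_row(xi, kappa, p) for xi in x]
# Columns of G: coefficients of 1 and of x (Greville abscissae, assumed).
G = [[Fraction(1), sum(kappa[j + 1:j + p + 1]) / p] for j in range(q)]
Bstar = [basis_row(Fraction(0), kappa, p), basis_row(Fraction(1), kappa, p)]
DtD = matmul(transpose(diff_matrix(q, k)), diff_matrix(q, k))

# P-splines, equation (4.5): (B'B + lam D'D) a = B'y.
BtB = matmul(transpose(B), B)
a_ps = solve([[BtB[i][j] + lam * DtD[i][j] for j in range(q)] for i in range(q)],
             matvec(transpose(B), y))

# Mixed model form, equation (4.1): parameters (beta, u), design W = [B G | B].
W = [xr + br for xr, br in zip(matmul(B, G), B)]
BsBs = matmul(transpose(Bstar), Bstar)
A = matmul(transpose(W), W)
for i in range(q):
    for j in range(q):
        A[k + i][k + j] += lam * (DtD[i][j] + BsBs[i][j])
sol = solve(A, matvec(transpose(W), y))
beta, u = sol[:k], sol[k:]
a_mm = [g + ui for g, ui in zip(matvec(G, beta), u)]
eta = matvec(Bstar, u)
print(a_mm == a_ps, eta == [0, 0])
```

In this setting $B_{*} \hat{u} = 0$ holds exactly, so the extra penalty term acts as a constraint that separates the random part from the polynomial fixed part without changing the fitted curve.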

5 Extension to tensor product P-splines

In this section we will extend the new formulation of P-splines to higher dimensions. We generalise the notation that was used for the one-dimensional case. Let $M$ be the number of dimensions, and $x_{m}$ the covariate for dimension $m$ . The matrix $B_{m}$ is the $n \times q_{m}$ matrix corresponding to dimension $m$ , $D_{m}$ is a $(q_{m} - k_{m}) \times q_{m}$ matrix, $G_{m}$ is a $q_{m} \times k_{m}$ matrix, and $B_{*, m}$ is a $k_{m} \times q_{m}$ matrix. Let $q = \prod_{m = 1}^{M} q_{m}$ be the total number of parameters, and $k = \prod_{m = 1}^{M} k_{m}$ . The Kronecker product is denoted by $\otimes$ and the row-wise Kronecker product (Eilers and Marx, 2003) is denoted by $□$ . The matrices $B$ , $B_{*}$ and $G$ are defined by

B = B_{1} □ \dots □ B_{M}, B_{*} = \otimes_{m = 1}^{M} B_{*, m}, G = \otimes_{m = 1}^{M} G_{m} .

The fixed part of the model is given by $X = B G$ . For the difference penalty corresponding to dimension $m$ we have:

D_{m} = \otimes_{j = 1}^{M} H_{j}, \quad H_{j} = \begin{cases} D_{j} & j = m \\ I_{q_{j}} & j \neq m \end{cases} .

For the product between $G$ and $D_{m}$ we obtain:

D_{m} G = (\otimes_{j = 1}^{M} H_{j}) (\otimes_{j = 1}^{M} G_{j}) = \otimes_{j = 1}^{M} H_{j} G_{j} = 0, m = 1, \dots, M,

using that $H_{m} G_{m} = D_{m} G_{m} = 0$ .
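The mixed-product property of the Kronecker product used above can be checked with small integer matrices. In the sketch below, $M = 2$, the first dimension uses first-order differences with a constant column in $G_{1}$, and the second dimension uses second-order differences with a constant and a linear column in $G_{2}$. The linear column is simply the index $j$, a stand-in for the actual coefficients $ξ_{1, j, p}$; all sizes and names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def diff_matrix(q, k):
    """(q - k) x q matrix of k-th order differences."""
    D = identity(q)
    for _ in range(k):
        D = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(D, D[1:])]
    return D

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

q1, k1, q2, k2 = 4, 1, 5, 2
G1 = [[1] for _ in range(q1)]            # constant column only (k1 = 1)
G2 = [[1, j] for j in range(q2)]         # constant and linear columns (k2 = 2)
D1, D2 = diff_matrix(q1, k1), diff_matrix(q2, k2)

G = kron(G1, G2)                         # (q1 q2) x (k1 k2)
D_dim1 = kron(D1, identity(q2))          # difference penalty matrix for dimension 1
D_dim2 = kron(identity(q1), D2)          # difference penalty matrix for dimension 2

prod1 = matmul(D_dim1, G)
prod2 = matmul(D_dim2, G)
# (H_1 x H_2)(G_1 x G_2) = (H_1 G_1) x (H_2 G_2)
mixed_product_ok = prod1 == kron(matmul(D1, G1), matmul(identity(q2), G2))
all_zero = all(v == 0 for P in (prod1, prod2) for row in P for v in row)
print(mixed_product_ok, all_zero)
```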

Define the tensor product objective function $S_{1}$ to be minimized:

S_{1} (β, u) = \| y - B G β - B u \|^{2} + \sum_{m = 1}^{M} λ_{m} u' D_{m}^{'} D_{m} u + \sum_{m = 1}^{M} λ_{m} u' B_{*}' B_{*} u .

(5.1)

Using the linear transformation defined in equation (4.2) and

D_{m} a = D_{m} (G β + u) = D_{m} u, m = 1, \dots, M

we obtain

S_{2} (a, η) = \| y - B a \|^{2} + \sum_{m = 1}^{M} λ_{m} a' D_{m}^{'} D_{m} a + (\sum_{m = 1}^{M} λ_{m}) η' η .

(5.2)

Since $S_{2}$ can be decomposed as $S_{2} (a, η) = S_{3} (a) + S_{4} (η)$ and $S_{4}$ attains its minimum at $\hat{η} = 0$ , we obtain the tensor product P-splines model introduced by Eilers and Marx (2003)

S_{3} (a) = \| y - B a \|^{2} + \sum_{m = 1}^{M} λ_{m} a' D_{m}^{'} D_{m} a .

(5.3)

This shows that equations (5.1) and (5.3) are equivalent. In Section 6 we will present an efficient mixed model formulation of equation (5.1). Rodríguez-Álvarez et al. (2015) use another transformation of tensor P-splines to a mixed model, which is based on the spectral decomposition of the difference penalty matrices $D_{m}^{'} D_{m}$ . For a detailed explanation of this mixed model representation for data on a multidimensional grid, see Section 6 in Currie et al. (2006). As will be shown in Section 6, the new approach presented in this article is considerably faster than the algorithm in Rodríguez-Álvarez et al. (2015).

6 The LMMsolver R-package

The LMMsolver R-package on CRAN is a general Linear Mixed Model (LMM) solver. The aim of the LMMsolver package is to provide an efficient and flexible system to estimate variance parameters using restricted maximum likelihood or REML (Patterson and Thompson, 1971). The package was developed specifically for models where the mixed model equations are sparse, including the mixed model P-splines introduced in the previous sections. An example of the use of LMMsolver for a standard mixed model and allowing for heterogeneous residual errors is in the R-package statgenMPP (Li et al., 2022). In this article we will only give the mixed model for the tensor P-splines, although extra fixed or random terms can be added to the mixed model in LMMsolver.

6.1 Solving the linear mixed equations

The penalized regression model defined by equation (5.1) can be formulated as a mixed model:

y = X β + Z u + e, u \sim N (0, Σ), e \sim N (0, R),

(6.1)

with $X = B G$ , $Z = B$ and

R^{- 1} = θ_{0} I_{n}, Σ^{- 1} = \sum_{m = 1}^{M} θ_{m} (D_{m}^{'} D_{m} + B_{*}' B_{*}),

(6.2)

where $θ_{0} = 1 / σ_{e}^{2}$ and $θ_{m} = λ_{m} / σ_{e}^{2}$ ($m = 1, \dots, M$) are precision parameters to be estimated. For $M > 1$ equation (6.1) is not a standard mixed model, because it has overlapping penalties (Currie et al., 2006; Rodríguez-Álvarez et al., 2015, 2019). The mixed model equations corresponding to equation (6.1) are defined by

(\begin{matrix} X' R^{- 1} X & X' R^{- 1} Z \\ Z' R^{- 1} X & Z' R^{- 1} Z + Σ^{- 1} \end{matrix}) (\begin{matrix} \hat{β} \\ \hat{u} \end{matrix}) = (\begin{matrix} X' R^{- 1} y \\ Z' R^{- 1} y \end{matrix}) .

(6.3)

The matrix on the left-hand side of equation (6.3) is called the mixed model coefficient matrix and will be denoted by $C$ . This matrix is sparse and therefore equation (6.3) can be solved in a computationally efficient way using the Cholesky decomposition of $C$ (Furrer and Sain, 2010).

The following equations are used to update the precision parameters $θ_{i}$ (see Appendix A for the derivation):

\begin{array}{rl} θ_{0} & = \frac{n - ρ_{0}}{\hat{e}' \hat{e}}, \\ θ_{i} & = \frac{τ_{i} - ρ_{i}}{\hat{u}' (D_{i}^{'} D_{i} + B_{*}' B_{*}) \hat{u}}, \quad i = 1, \dots, M, \end{array}

(6.4)

where $\hat{e} = y - X \hat{β} - Z \hat{u}$ and

\begin{array}{l} ρ_{i} & = θ_{i} \frac{\partial \log | C |}{\partial θ_{i}} = θ_{i} trace (C^{- 1} \frac{\partial C}{\partial θ_{i}}), i = 0, \dots, M \\ τ_{i} & = θ_{i} \frac{\partial \log | Σ^{- 1} |}{\partial θ_{i}} = θ_{i} trace (Σ \frac{\partial Σ^{- 1}}{\partial θ_{i}}), i = 1, \dots, M . \end{array}

(6.5)

The iterative algorithm defined by equations (6.3) and (6.5) is an extension of the modified Henderson algorithm described in Harville (1977).
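To show how equations (6.3) to (6.5) fit together, the sketch below runs the update scheme on the one-dimensional model of Section 2 ($M = 1$, $X = 1_{n}$, $Z = I_{n}$, penalty $D' D + c c'$ with the Random Walk choice $c_{1} = 1$). It deliberately uses a dense matrix inverse to compute the traces, which is exactly what a sparse implementation avoids, so it should be read as a toy illustration under stated assumptions and not as the LMMsolver algorithm. The data, the starting values $θ_{0} = θ_{1} = 1$, the fixed number of iterations and all names are hypothetical.

```python
import math

def inverse(A):
    """Dense Gauss-Jordan inverse with partial pivoting (toy sizes only)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def reml_first_order(y, n_iter=50):
    n, m = len(y), len(y) + 1
    # penalty matrix P = D'D + c c' with c = (1, 0, ..., 0)
    P = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        P[i][i] += 1.0
        P[i - 1][i - 1] += 1.0
        P[i][i - 1] -= 1.0
        P[i - 1][i] -= 1.0
    P[0][0] += 1.0
    # W = [X | Z] = [1 | I_n], so W'W = [[n, 1'], [1, I_n]]
    WtW = [[0.0] * m for _ in range(m)]
    WtW[0][0] = float(n)
    for i in range(1, m):
        WtW[0][i] = WtW[i][0] = WtW[i][i] = 1.0
    Wty = [sum(y)] + list(y)
    theta0 = theta1 = 1.0
    for _ in range(n_iter):
        # coefficient matrix C of equation (6.3)
        C = [[theta0 * WtW[i][j] + (theta1 * P[i - 1][j - 1] if i and j else 0.0)
              for j in range(m)] for i in range(m)]
        Cinv = inverse(C)
        sol = [theta0 * sum(Cinv[i][j] * Wty[j] for j in range(m)) for i in range(m)]
        beta, u = sol[0], sol[1:]
        e = [yi - beta - ui for yi, ui in zip(y, u)]
        # equation (6.5): rho_i = theta_i * trace(C^{-1} dC/dtheta_i)
        rho0 = theta0 * sum(Cinv[i][j] * WtW[j][i] for i in range(m) for j in range(m))
        rho1 = theta1 * sum(Cinv[i + 1][j + 1] * P[j][i] for i in range(n) for j in range(n))
        tau1 = float(n)  # for M = 1, theta_1 * trace(Sigma P) = trace(I_n) = n
        uPu = sum(u[i] * P[i][j] * u[j] for i in range(n) for j in range(n))
        # equation (6.4); rho0 and rho1 refer to the values before this update
        theta0 = (n - rho0) / sum(v * v for v in e)
        theta1 = (tau1 - rho1) / uPu
    return {"theta0": theta0, "theta1": theta1, "rho0": rho0, "rho1": rho1,
            "fitted": [beta + ui for ui in u]}

# hypothetical data: a smooth signal plus a deterministic alternating disturbance
y = [2.0 * math.sin(2.0 * math.pi * i / 20) + 0.3 * (-1) ** i for i in range(20)]
res = reml_first_order(y)
print(res["theta0"], res["theta1"], res["rho0"] + res["rho1"])
```

Because $C$ is linear in the precision parameters, $ρ_{0} + ρ_{1}$ equals the dimension of $C$ (here $n + 1$) in every iteration, which gives a simple sanity check on the trace computations.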

An important element is to calculate $ρ_{i}$ and $τ_{i}$ defined by equation (6.5) in an efficient way, avoiding the calculation of the matrices $C^{- 1}$ and $Σ$ , which are not sparse. One way to do this is to calculate the so-called sparse inverse (Takahashi, 1973; Misztal and Perez-Enciso, 1993; Johnson and Thompson, 1995; Lindgren et al., 2022). In LMMsolver another, but closely related, method was implemented, using Automated Differentiation of the Cholesky algorithm (Smith, 1995). Backward differentiation was implemented, which calculates the partial derivatives of the likelihood efficiently (Meyer and Smith, 1996; Smith, 1995). The chol() function in the spam package was used to permute the rows and columns of $C$ and $Σ^{- 1}$ in an optimal way; see Furrer and Sain (2010) for details. The automated differentiation was implemented using supernodal Cholesky factorization (Ng and Peyton, 1993; Furrer and Sain, 2010). The implementation was written in C++ using the Rcpp package (Eddelbuettel and Balamuta, 2018).

6.2 Example 1: United States precipitation data set

In this section we will compare the performance of LMMsolver with the R-packages mgcv (Wood, 2017) and SOP (Rodríguez-Álvarez et al., 2019). The method in the SOP package uses the same tensor P-splines model as in LMMsolver. For a detailed comparison between tensor P-splines in mgcv and SOP see Rodríguez-Álvarez et al. (2015). We will use two-dimensional P-splines for the USA precipitation data set (Rodríguez-Álvarez et al., 2015). There are $n = 5{,}906$ observations, with longitude-latitude positions of monitoring stations, and total precipitation in millimeters and a standardization of this raw observation for April 1948. This data set can be found in the R-package spam (Furrer and Sain, 2010), under the name USprecip.

All computations were performed in R 4.2.3 (R Core Team, 2023) on a 2.90 GHz Intel Core i5-9400 CPU with 24 GB of RAM and the Windows 10 operating system. Version 1.8.42 of mgcv, version 1.0 of SOP, and version 1.0.5 of LMMsolver were used. For mgcv we used the bam() function, with method="fREML". Cubic B-splines with second-order differences were used for both latitude and longitude. The differences between the estimated effective dimensions (Rodríguez-Álvarez et al., 2015) for the SOP and LMMsolver packages were less than $5 \cdot 10^{- 3}$ . We compared the computation times for different numbers of segments. The same number was used in both dimensions.

Figure 1A shows the sparse structure for the mixed model coefficient matrix corresponding to equation (5.1) for $40$ segments.

Figure 1

Panel A: The mixed model coefficient matrix $C$ for the new approach for the US precipitation example with $40$ segments for both latitude and longitude. The non-zeros are indicated in red, showing that $C$ is sparse. Panel B: Computation time on a logarithmic scale for the three methods as function of the number of segments in both dimensions: LMMsolver is much faster, especially if the number of segments is high.

In Figure 1B the computation time is compared for different numbers of segments, showing that LMMsolver is much faster than mgcv and SOP. For example, if the number of segments is 40, the computation time for mgcv is 600 seconds (= 10 minutes), for SOP 140 seconds, and for LMMsolver only one second. The difference in computation time between mgcv and SOP is roughly a factor of 4, consistent with the results in Rodríguez-Álvarez et al. (2015).

There are two main reasons why LMMsolver is fast for this two-dimensional mixed model P-splines example. First, it retains the sparse structure of the original two-dimensional P-splines as in Eilers and Marx (2003). The second reason is an efficient implementation of the Automated Differentiation of the Cholesky Algorithm in LMMsolver, which avoids the calculations of the inverse of matrices for the derivatives of the REML log-likelihood (Smith, 1995; Meyer and Smith, 1996).

6.3 Example 2: Three-dimensional simulated data

In this section we will use a simulated three-dimensional example to illustrate that LMMsolver can analyze large datasets in an efficient way. One hundred thousand ( $n = 100{,}000$ ) values of covariates $z_{1}$ , $z_{2}$ , $z_{3}$ were simulated independently from a uniform distribution on the interval $[0,1]$ , and the response was generated from the equation (Wood, 2006; Rodríguez-Álvarez et al., 2015)

y = 1.5 exp (- \frac{{(z_{1} - 0.2)}^{2}}{5} - \frac{{(z_{2} - 0.5)}^{2}}{3} - \frac{{(z_{3} - 0.9)}^{2}}{4})

+ 0.5 exp (- \frac{{(z_{1} - 0.3)}^{2}}{4} - \frac{{(z_{2} - 0.7)}^{2}}{2} - \frac{{(z_{3} - 0.4)}^{2}}{6})

+ exp (- \frac{{(z_{1} - 0.1)}^{2}}{5} - \frac{{(z_{2} - 0.3)}^{2}}{5} - \frac{{(z_{3} - 0.7)}^{2}}{4}) + ϵ,

with $ϵ \sim N (0,1)$ . Figure 2A shows the structure of the mixed model coefficient matrix $C$ , which is less sparse than in the two-dimensional case shown in Figure 1A. Therefore the difference in computation time between LMMsolver and the packages mgcv and SOP is smaller than in the two-dimensional case, but still substantial: if ten segments are used in each direction, the computation time for mgcv is 38 minutes, for SOP 10 minutes, and for LMMsolver half a minute.

7 Discussion

In this article we have shown that there is a mixed model formulation for P-splines that adheres closely to the original proposal by Eilers and Marx (1996). The main difference is that in the new approach there is an explicit term for the fixed part, with an additional term for the penalty. An important feature is that the precision matrices and the mixed model equations are sparse. Therefore the sparse matrix algebra developed for mixed models can be used.

An additional advantage of the new approach is that the mixed model P-splines formulation can be further extended to Generalized Additive Models (GAM; Hastie et al., 2009). In standard P-splines a small ridge penalty is added to make the system identifiable (Eilers and Marx, 2021). In the approach described here, this problem does not arise: the column corresponding to $β_{0}$ can be left out of the model, and one intercept term is modelled. This idea is similar to leaving out one column for a factor in the fixed part of the model. GAMs are implemented in LMMsolver.

The mixed model P-splines formulation extends the P-splines framework developed by Paul Eilers and Brian Marx (1996, 2021). An interesting aspect of the mixed model P-splines formulation is that it takes advantage of the properties of the two key components of P-splines: B-splines and discrete difference penalties.

Figure 2

Panel A: The mixed model coefficient matrix $C$ for the new approach for the three-dimensional simulated data with $5$ segments for each dimension. The non-zeros are indicated in red. Panel B: Computation time on a logarithmic scale for LMMsolver, mgcv, and SOP as function of the number of segments in each dimension.

Appendix A

In this appendix we will give a derivation of equation (6.4). The restricted log-likelihood of equation (6.1) is given by (Meyer and Smith, 1996):

\log L = - \frac{1}{2} (const + \log | R | + \log | Σ | + \log | C | + y' P y)

(A.1)

where $P = V^{- 1} - V^{- 1} X {(X' V^{- 1} X)}^{- 1} X' V^{- 1}$ , and $V = Z Σ Z' + R$ . The matrix $P$ has the following properties (Johnson and Thompson, 1995): $X' P = 0$ , $P y = R^{- 1} \hat{e}$ , and $Z' P y = Σ^{- 1} \hat{u}$ . Using these properties of $P$ it follows that

y' P y = (X \hat{β} + Z \hat{u} + \hat{e})^{'} P y = \hat{u}' Σ^{- 1} \hat{u} + \hat{e}' R^{- 1} \hat{e},

(A.2)

and therefore equation (A.1) can be rewritten as:

\log L = \frac{1}{2} (const + \log | R^{- 1} | + \log | Σ^{- 1} | - \log | C | - \hat{e}' R^{- 1} \hat{e} - \hat{u}' Σ^{- 1} \hat{u})

(A.3)

with matrices $R^{- 1}$ , $Σ^{- 1}$ , and $C$ being sparse for the mixed model P-splines model, and therefore $\log L$ and the partial derivatives with respect to precision parameters $θ_{m}$ can be calculated in a computationally efficient way (Meyer and Smith, 1996).

The partial derivatives of equation (A.3) are given by

\begin{array}{rl} \frac{\partial \log L}{\partial θ_{0}} & = \frac{1}{2} (\frac{n - ρ_{0}}{θ_{0}} - \hat{e}' \hat{e}), \\ \frac{\partial \log L}{\partial θ_{i}} & = \frac{1}{2} (\frac{τ_{i} - ρ_{i}}{θ_{i}} - \hat{u}' (D_{i}^{'} D_{i} + B_{*}' B_{*}) \hat{u}), \quad i = 1, \dots, M, \end{array}

(A.4)

where the parameters $ρ_{i}$ and $τ_{i}$ are defined by equation (6.5).

Setting the partial derivatives in equation (A.4) equal to zero gives the expressions for $θ_{i}$ ( $i = 0,1, \dots, M$ ) in equation (6.4).

Supplementary materials

Supplementary materials for this article are available online.

Supplemental Material for Tensor product P-splines using a sparse mixed model formulation by Martin P. Boer, in Statistical Modelling

Footnotes

Acknowledgements

I would like to thank Bart-Jan van Rossum for his contribution to the LMMsolver package. I am indebted to Hugo van den Berg for useful comments on earlier drafts. I would like to thank two reviewers and Guest Editor Paul Eilers for their helpful comments.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author received no financial support for the research, authorship and/or publication of this article.

References

Boer, Piepho H-P and Williams (2020) Linear variance, P-splines and neighbour differences for spatial adjustment in field trials: How are they related? Journal of Agricultural, Biological and Environmental Statistics, 25, 676–698.

Currie and Durbán (2002) Flexible smoothing with P-splines: A unified approach. Statistical Modelling, 2, 333–349.

Currie, Durbán and Eilers PHC (2006) Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 259–280.

De Boor (1978) A practical guide to splines, vol 27. Springer-Verlag New York.

Eddelbuettel and Balamuta (2018) Extending R with C++: A brief introduction to Rcpp. The American Statistician, 72, 28–36.

Eilers PHC (1999) Discussion on: The analysis of designed experiments and longitudinal data by using smoothing splines. Journal of the Royal Statistical Society, Series C, 48, 307–308.

Eilers PHC and Marx (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.

Eilers PHC and Marx (2003) Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems, 66, 159–174.

Eilers PHC and Marx (2021) Practical smoothing: The joys of P-splines. Cambridge University Press.

Frasso and Eilers PHC (2015) L- and V-curves for optimal smoothing. Statistical Modelling, 15, 91–111.

Furrer and Sain (2010) Spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields. Journal of Statistical Software, 36, 1–25. doi: 10.18637/jss.v036.i10.

Harville (1977) Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72, 320–338.

Harville (1998) Matrix algebra from a statistician’s perspective. Taylor & Francis.

Hastie, Tibshirani and Friedman (2009) The elements of statistical learning. Springer Series in Statistics. Springer New York, New York, NY.

Johnson and Thompson (1995) Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. Journal of Dairy Science, 78, 449–456.

Lee and Durbán (2011) P-spline ANOVA-type interaction models for spatio-temporal smoothing. Statistical Modelling, 11, 49–69.

Lee, Durbán and Eilers (2013) Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics & Data Analysis, 61, 22–37.

Li, Boer, van Rossum, Zheng, Joosen and Van Eeuwijk (2022) statgenMPP: an R package implementing an IBD-based mixed model approach for QTL mapping in a wide range of multi-parent populations. Bioinformatics, 38, 5134–5136.

Lindgren, Bolin and Rue (2022) The SPDE approach for Gaussian and non-Gaussian fields: 10 years and still running. Spatial Statistics, 50, 100599.

Lyche, Manni and Speleers (2018) Foundations of spline theory: B-splines, spline approximation, and hierarchical refinement. Lecture Notes in Mathematics, 2219, 1–76.

Meyer and Smith (1996) Restricted maximum likelihood estimation for animal models using derivatives of the likelihood. Genetics Selection Evolution, 28, 23–49. doi: 10.1051/gse:19960102.

Misztal and Perez-Enciso (1993) Sparse matrix inversion for restricted maximum likelihood estimation of variance components by expectation-maximization. Journal of Dairy Science, 76, 1479–1483.

Ng and Peyton (1993) Block sparse Cholesky algorithms on advanced uniprocessor computers. SIAM Journal on Scientific Computing, 14, 1034–1056.

Patterson and Thompson (1971) Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545–554.

Piepho, Boer and Williams (2022) Two-dimensional P-spline smoothing for spatial analysis of plant breeding trials. Biometrical Journal, 64, 835–857.

Press, Teukolsky, Vetterling and Flannery (2002) Numerical recipes in C++: The art of scientific computing (2nd ed). Cambridge University Press.

Rodríguez-Álvarez, Lee, Kneib, Durbán and Eilers (2015) Fast smoothing parameter separation in multidimensional generalized P-splines: The SAP algorithm. Statistics and Computing, 25, 941–957.

Rodríguez-Álvarez, Boer, van Eeuwijk and Eilers PHC (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics, 23, 52–71.

Rodríguez-Álvarez, Durbán, Lee and Eilers PHC (2019) On the estimation of variance parameters in non-standard generalised linear mixed models: Application to penalised smoothing. Statistics and Computing, 29, 483–500.

Smith (1995) Differentiation of the Cholesky algorithm. Journal of Computational and Graphical Statistics, 4, 134.

Takahashi (1973) Formation of sparse bus impedance matrix and its application to short circuit study. In Proceeding of PICA Conference June, 1973.

Whittaker (1923) On a new method of graduation. Proceedings of the Edinburgh Mathematical Society, 41, 63–75.

Williams (1986) A neighbour model for field experiments. Biometrika, 73, 279–287.

Wood (2006) Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics, 62, 1025–1036.

Wood (2017) Generalized additive models: An introduction with R, 2nd ed. Boca Raton: Chapman & Hall CRC.

Wood, Scheipl and Faraway (2013) Straightforward intermediate rank tensor product smoothing in mixed models. Statistics and Computing, 23, 341–360.
