I have recently been working to ensure that my clubSandwich package works correctly on fitted lme and gls models from the nlme package, which is one of the main R packages for fitting hierarchical linear models. In the course of digging around in the guts of nlme, I noticed a bug in the getVarCov function. The purpose of the function is to extract the estimated variance-covariance matrix of the errors from a fitted lme or gls model.
It seems that this function is sensitive to the order in which the input data are sorted. This bug report noted the problem, but unfortunately their proposed fix doesn’t seem to solve the problem. In this post I’ll demonstrate the bug and a solution. (I’m posting this here because the R project’s bug reporting system is currently closed to people who were not registered as of early July, evidently due to some sort of spamming problem.)
The issue
Here’s a simple demonstration of the problem. I’ll first fit a gls model with a heteroskedastic variance function and an AR(1) auto-correlation structure (no need to worry about the substance of the specification—we’re just worried about computation here) and then extract the variances for each of the units.
Code
# Demonstrate the problem with gls model
library(nlme)
data(Ovary)
gls_raw <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), data = Ovary,
correlation = corAR1(form = ~ 1 | Mare),
weights = varPower())
Mares <- levels(gls_raw$groups)
V_raw <- lapply(Mares, function(g) getVarCov(gls_raw, individual = g))
Now I’ll repeat the process using the same data, but sorted in a different order
However, the extracted variance-covariance matrices are not:
Code
all.equal(V_raw, V_sorted)
[1] "Component 1: Mean relative difference: 0.03256"
[2] "Component 3: Mean relative difference: 0.05830791"
[3] "Component 4: Mean relative difference: 0.1142209"
[4] "Component 5: Mean relative difference: 0.03619692"
[5] "Component 6: Mean relative difference: 0.09260648"
[6] "Component 8: Mean relative difference: 0.08650327"
[7] "Component 9: Mean relative difference: 0.07627162"
[8] "Component 10: Mean relative difference: 0.018103"
[9] "Component 11: Mean relative difference: 0.1020658"
Here’s the code of the relevant function:
Code
nlme:::getVarCov.gls
function (obj, individual = 1, ...)
{
S <- corMatrix(obj$modelStruct$corStruct)[[individual]]
if (!is.null(obj$modelStruct$varStruct)) {
ind <- obj$groups == individual
vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
}
else vw <- rep(1, nrow(S))
vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)
class(result) <- c("marginal", "VarCov")
attr(result, "group.levels") <- names(obj$groups)
result
}
<bytecode: 0x000000001bc39d00>
<environment: namespace:nlme>
The issue is in the 4th line of the body. getVarCov.gls assumes that varWeights(obj$modelStruct$varStruct) is sorted in the same order as obj$groups, which is not necessarily true. Instead, varWeights seem to return the weights sorted according to the grouping variable. For this example, that means that the varWeights will not depend on the order in which the groups are sorted.
The same issue comes up in getVarCov.lme. Here’s the fix and verification:
Code
# proposed patch for getVarCov.lme
getVarCov_revised_lme <- function (obj, individuals, type = c("random.effects", "conditional", "marginal"), ...) {
type <- match.arg(type)
if (any("nlme" == class(obj)))
stop("not implemented for \"nlme\" objects")
if (length(obj$group) > 1)
stop("not implemented for multiple levels of nesting")
sigma <- obj$sigma
D <- as.matrix(obj$modelStruct$reStruct[[1]]) * sigma^2
if (type == "random.effects") {
result <- D
}
else {
result <- list()
groups <- sort(obj$groups[[1]])
ugroups <- unique(groups)
if (missing(individuals))
individuals <- as.matrix(ugroups)[1, ]
if (is.numeric(individuals))
individuals <- ugroups[individuals]
for (individ in individuals) {
indx <- which(individ == ugroups)
if (!length(indx))
stop(gettextf("individual %s was not used in the fit",
sQuote(individ)), domain = NA)
if (is.na(indx))
stop(gettextf("individual %s was not used in the fit",
sQuote(individ)), domain = NA)
ind <- groups == individ
if (!is.null(obj$modelStruct$corStruct)) {
V <- corMatrix(obj$modelStruct$corStruct)[[as.character(individ)]]
}
else V <- diag(sum(ind))
if (!is.null(obj$modelStruct$varStruct))
sds <- 1/varWeights(obj$modelStruct$varStruct)[ind]
else sds <- rep(1, sum(ind))
sds <- obj$sigma * sds
cond.var <- t(V * sds) * sds
dimnames(cond.var) <- list(1:nrow(cond.var), 1:ncol(cond.var))
if (type == "conditional")
result[[as.character(individ)]] <- cond.var
else {
Z <- model.matrix(obj$modelStruct$reStruc, getData(obj))[ind,
, drop = FALSE]
result[[as.character(individ)]] <- cond.var +
Z %*% D %*% t(Z)
}
}
}
class(result) <- c(type, "VarCov")
attr(result, "group.levels") <- names(obj$groups)
result
}
lme_raw <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time),
random = ~ 1 | Mare,
correlation = corExp(form = ~ Time),
weights = varPower(),
data=Ovary)
lme_sorted <- update(lme_raw, data = Ovary_sorted)
all.equal(lme_raw$modelStruct, lme_sorted$modelStruct)
[1] TRUE
Code
# current getVarCov
V_raw <- lapply(Mares, function(g) getVarCov(lme_raw, individual = g, type = "marginal"))
V_sorted <- lapply(Mares, function(g) getVarCov(lme_sorted, individual = g, type = "marginal"))
all.equal(V_raw, V_sorted)
[1] "Component 1: Component 1: Mean relative difference: 0.003989954"
[2] "Component 3: Component 1: Mean relative difference: 0.003784181"
[3] "Component 4: Component 1: Mean relative difference: 0.003028662"
[4] "Component 5: Component 1: Mean relative difference: 0.0005997944"
[5] "Component 6: Component 1: Mean relative difference: 0.002350456"
[6] "Component 7: Component 1: Mean relative difference: 0.007103733"
[7] "Component 8: Component 1: Mean relative difference: 0.001887638"
[8] "Component 9: Component 1: Mean relative difference: 0.0009601843"
[9] "Component 10: Component 1: Mean relative difference: 0.004748783"
[10] "Component 11: Component 1: Mean relative difference: 0.001521097"
---title: Bug in nlme::getVarCovdate: '2016-08-10'categories:- Rstats- programming- hierarchical models- nlmecode-tools: true---```{=html}<p>I have recently been working to ensure that <a href="https://github.com/jepusto/clubSandwich">my <code>clubSandwich</code> package</a> works correctly on fitted <code>lme</code> and <code>gls</code> models from the <code>nlme</code> package, which is one of the main R packages for fitting hierarchical linear models. In the course of digging around in the guts of <code>nlme</code>, I noticed a bug in the <code>getVarCov</code> function. The purpose of the function is to extract the estimated variance-covariance matrix of the errors from a fitted <code>lme</code> or <code>gls</code> model.</p><p>It seems that this function is sensitive to the order in which the input data are sorted. <a href="https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16744">This bug report</a> noted the problem, but unfortunately their proposed fix doesn’t seem to solve the problem. In this post I’ll demonstrate the bug and a solution. (I’m posting this here because the R project’s bug reporting system is currently closed to people who were not registered as of early July, evidently due to some sort of spamming problem.)</p><div id="the-issue" class="level1"><h1>The issue</h1><p>Here’s a simple demonstration of the problem. I’ll first fit a <code>gls</code> model with a heteroskedastic variance function and an AR(1) auto-correlation structure (no need to worry about the substance of the specification—we’re just worried about computation here) and then extract the variances for each of the units.</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb20"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Demonstrate the problem with gls modellibrary(nlme)data(Ovary)gls_raw <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), data = Ovary, correlation = corAR1(form = ~ 1 | Mare), weights = varPower())Mares <- levels(gls_raw$groups)V_raw <- lapply(Mares, function(g) getVarCov(gls_raw, individual = g))</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details></div><p>Now I’ll repeat the process using the same data, but sorted in a different order</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb20"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">Ovary_sorted <- Ovary[with(Ovary, order(Mare, Time)),]gls_sorted <- update(gls_raw, data = Ovary_sorted)V_sorted <- lapply(Mares, function(g) getVarCov(gls_sorted, individual = g))</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details></div><p>The variance component estimates are essentially equal:</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">all.equal(gls_raw$modelStruct, gls_sorted$modelStruct)</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] TRUE</code></pre></div></div><p>However, the extracted variance-covariance matrices are not:</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">all.equal(V_raw, V_sorted)</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] "Component 1: Mean relative difference: 0.03256" [2] "Component 3: Mean relative difference: 0.05830791"[3] "Component 4: Mean relative difference: 0.1142209" [4] "Component 5: Mean relative difference: 0.03619692"[5] "Component 6: Mean relative difference: 0.09260648"[6] "Component 8: Mean relative difference: 0.08650327"[7] "Component 9: Mean relative difference: 0.07627162"[8] "Component 10: Mean relative difference: 0.018103" [9] "Component 11: Mean relative difference: 0.1020658"</code></pre></div></div><p>Here’s the code of the relevant function:</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">nlme:::getVarCov.gls</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>function (obj, individual = 1, ...) { S <- corMatrix(obj$modelStruct$corStruct)[[individual]] if (!is.null(obj$modelStruct$varStruct)) { ind <- obj$groups == individual vw <- 1/varWeights(obj$modelStruct$varStruct)[ind] } else vw <- rep(1, nrow(S)) vars <- (obj$sigma * vw)^2 result <- t(S * sqrt(vars)) * sqrt(vars) class(result) <- c("marginal", "VarCov") attr(result, "group.levels") <- names(obj$groups) result}<bytecode: 0x000000001bc39d00><environment: namespace:nlme></code></pre></div></div><p>The issue is in the 4th line of the body. <code>getVarCov.gls</code> assumes that <code>varWeights(obj$modelStruct$varStruct)</code> is sorted in the same order as <code>obj$groups</code>, which is not necessarily true. Instead, <code>varWeights</code> seem to return the weights sorted according to the grouping variable. For this example, that means that the <code>varWeights</code> will not depend on the order in which the groups are sorted.</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">identical(gls_raw$groups, gls_sorted$groups)</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] FALSE</code></pre></div></div><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">identical(varWeights(gls_raw$modelStruct$varStruct), varWeights(gls_sorted$modelStruct$varStruct))</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] TRUE</code></pre></div></div></div><div id="fix-for-nlmegetvarcov.gls" class="level1"><h1>Fix for <code>nlme:::getVarCov.gls</code></h1><p>I think this can be solved by either</p><ul><li>putting the <code>varWeights</code> back into the same order as the raw data or</li><li>sorting <code>obj$groups</code> before identifying the rows corresponding to the specified <code>individual</code>.</li></ul><p>Here’s a revised function that takes the second approach:</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb20"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># proposed patch for getVarCov.glsgetVarCov_revised_gls <- function (obj, individual = 1, ...) { S <- corMatrix(obj$modelStruct$corStruct)[[individual]] if (!is.null(obj$modelStruct$varStruct)) { ind <- sort(obj$groups) == individual vw <- 1 / varWeights(obj$modelStruct$varStruct)[ind] } else vw <- rep(1, nrow(S)) vars <- (obj$sigma * vw)^2 result <- t(S * sqrt(vars)) * sqrt(vars) class(result) <- c("marginal", "VarCov") attr(result, "group.levels") <- names(obj$groups) result}</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details></div><p>Testing that it works correctly:</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">V_raw <- lapply(Mares, function(g) getVarCov_revised_gls(gls_raw, individual = g))V_sorted <- lapply(Mares, function(g) getVarCov_revised_gls(gls_sorted, individual = g))all.equal(V_raw, V_sorted)</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] TRUE</code></pre></div></div></div><div id="fix-for-nlmegetvarcov.lme" class="level1"><h1>Fix for <code>nlme:::getVarCov.lme</code></h1><p>The same issue comes up in <code>getVarCov.lme</code>. Here’s the fix and verification:</p><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># proposed patch for getVarCov.lmegetVarCov_revised_lme <- function (obj, individuals, type = c("random.effects", "conditional", "marginal"), ...) { type <- match.arg(type) if (any("nlme" == class(obj))) stop("not implemented for \"nlme\" objects") if (length(obj$group) > 1) stop("not implemented for multiple levels of nesting") sigma <- obj$sigma D <- as.matrix(obj$modelStruct$reStruct[[1]]) * sigma^2 if (type == "random.effects") { result <- D } else { result <- list() groups <- sort(obj$groups[[1]]) ugroups <- unique(groups) if (missing(individuals)) individuals <- as.matrix(ugroups)[1, ] if (is.numeric(individuals)) individuals <- ugroups[individuals] for (individ in individuals) { indx <- which(individ == ugroups) if (!length(indx)) stop(gettextf("individual %s was not used in the fit", sQuote(individ)), domain = NA) if (is.na(indx)) stop(gettextf("individual %s was not used in the fit", sQuote(individ)), domain = NA) ind <- groups == individ if (!is.null(obj$modelStruct$corStruct)) { V <- corMatrix(obj$modelStruct$corStruct)[[as.character(individ)]] } else V <- diag(sum(ind)) if (!is.null(obj$modelStruct$varStruct)) sds <- 1/varWeights(obj$modelStruct$varStruct)[ind] else sds <- rep(1, sum(ind)) sds <- obj$sigma * sds cond.var <- t(V * sds) * sds dimnames(cond.var) <- list(1:nrow(cond.var), 1:ncol(cond.var)) if (type == "conditional") result[[as.character(individ)]] <- cond.var else { Z <- model.matrix(obj$modelStruct$reStruc, getData(obj))[ind, , drop = FALSE] result[[as.character(individ)]] <- cond.var + Z %*% D %*% t(Z) } } } class(result) <- c(type, "VarCov") attr(result, "group.levels") <- names(obj$groups) result}lme_raw <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), random = ~ 1 | Mare, correlation = corExp(form = ~ Time), weights = varPower(), data=Ovary)lme_sorted <- update(lme_raw, data = Ovary_sorted)all.equal(lme_raw$modelStruct, lme_sorted$modelStruct)</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] TRUE</code></pre></div></div><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># current getVarCovV_raw <- lapply(Mares, function(g) getVarCov(lme_raw, individual = g, type = "marginal"))V_sorted <- lapply(Mares, function(g) getVarCov(lme_sorted, individual = g, type = "marginal"))all.equal(V_raw, V_sorted)</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] "Component 1: Component 1: Mean relative difference: 0.003989954" [2] "Component 3: Component 1: Mean relative difference: 0.003784181" [3] "Component 4: Component 1: Mean relative difference: 0.003028662" [4] "Component 5: Component 1: Mean relative difference: 0.0005997944" [5] "Component 6: Component 1: Mean relative difference: 0.002350456" [6] "Component 7: Component 1: Mean relative difference: 0.007103733" [7] "Component 8: Component 1: Mean relative difference: 0.001887638" [8] "Component 9: Component 1: Mean relative difference: 0.0009601843" [9] "Component 10: Component 1: Mean relative difference: 0.004748783"[10] "Component 11: Component 1: Mean relative difference: 0.001521097"</code></pre></div></div><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># revised getVarCov V_raw <- lapply(Mares, function(g) getVarCov_revised_lme(lme_raw, individual = g, type = "marginal"))V_sorted <- lapply(Mares, function(g) getVarCov_revised_lme(lme_sorted, individual = g, type = "marginal"))all.equal(V_raw, V_sorted)</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>[1] TRUE</code></pre></div></div></div><div id="session-info" class="level1"><h1>Session info</h1><div class="cell"><details open="" class="code-fold"><summary>Code</summary><div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">sessionInfo()</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div></details><div class="cell-output cell-output-stdout"><pre><code>R version 3.6.3 (2020-02-29)Platform: x86_64-w64-mingw32/x64 (64-bit)Running under: Windows 10 x64 (build 17763)Matrix products: defaultlocale:[1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252[4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages:[1] stats graphics grDevices utils datasets methods base other attached packages:[1] nlme_3.1-144loaded via a namespace (and not attached): [1] Rcpp_1.0.4.6 bookdown_0.14 lattice_0.20-38 digest_0.6.25 [5] grid_3.6.3 magrittr_1.5 evaluate_0.14 blogdown_0.18 [9] rlang_0.4.5 stringi_1.4.3 rmarkdown_2.1 tools_3.6.3 [13] stringr_1.4.0 xfun_0.12 yaml_2.2.0 compiler_3.6.3 [17] htmltools_0.4.0 knitr_1.28</code></pre></div></div></div>```