Bug in nlme::getVarCov

Rstats
programming
hierarchical models
nlme
Author

James E. Pustejovsky

Published

August 10, 2016

I have recently been working to ensure that my clubSandwich package works correctly on fitted lme and gls models from the nlme package, which is one of the main R packages for fitting hierarchical linear models. In the course of digging around in the guts of nlme, I noticed a bug in the getVarCov function. The purpose of the function is to extract the estimated variance-covariance matrix of the errors from a fitted lme or gls model.

It seems that this function is sensitive to the order in which the input data are sorted. This bug report noted the problem, but unfortunately their proposed fix doesn’t seem to solve the problem. In this post I’ll demonstrate the bug and a solution. (I’m posting this here because the R project’s bug reporting system is currently closed to people who were not registered as of early July, evidently due to some sort of spamming problem.)

The issue

Here’s a simple demonstration of the problem. I’ll first fit a gls model with a heteroskedastic variance function and an AR(1) auto-correlation structure (no need to worry about the substance of the specification—we’re just worried about computation here) and then extract the variances for each of the units.

Code
# Demonstrate the problem with gls model

library(nlme)
data(Ovary)

gls_raw <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), data = Ovary,
               correlation = corAR1(form = ~ 1 | Mare),
               weights = varPower())

Mares <- levels(gls_raw$groups)
V_raw <- lapply(Mares, function(g) getVarCov(gls_raw, individual = g))

Now I’ll repeat the process using the same data, but sorted in a different order

Code
Ovary_sorted <- Ovary[with(Ovary, order(Mare, Time)),]
gls_sorted <- update(gls_raw, data = Ovary_sorted)

V_sorted <- lapply(Mares, function(g) getVarCov(gls_sorted, individual = g))

The variance component estimates are essentially equal:

Code
all.equal(gls_raw$modelStruct, gls_sorted$modelStruct)
[1] TRUE

However, the extracted variance-covariance matrices are not:

Code
all.equal(V_raw, V_sorted)
[1] "Component 1: Mean relative difference: 0.03256"   
[2] "Component 3: Mean relative difference: 0.05830791"
[3] "Component 4: Mean relative difference: 0.1142209" 
[4] "Component 5: Mean relative difference: 0.03619692"
[5] "Component 6: Mean relative difference: 0.09260648"
[6] "Component 8: Mean relative difference: 0.08650327"
[7] "Component 9: Mean relative difference: 0.07627162"
[8] "Component 10: Mean relative difference: 0.018103" 
[9] "Component 11: Mean relative difference: 0.1020658"

Here’s the code of the relevant function:

Code
nlme:::getVarCov.gls
function (obj, individual = 1, ...) 
{
    S <- corMatrix(obj$modelStruct$corStruct)[[individual]]
    if (!is.null(obj$modelStruct$varStruct)) {
        ind <- obj$groups == individual
        vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
    }
    else vw <- rep(1, nrow(S))
    vars <- (obj$sigma * vw)^2
    result <- t(S * sqrt(vars)) * sqrt(vars)
    class(result) <- c("marginal", "VarCov")
    attr(result, "group.levels") <- names(obj$groups)
    result
}
<bytecode: 0x000000001bc39d00>
<environment: namespace:nlme>

The issue is in the 4th line of the body. getVarCov.gls assumes that varWeights(obj$modelStruct$varStruct) is sorted in the same order as obj$groups, which is not necessarily true. Instead, varWeights seem to return the weights sorted according to the grouping variable. For this example, that means that the varWeights will not depend on the order in which the groups are sorted.

Code
identical(gls_raw$groups, gls_sorted$groups)
[1] FALSE
Code
identical(varWeights(gls_raw$modelStruct$varStruct), 
          varWeights(gls_sorted$modelStruct$varStruct))
[1] TRUE

Fix for nlme:::getVarCov.gls

I think this can be solved by either

  • putting the varWeights back into the same order as the raw data or
  • sorting obj$groups before identifying the rows corresponding to the specified individual.

Here’s a revised function that takes the second approach:

Code
# proposed patch for getVarCov.gls

getVarCov_revised_gls <- function (obj, individual = 1, ...) {
    S <- corMatrix(obj$modelStruct$corStruct)[[individual]]
    if (!is.null(obj$modelStruct$varStruct)) {
        ind <- sort(obj$groups) == individual
        vw <- 1 / varWeights(obj$modelStruct$varStruct)[ind]
    }
    else vw <- rep(1, nrow(S))
    vars <- (obj$sigma * vw)^2
    result <- t(S * sqrt(vars)) * sqrt(vars)
    class(result) <- c("marginal", "VarCov")
    attr(result, "group.levels") <- names(obj$groups)
    result
}

Testing that it works correctly:

Code
V_raw <- lapply(Mares, function(g) getVarCov_revised_gls(gls_raw, individual = g))
V_sorted <- lapply(Mares, function(g) getVarCov_revised_gls(gls_sorted, individual = g))
all.equal(V_raw, V_sorted)
[1] TRUE

Fix for nlme:::getVarCov.lme

The same issue comes up in getVarCov.lme. Here’s the fix and verification:

Code
# proposed patch for getVarCov.lme

getVarCov_revised_lme <- function (obj, individuals, type = c("random.effects", "conditional", "marginal"), ...) {
    type <- match.arg(type)
    if (any("nlme" == class(obj))) 
        stop("not implemented for \"nlme\" objects")
    if (length(obj$group) > 1) 
        stop("not implemented for multiple levels of nesting")
    sigma <- obj$sigma
    D <- as.matrix(obj$modelStruct$reStruct[[1]]) * sigma^2
    if (type == "random.effects") {
        result <- D
    }
    else {
        result <- list()
        groups <- sort(obj$groups[[1]])
        ugroups <- unique(groups)
        if (missing(individuals)) 
            individuals <- as.matrix(ugroups)[1, ]
        if (is.numeric(individuals)) 
            individuals <- ugroups[individuals]
        for (individ in individuals) {
            indx <- which(individ == ugroups)
            if (!length(indx)) 
                stop(gettextf("individual %s was not used in the fit", 
                  sQuote(individ)), domain = NA)
            if (is.na(indx)) 
                stop(gettextf("individual %s was not used in the fit", 
                  sQuote(individ)), domain = NA)
            ind <- groups == individ
            if (!is.null(obj$modelStruct$corStruct)) {
                V <- corMatrix(obj$modelStruct$corStruct)[[as.character(individ)]]
            }
            else V <- diag(sum(ind))
            if (!is.null(obj$modelStruct$varStruct)) 
                sds <- 1/varWeights(obj$modelStruct$varStruct)[ind]
            else sds <- rep(1, sum(ind))
            sds <- obj$sigma * sds
            cond.var <- t(V * sds) * sds
            dimnames(cond.var) <- list(1:nrow(cond.var), 1:ncol(cond.var))
            if (type == "conditional") 
                result[[as.character(individ)]] <- cond.var
            else {
                Z <- model.matrix(obj$modelStruct$reStruc, getData(obj))[ind, 
                  , drop = FALSE]
                result[[as.character(individ)]] <- cond.var + 
                  Z %*% D %*% t(Z)
            }
        }
    }
    class(result) <- c(type, "VarCov")
    attr(result, "group.levels") <- names(obj$groups)
    result
}

lme_raw <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), 
               random = ~ 1 | Mare,
               correlation = corExp(form = ~ Time),
               weights = varPower(),
               data=Ovary)

lme_sorted <- update(lme_raw, data = Ovary_sorted)

all.equal(lme_raw$modelStruct, lme_sorted$modelStruct)
[1] TRUE
Code
# current getVarCov
V_raw <- lapply(Mares, function(g) getVarCov(lme_raw, individual = g, type = "marginal"))
V_sorted <- lapply(Mares, function(g) getVarCov(lme_sorted, individual = g, type = "marginal"))
all.equal(V_raw, V_sorted)
[1] "Component 1: Component 1: Mean relative difference: 0.003989954" 
 [2] "Component 3: Component 1: Mean relative difference: 0.003784181" 
 [3] "Component 4: Component 1: Mean relative difference: 0.003028662" 
 [4] "Component 5: Component 1: Mean relative difference: 0.0005997944"
 [5] "Component 6: Component 1: Mean relative difference: 0.002350456" 
 [6] "Component 7: Component 1: Mean relative difference: 0.007103733" 
 [7] "Component 8: Component 1: Mean relative difference: 0.001887638" 
 [8] "Component 9: Component 1: Mean relative difference: 0.0009601843"
 [9] "Component 10: Component 1: Mean relative difference: 0.004748783"
[10] "Component 11: Component 1: Mean relative difference: 0.001521097"
Code
# revised getVarCov 
V_raw <- lapply(Mares, function(g) getVarCov_revised_lme(lme_raw, individual = g, type = "marginal"))
V_sorted <- lapply(Mares, function(g) getVarCov_revised_lme(lme_sorted, individual = g, type = "marginal"))
all.equal(V_raw, V_sorted)
[1] TRUE

Session info

Code
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nlme_3.1-144

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6    bookdown_0.14   lattice_0.20-38 digest_0.6.25  
 [5] grid_3.6.3      magrittr_1.5    evaluate_0.14   blogdown_0.18  
 [9] rlang_0.4.5     stringi_1.4.3   rmarkdown_2.1   tools_3.6.3    
[13] stringr_1.4.0   xfun_0.12       yaml_2.2.0      compiler_3.6.3 
[17] htmltools_0.4.0 knitr_1.28
Back to top