robust variance estimation

James E. Pustejovsky


April 21, 2014

A common problem arising in many areas of meta-analysis is how to synthesize a set of effect sizes when the set includes multiple effect size estimates from the same study. It’s often not possible to obtain all of the information you’d need in order to estimate the sampling covariances between those effect sizes, yet without that information, established approaches to modeling dependent effect sizes become inaccurate. Hedges, Tipton, & Johnson (2010, HTJ hereafter) proposed the use of cluster-robust standard errors for multi-variate meta-analysis. (These are also called “sandwich” standard errors, which is up there on the list of great and evocative names for statistical procedures.) The great advantage of the sandwich approach is that it permits valid inferences for average effect sizes and meta-regression coefficients even if you don’t have correct covariance estimates (or variance estimates, for that matter).

I recently heard from Beth Tipton (who’s a graduate-school buddy) that she and her student have written an R package implementing the HTJ methods, including moment estimators for the between-study variance components. I want to try out the cluster-robust standard errors for a project I’m working on, but I also need to use REML estimators rather than the moment estimators. It turns out, it’s easy enough to do that by writing a couple of short functions. Here’s how.

First, the metafor package contains a very rich suite of meta-analytic methods, including for multi-variate meta-analysis. The only thing it lacks is sandwich standard errors. However, the sandwich package provides an efficient, well-structured framework for calculating all sorts of robust standard errors. All that’s needed are a few functions to make the packages talk to each other. Each of the functions described below takes as input a fitted multi-variate meta-analysis model, which is represented in R by an object of class

First load up the packages:


Next, I need a bread method for objects of class, which is a function that returns the \(p \times p\) matrix \(\displaystyle{m \left(\sum_{i=1}^m \mathbf{X}_j' \mathbf{W}_j \mathbf{X}_j\right)^{-1}}\). The bread function is straight-forward because it is just a multiple of the model-based covariance matrix, which objects store in the vb component:

Code <- function(obj) {
  cluster <- findCluster(obj)
  length(unique(cluster)) * obj$vb  

I also need an estfun method for objects of class, which is a function that returns an \(m \times p\) matrix where row \(j\) is equal to \(\mathbf{e}_j' \mathbf{W}_j \mathbf{X}_j\), \(j = 1,...,m\). The necessary pieces for the estfun method can also be pulled out of the components of

Code <- function(obj) {
  cluster <- droplevels(as.factor(findCluster(obj)))
  res <- residuals(obj)
  WX <- chol2inv(chol(obj$M)) %*% obj$X
  rval <- by(cbind(res, WX), cluster, 
             function(x) colSums(x[,1] * x[,-1, drop = FALSE]))
  rval <- matrix(unlist(rval), length(unique(cluster)), obj$p, byrow=TRUE)
  colnames(rval) <- colnames(obj$X)

The remaining question is how to determine which of the components in the model should be used to define independent clusters. This is a little bit tricky because there are several different methods of specifying random effects in the function. One way involves providing a list of formulas, each containing a factor associated with a unique random effect, such as random = list( ~ 1 | classroom, ~ 1 | school). If this method of specifying random effects is used, the object will have the component withS set to TRUE, and my approach is to simply take the factor with the smallest number of unique levels. This is perhaps a little bit presumptious, because the withS method could potentially be used to specify arbitrary random effects, where one level is not strictly nested inside another. However, probably the most common use will involve nested factors, so my assumption seems like a good starting point at least.

Another approach to specifying random effects is to use a formula of the form random = inner | outer, in which case the object will have the component withG set to TRUE. Here, it seems reasonable to use the outer factor for defining clusters. If both the withS and withG methods are used together, I’ll assume that the withS factors contain the outermost level.

Finally, if is used to estimate a fixed effects model without any random components, the clustering factor will have to be manually added to the object in a component called cluster. For example, if you want to cluster on the variable studyID in the dataframe dat:

rma_fit$cluster <- dat$studyID

Here’s code that implements these assumptions:

findCluster <- function(obj) {
  if (is.null(obj$cluster)) {
    if (obj$withS) {
      r <- which.min(obj$s.nlevels)
      cluster <- obj$mf.r[[r]][[obj$s.names[r]]]
    } else if (obj$withG) {
      cluster <- obj$mf.r[[1]][[obj$g.names[2]]]
    } else {
        stop("No clustering variable specified.")
  } else {
    cluster <- obj$cluster

With these three functions, you can then use metafor to fit a random effects model, sandwich to calculate the standard errors, and functions like coeftest from the package lmtest to run \(t\)-tests. As a little bonus, here’s a function for probably the most common case of how you’d use the sandwich standard errors:

RobustResults <- function(obj, adjust = TRUE) {
  cluster <- findCluster(obj)  
  vcov. <- sandwich(obj, adjust = adjust)
  df. <- length(unique(cluster)) - obj$p
  coeftest(obj, vcov. = vcov., df = df.)

See here for a file containing the full code.


Tanner-Smith & Tipton (2013) provide an application of the cluster-robust method to a fictional dataset with 68 effect sizes nested within 15 studies. They call this a “hierarchical” dependence example because each effect size estimate is drawn from an independent sample, but dependence is induced because the experiments were all done in the same lab. For comparison purposes, here are the results produced by robumeta:


HTJ <- robu(effectsize ~ 1,
       data = hierdat, modelweights = "HIER",
       studynum = studyid,
       var.eff.size = var, small = FALSE)
RVE: Hierarchical Effects Model  

Model: effectsize ~ 1 

Number of clusters = 15 
Number of outcomes = 68 (min = 1 , mean = 4.53 , median = 2 , max = 29 )
Omega.sq = 0.1560802 
Tau.sq = 0.06835547 

               Estimate StdErr t-value dfs  P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept.     0.25 0.0598    4.18  14 0.000925    0.122    0.378 ***
Signif. codes: < .01 *** < .05 ** < .10 *

To exactly re-produce the results with metafor, I’ll need to use the weights proposed by HTJ. In their approach, effect size \(i\) from study \(j\) receives weight equal to \(\left(v_{ij} + \hat\omega^2 + \hat\tau^2\right)^{-1}\), where \(v_{ij}\) is the sampling variance of the effect size, \(\hat\omega^2\) is an estimate of the between-sample within-study variance, and \(\hat\tau^2\) is an estimate of the between-study variance. After calculating these weights, I fit the model in metafor, calculate the sandwich covariance matrix, and replay the results:

hierdat$var_HTJ <- hierdat$var + 
  as.numeric(HTJ$mod_info$omega.sq) + 
  as.numeric(HTJ$mod_info$tau.sq) # calculate weights
meta1 <- = effectsize ~ 1, V = var_HTJ, data = hierdat, method = "FE")
meta1$cluster <- hierdat$studyid # add clustering variable to the fitted model

t test of coefficients:

        Estimate Std. Error t value  Pr(>|t|)    
intrcpt 0.249826   0.059762  4.1803 0.0009253 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The HTJ weights are not the only alternative–one could instead use weights that are exactly inverse variance under the posited model. For effect \(i\) from study \(j\), these weights would be closer to \(\left(v_{ij} + \hat\omega^2 + k_j \hat\tau^2 \right)^{-1}\). For \(\hat\tau^2 > 0\), the inverse-variance weights put proportionately less weight on studies containing many effects. These weights can be calculated in metafor as follows:

meta2 <- = effectsize ~ 1, V = var, 
                 random = list(~ 1 | esid, ~ 1 | studyid), 
                 sigma2 = c(HTJ$mod_info$omega.sq, HTJ$mod_info$tau.sq),
                 data = hierdat)

t test of coefficients:

        Estimate Std. Error t value Pr(>|t|)   
intrcpt 0.264422   0.086688  3.0503 0.008645 **
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Curiously, the robust standard error increases under a weighting scheme that is more efficient if the model is correct.

Finally, metafor provides ML and REML estimators for the between-sample and between-study random effects (the HTJ moment estimators are not available though). Here are the results based on REML estimators and the corresponding inverse-variance weights:

meta3 <- = effectsize ~ 1, V = var, 
                 random = list(~ 1 | esid, ~ 1 | studyid), 
                 data = hierdat,
                method = "REML")

Multivariate Meta-Analysis Model (k = 68; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed   factor 
sigma^2.1  0.2263  0.4757     68     no     esid 
sigma^2.2  0.0000  0.0000     15     no  studyid 

Test for Heterogeneity:
Q(df = 67) = 370.1948, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.ub      
  0.2501  0.0661  3.7822  0.0002  0.1205  0.3797  *** 

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

t test of coefficients:

        Estimate Std. Error t value  Pr(>|t|)    
intrcpt 0.250071   0.059796  4.1821 0.0009222 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The between-study variance estimate is tiny, particularly when compared to the between-sample within-study estimate. Despite the difference in variance estimates, the average effect size estimate is nearly identical to the estimate based on the HTJ approach.

See here for the full code to reproduce this example.


It would be straight-forward to add a few more functions that provide robust standard errors for univariate meta-analysis models as well. All that it would take is to write bread and estfun methods for the class rma.uni.

Also, Beth has recently proposed small-sample corrections to the cluster-robust estimators, based on the bias-reduced linearization (BRL) approach of McCaffrey, Bell, & Botts (2001). It seems to me that these small-sample corrections could also be implemented using an approach similar to what I’ve done here, by building out the estfun method to provide BRL results. It would take a little more thought, but actually it would be worth doing–and treating the general case–because BRL seems like it would be useful for all sorts of models besides multi-variate meta-analysis.

Back to top