Meta-sandwich with extra mustard

robust variance estimation

James E. Pustejovsky


April 26, 2014

In an earlier post about sandwich standard errors for multi-variate meta-analysis, I mentioned that Beth Tipton has recently proposed small-sample corrections for the covariance estimators and t-tests, based on the bias-reduced linearization approach of McCaffrey, Bell, and Botts (2001). You can find her forthcoming paper on the adjustments here. My understanding is that these small-sample corrections are important because the uncorrected sandwich estimators can lead to under-statement of uncertainty and inflated type I error rates when a given meta-regression coefficient is estimated from only a small or moderately sized sample of independent studies (or clusters of studies). Moreover, it can be difficult to determine exactly when you have a large enough sample to trust the uncorrected sandwiches.

I wanted to try out these small-sample corrected sandwich estimators for a meta-analyses project that I’m working on. Beth and one of her students have written an R package called robumeta that implements the sandwich covariance estimator and small-sample corrections as described in her paper. However, for my project I want to use the metafor package, which doesn’t provide these methods. I’ve therefore created a set of functions that implement the sandwich covariance estimators and small-sample corrections for models estimated using the function in metafor. Here is the complete code. Sorry, there’s no further documentation at the moment (beyond the rest of this post).

Consistency with robumeta

In order to check that the functions are correct, I compared the results generated by robumeta with the results from metafor plus my functions. Here’s one example (I looked at a few others as well). First, the robumeta results:


robu_hier <- robu(
  effectsize ~ males + binge,
  data = hierdat,
  modelweights = "HIER",
  studynum = studyid,
  var.eff.size = var,
  small = TRUE
RVE: Hierarchical Effects Model with Small-Sample Corrections 

Model: effectsize ~ males + binge 

Number of clusters = 15 
Number of outcomes = 68 (min = 1 , mean = 4.53 , median = 2 , max = 29 )
Omega.sq = 0.1146972 
Tau.sq = 0.06797866 

               Estimate  StdErr t-value  dfs P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept.  -0.0989 0.32140  -0.308 1.79 0.79045  -1.6511   1.4533    
2        males   0.0020 0.00441   0.454 1.88 0.69689  -0.0182   0.0222    
3        binge   0.6799 0.12156   5.594 4.18 0.00439   0.3482   1.0117 ***
Signif. codes: < .01 *** < .05 ** < .10 *
Note: If df < 4, do not trust the results

To maintain consistency, I first need to calculate the approximate weights used in robumeta and then fit the model in metafor using these fixed weights.

devtools::source_gist(id = "11302318", filename = "metafor-BRL.R")

hierdat$var_HTJ <- hierdat$var + as.numeric(robu_hier$mod_info$omega.sq) + as.numeric(robu_hier$mod_info$tau.sq)

meta_hier <-
  yi = effectsize ~ males + binge,
  V = var_HTJ,
  data = hierdat,
  method = "FE"
meta_hier$cluster <- hierdat$studyid

            Estimate  Std. Error    t value       df    Pr(>|t|)
intrcpt -0.098869582 0.321400179 -0.3076214 1.788350 0.790446059
males    0.002002043 0.004410552  0.4539212 1.879142 0.696887075
binge    0.679929801 0.121556887  5.5935111 4.182783 0.004385654

The estimated covariance matrices match:

all.equal(sandwich(meta_hier, meat.=meatBRL), 
[1] TRUE

It can also be verified that the p-values based on the Satterthwaite degrees of freedom agree.

Use with metafor

Of course, the point of writing functions that work with objects is not to replicate robumeta results, but to take advantage of metafor’s flexibility. Rather than estimate the model with robumeta, typically one would estimate the variance components in metafor and then calculate the sandwich covariance estimates and small-sample corrections. For instance:

meta_REML <-
  yi = effectsize ~ males + binge,
  V = var,
  random = list( ~ 1 | esid, ~ 1 | studyid),
  data = hierdat,
  method = "REML"

Multivariate Meta-Analysis Model (k = 68; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed   factor 
sigma^2.1  0.1566  0.3957     68     no     esid 
sigma^2.2  0.0000  0.0000     15     no  studyid 

Test for Residual Heterogeneity:
QE(df = 65) = 297.0172, p-val < .0001

Test of Moderators (coefficients 2:3):
QM(df = 2) = 27.2659, p-val < .0001

Model Results:

         estimate      se     zval    pval   ci.ub      
intrcpt   -0.1118  0.2474  -0.4520  0.6513  -0.5966  0.3730      
males      0.0022  0.0034   0.6467  0.5178  -0.0044  0.0088      
binge      0.6744  0.1313   5.1349  <.0001   0.4170  0.9319  *** 

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
            Estimate  Std. Error    t value       df    Pr(>|t|)
intrcpt -0.111796564 0.318156355 -0.3513888 1.794988 0.762200367
males    0.002173683 0.004380026  0.4962718 1.882842 0.671549040
binge    0.674435042 0.121660936  5.5435628 4.167780 0.004585142

One advantage here is that it’s possible to compare the model-based standard errors to the robust ones. In this instance, the two are fairly similar. However, the degrees of freedom estimated in the robust results indicate that the model-based standard errors (based on normal approximations) may be much too narrow.

Differences between robumeta and my implementation

There are two important differences between the approach implemented in robumeta and the approach based on metafor and the code that I’ve provided. The first is that robumeta uses moment estimators for the variance components, whereas metafor uses restricted- or full maximum likelihood. The estimated between-study heterogeneity (and for the hierarchical effects model, the within-study heterogeneity as well) will therefore differ to some degree.

The second, and perhaps more crucial, distinction has to do with the choice of weights. Weights are used for two purposes: to estimate the fixed effects and to calculate the small-sample correction. The robumeta package uses diagonal weights for both purposes. Using diagonal weights in calculating the fixed effects means that the resulting point estimates will be equivalent to those from a weighted ordinary least squares regression:

WOLS <- lm(effectsize ~ males + binge, data = hierdat, weights = 1 / var_HTJ)
 (Intercept)        males        binge 
-0.098869582  0.002002043  0.679929801 
all.equal(coef(WOLS), as.numeric(robu_hier$b.r), check.attributes = FALSE)
[1] TRUE

A subtler point is that robumeta uses the inverse weights for purposes of calculating the small sample-correction. The small sample correction involves choosing a “working” or “target” covariance matrix towards which to adjust the sandwich estimator. If the working covariance model is correct, then the BRL covariance estimator is exactly unbiased. The working matrix is also used to determine the Satterthwaite degrees of freedom. In robumeta, the working covariance matrix is taken to be inverse of the weights, which is also a diagonal matrix. Thus, the BRL correction amounts to assuming independence among all of the effect sizes. This may sound somewhat counter-intuitive, but some simulation results (reported in Beth’s paper, referenced above) suggest that the resulting estimators perform well even when the working independence assumption is not correct.

In contrast to the robumeta weights, metafor calculates the fixed effects based on a weighting matrix that is exactly inverse variance for given estimates of the variance components. Typically, the weighting matrix will be block-diagonal but may have off-diagonal entries corresponding to effect sizes drawn from the same study. Furthermore, my implementation of BRL uses the estimated covariance matrix derived from the posited random effects structure; in other words, the working covariance structure is taken to be the same as the model specified in the metafor call. This seems sensible to me, although I do not have any evidence regarding its performance relative to the alternatives. It is possible that any gains in asymptotic efficiency from using exactly inverse variance weights are outweighed by some sort of instability in small samples. It’s also possible that the performance of the different approaches to weighting might depend on which variance component estimators are used (i.e., MOM vs. REML).

Neither implementation that I’ve described above is fully general. Following the generalized estimating equation framework, a fully general implementation would allow the user to specify an arbitrary weight matrix in addition to a working covariance structure. The weighting matrix would be used for purposes of estimating the fixed effects. The working covariance model would be estimated (based on MOM or REML or what-not) and then used for purposes of BRL adjustment. Of course, this fully general formulation may well be more complicated than what most analysts would actually need or use (especially for linear mixed models), except perhaps when dealing with complex survey data.

Back to top