Small-sample adjustments for tests of moderators and model fit using robust variance estimation in meta-regression

Elizabeth Tipton & James E. Pustejovsky

March 6, 2015

Meta-analysis and meta-regression

When many intervention studies have been conducted on a single topic, we may want to pool the results:

  • Meta-analysis lets us pool results across studies to obtain estimates of overall efficacy.
  • Meta-regression lets us answer further questions about variation in efficacy.
  • For example, "Do the results vary in relation to…"
    • Features of the participants in the experiment (e.g., children, teenagers)
    • Dosage (e.g., weeks)
    • Outcomes measured (e.g., total math, subscale scores, science)
    • Study design (e.g., RCT, quasi-experiment)

Dependent effect sizes

  • In meta-analysis, studies often report multiple effect sizes
    • Outcomes from multiple tests on the same participants (e.g., math, reading)
    • Multiple measures of performance on the same participants (e.g., accuracy, response time)
    • Outcomes at multiple time points (e.g., 1-week, 1-month, 1-year)
    • Outcomes from multiple experiments (with different participants, but in the same lab)
  • Model-based meta-analysis has provided two methods for pooling:
    • Univariate meta-analysis, where each study contributes a single effect size, or
    • Multivariate meta-analysis, where the covariance structure of the multiple effect sizes is known.
  • Neither approach is ideal
    • Univariate meta-analysis results in a loss of information
    • Multivariate meta-analysis requires information that is rarely reported in studies.
  • In this talk, we will focus on the second (multivariate) approach and its robust alternative.

Meta-regression model

If each study contributes multiple effect sizes, then the general meta-regression model can be written in vector form: $T_j = X_j \beta + \epsilon_j$ for $j = 1, \dots, m$ studies, where

  • $T_j$ is a vector of $n_j$ effect size estimates from study $j$
  • $X_j$ is an $n_j \times p$ matrix of covariates for study $j$
  • $\beta$ is a vector of $p$ meta-regression coefficients
  • $\epsilon_j$ is a vector of residual errors for study $j$ with covariance matrix $\Sigma_j$

Given a set of weights, we can estimate $\beta$ using weighted least squares: $b = M \sum_{j=1}^m X_j' W_j T_j$, where $M = \left(\sum_{j=1}^m X_j' W_j X_j\right)^{-1}$
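The weighted least squares estimator can be sketched in a few lines of NumPy. This is only an illustration: the function name `wls_meta_regression`, the toy data, and the identity weights are assumptions, not part of the talk.

```python
import numpy as np

def wls_meta_regression(T, X, W):
    """Weighted least squares across m studies.

    T, X, W are lists of per-study arrays: T[j] has shape (n_j,),
    X[j] has shape (n_j, p), and W[j] has shape (n_j, n_j).
    Returns b and M = (sum_j X_j' W_j X_j)^{-1}.
    """
    p = X[0].shape[1]
    XtWX = np.zeros((p, p))
    XtWT = np.zeros(p)
    for Tj, Xj, Wj in zip(T, X, W):
        XtWX += Xj.T @ Wj @ Xj
        XtWT += Xj.T @ Wj @ Tj
    M = np.linalg.inv(XtWX)
    return M @ XtWT, M

# Toy data (hypothetical): 3 studies, intercept plus one covariate.
T = [np.array([0.2, 0.4]), np.array([0.1]), np.array([0.5, 0.3, 0.6])]
X = [np.array([[1.0, 0.0], [1.0, 1.0]]),
     np.array([[1.0, 0.0]]),
     np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 2.0]])]
W = [np.eye(len(Tj)) for Tj in T]  # identity weights, for illustration only

b, M = wls_meta_regression(T, X, W)
```

With identity weights the result coincides with ordinary least squares on the stacked data; any other choice of per-study weight matrices can be passed in the same way.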

Model-based meta-regression

Estimating the standard error of b is more difficult.

  • If we assume that the weights are inverse variance, with $W_j = \Sigma_j^{-1}$, then $\text{Var}(b) = M$.
  • This is the multivariate meta-analysis approach, which is "model based."
    • It requires correct specification of the covariance matrices $\Sigma_j$ and the associated weights $W_j$.
  • If the true structure of the errors is unknown or mis-specified, then $M$ is no longer the variance of $b$, and standard errors based on it are wrong.
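A small numerical check makes the point concrete. For fixed weights, the exact variance of the weighted least squares estimator is $M \left(\sum_j X_j' W_j \Sigma_j W_j X_j\right) M$, which collapses to $M$ only when $W_j = \Sigma_j^{-1}$. The sketch below assumes a compound-symmetric true covariance (variance 0.1, correlation 0.7) and simulated covariates; all of these values and names are hypothetical.

```python
import numpy as np

def true_variance_of_b(X, W, Sigma):
    """Exact Var(b) = M (sum_j X_j' W_j Sigma_j W_j X_j) M for fixed weights."""
    p = X[0].shape[1]
    bread_inv = np.zeros((p, p))
    meat = np.zeros((p, p))
    for Xj, Wj, Sj in zip(X, W, Sigma):
        bread_inv += Xj.T @ Wj @ Xj
        meat += Xj.T @ Wj @ Sj @ Wj @ Xj
    M = np.linalg.inv(bread_inv)
    return M @ meat @ M, M

def compound_symmetric(n, var, rho):
    """n x n covariance matrix with common variance and common correlation."""
    return var * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

rng = np.random.default_rng(0)
n_per_study = [2, 3, 1, 4, 2, 3]
X = [np.column_stack([np.ones(n), rng.normal(size=n)]) for n in n_per_study]
Sigma = [compound_symmetric(n, 0.1, 0.7) for n in n_per_study]

# Correctly specified inverse-variance weights: Var(b) equals M.
W_right = [np.linalg.inv(S) for S in Sigma]
V_right, M_right = true_variance_of_b(X, W_right, Sigma)

# Mis-specified weights that treat effect sizes as independent: Var(b) differs from M.
W_wrong = [np.eye(n) / 0.1 for n in n_per_study]
V_wrong, M_wrong = true_variance_of_b(X, W_wrong, Sigma)
```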

Robust variance estimation

Robust variance estimation (RVE; Hedges, Tipton, & Johnson, 2010) produces asymptotically valid estimates of the variance of b, even if the error structure is mis-specified.

  • RVE uses a "sandwich" estimator: $V^R = M \left(\sum_{j=1}^m X_j' W_j e_j e_j' W_j X_j\right) M$, where $e_j = T_j - X_j b$.
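The sandwich formula translates directly into code. The following is a minimal sketch, not the authors' software: the function name `rve_sandwich`, the simulated data, and the identity weights are assumptions made for illustration.

```python
import numpy as np

def rve_sandwich(T, X, W):
    """Return (b, V_R): WLS estimate and uncorrected robust (sandwich) variance."""
    p = X[0].shape[1]
    XtWX = np.zeros((p, p))
    XtWT = np.zeros(p)
    for Tj, Xj, Wj in zip(T, X, W):
        XtWX += Xj.T @ Wj @ Xj
        XtWT += Xj.T @ Wj @ Tj
    M = np.linalg.inv(XtWX)
    b = M @ XtWT

    # "Meat" of the sandwich: sum_j X_j' W_j e_j e_j' W_j X_j
    meat = np.zeros((p, p))
    for Tj, Xj, Wj in zip(T, X, W):
        ej = Tj - Xj @ b            # study-level residual vector
        gj = Xj.T @ Wj @ ej         # p-vector X_j' W_j e_j
        meat += np.outer(gj, gj)
    return b, M @ meat @ M

# Simulated example (hypothetical): 20 studies with 1 to 4 effect sizes each.
rng = np.random.default_rng(42)
n_per_study = rng.integers(1, 5, size=20)
X = [np.column_stack([np.ones(n), rng.normal(size=n)]) for n in n_per_study]
T = [Xj @ np.array([0.3, 0.1]) + rng.normal(scale=0.2, size=len(Xj)) for Xj in X]
W = [np.eye(len(Xj)) for Xj in X]

b, V_R = rve_sandwich(T, X, W)
robust_se = np.sqrt(np.diag(V_R))
```

When every study contributes a single effect size and the weights are identity matrices, this reduces to the familiar heteroskedasticity-consistent (HC0) estimator from regression.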

Hypothesis testing

  • In large samples, we can use this variance estimator to construct hypothesis tests. For testing $\beta_s = 0$, $z = b_s / \sqrt{V^R_{ss}}$ follows a standard normal distribution if $m$ is "big enough."
  • In smaller samples, Hedges et al. (2010) suggested that a t-distribution may be more appropriate, with $t = b_s / \sqrt{V^R_{ss} \cdot \frac{m}{m - p}}$ compared to a t-distribution with $m - p$ degrees of freedom.
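The two statistics differ only by the factor $m / (m - p)$ applied to the robust variance. A short sketch with made-up inputs (the function names and the numbers are hypothetical):

```python
import math

def large_sample_z(b_s, V_ss):
    """z statistic using the uncorrected robust variance."""
    return b_s / math.sqrt(V_ss)

def hedges_t(b_s, V_ss, m, p):
    """t statistic with the m / (m - p) variance inflation, plus its degrees of freedom."""
    t = b_s / math.sqrt(V_ss * m / (m - p))
    return t, m - p

# Hypothetical inputs: coefficient 0.3, robust variance 0.01, 20 studies, 4 coefficients.
z = large_sample_z(0.3, 0.01)
t, df = hedges_t(0.3, 0.01, m=20, p=4)
```

Because $m / (m - p) > 1$ whenever $p \geq 1$, the t statistic is always smaller in magnitude than z, and it is referred to a heavier-tailed reference distribution.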

Tests of multiple meta-regression coefficients

  • Some hypotheses involve more than one meta-regression coefficient
    • Test equality of several levels of a moderator
    • Test of overall model fit
  • We consider linear hypotheses of the form $C\beta = c$ for a $q \times p$ contrast matrix $C$ and a $q \times 1$ vector $c$.
  • We can construct a Wald test statistic: $Q = (Cb - c)' \left(C V^R C'\right)^{-1} (Cb - c)$
  • In large samples, we would expect Q to follow a chi-squared distribution with q degrees of freedom.
  • In smaller samples, an F-test might be better, with $Q / q \mathrel{\dot\sim} F(q, m - p)$.
  • But how does this test perform?
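Computing the Wald statistic is a single linear solve. The sketch below uses made-up values for the coefficient estimates and the robust variance matrix; the function name `wald_statistic` is an assumption.

```python
import numpy as np

def wald_statistic(b, V, C, c):
    """Return (Q, Q / q) for the linear hypothesis C beta = c."""
    d = C @ b - c
    Q = float(d @ np.linalg.solve(C @ V @ C.T, d))
    q = C.shape[0]
    return Q, Q / q

# Hypothetical estimates: p = 3 coefficients with a diagonal robust variance matrix.
b = np.array([0.1, 0.2, -0.3])
V = np.diag([0.01, 0.04, 0.09])

# Test that the second and third coefficients are both zero (q = 2).
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
c = np.zeros(2)

Q, F_naive = wald_statistic(b, V, C, c)
```

The naive test then compares `F_naive` with an F distribution on $q$ and $m - p$ degrees of freedom. When $q = 1$, Q is simply the square of the z statistic for that coefficient.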

Simulated type-I error rate of F-test

[Figure: simulated Type-I error rates of the naive F-test]

Small-sample corrections

  • The originally proposed t-tests have inflated Type-I error with fewer than 40 studies (Hedges et al., 2010; Tipton, 2013, 2014; Williams, 2012).
  • Tipton (in press) devised small-sample corrections for t-tests. These corrections involve two parts:
    • Adjustments to the variance estimator VR
    • Estimated degrees of freedom for the t-distribution
  • The focus of this paper is on developing similar small-sample methods for F-tests.

Corrections to the RVE covariance matrix

  • We correct the RVE estimator using McCaffrey, Bell, & Botts' (2001) "bias-reduced linearization" approach, which relies on a working model for the error structure: $V^R = M \left(\sum_{j=1}^m X_j' W_j A_j e_j e_j' A_j' W_j X_j\right) M$, where the adjustment matrices $A_1, \dots, A_m$ are chosen so that $E(V^R) = M$ when the working model is correct.
  • Simulation results (for both the t-test and F-test) indicate that the correction helps even if the working model is incorrect.
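One way to construct such adjustment matrices is sketched below, under a simplifying assumption that is not stated in the talk: the weights are the exact inverse of the working covariance matrices, $W_j = \Phi_j^{-1}$. In that case the residuals have covariance $\Phi_j - X_j M X_j'$ under the working model, and $A_j$ needs to satisfy $A_j (\Phi_j - X_j M X_j') A_j' = \Phi_j$. The names `Phi`, `inv_sqrt_sym`, and `brl_adjustment_matrices` are hypothetical, and other weighting schemes require a more general formula.

```python
import numpy as np

def inv_sqrt_sym(B):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(B)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def brl_adjustment_matrices(X, Phi):
    """Adjustment matrices A_j, assuming weights W_j = Phi_j^{-1} (working model)."""
    W = [np.linalg.inv(P) for P in Phi]
    M = np.linalg.inv(sum(Xj.T @ Wj @ Xj for Xj, Wj in zip(X, W)))
    A = []
    for Xj, Pj in zip(X, Phi):
        L = np.linalg.cholesky(Pj)           # Phi_j = L L'
        resid_cov = Pj - Xj @ M @ Xj.T       # Var(e_j) under the working model
        B = L.T @ resid_cov @ L
        A.append(L @ inv_sqrt_sym(B) @ L.T)  # A_j resid_cov A_j' = Phi_j
    return A, W, M

# Hypothetical setup: 8 studies, compound-symmetric working covariance.
rng = np.random.default_rng(1)
n_per_study = [2, 3, 1, 4, 2, 3, 2, 3]
X = [np.column_stack([np.ones(n), rng.normal(size=n)]) for n in n_per_study]
Phi = [0.1 * (0.5 * np.eye(n) + 0.5 * np.ones((n, n))) for n in n_per_study]

A, W, M = brl_adjustment_matrices(X, Phi)

# Expected value of the corrected sandwich estimator under the working model.
expected_meat = sum(
    Xj.T @ Wj @ Aj @ (Pj - Xj @ M @ Xj.T) @ Aj.T @ Wj @ Xj
    for Xj, Wj, Aj, Pj in zip(X, W, A, Phi)
)
expected_V_R = M @ expected_meat @ M
```

Under these assumptions `expected_V_R` reproduces `M`, which is the unbiasedness property the adjustment matrices are designed to deliver.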

Potential corrections for F-tests

  • The small-sample t-test developed by Tipton (in press) also adjusted the degrees of freedom.
    • These were estimated using a Satterthwaite approximation.
    • These degrees of freedom vary in relation to the sample size m, the number of parameters p, and features of the covariates.
  • By extension, we will look for a degrees-of-freedom correction for the F-test.
  • Drawing on extant literature, we investigated a wide variety of possible corrections.
    • Eigenvalue decompositions
      • Fai-Cornelius (1996): mixed models
      • Cai-Hayes (2008): heteroskedasticity-robust standard errors
    • Hotelling's $T^2$ approximations
      • Zhang (2012, 2013): heteroskedastic ANOVA/MANOVA
      • Pan-Wall (2002): generalized estimating equations
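For orientation, the generic Satterthwaite approximation mentioned above matches the first two moments of a variance estimator to those of a scaled chi-squared variable, giving degrees of freedom $2 [E(V)]^2 / \text{Var}(V)$. The sketch below shows only this generic moment-matching step; it does not reproduce the specific formulas in Tipton (in press), and the function name is hypothetical.

```python
def satterthwaite_df(mean_v, var_v):
    """Degrees of freedom from matching the mean and variance of a variance estimator."""
    return 2.0 * mean_v ** 2 / var_v

# Sanity check: if V = sigma2 * chi2_k / k, then E(V) = sigma2 and
# Var(V) = 2 * sigma2**2 / k, so the approximation recovers k exactly.
sigma2, k = 0.04, 12
df = satterthwaite_df(sigma2, 2 * sigma2 ** 2 / k)
```

A variance estimator that is noisier than a chi-squared with $m - p$ degrees of freedom yields a smaller value, which is why the estimated degrees of freedom can fall well below $m - p$.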

The Winner: T2Z

  • The paper provides results for five different corrections. Here, however, we'll focus on only the one that works best.

  • The T2Z approach involves:
    • Finding the mean and variance of the robust covariance matrix (under a working model)
    • Approximating the distribution of the robust covariance matrix using a Wishart distribution
    • Matching the mean and total variance of the robust covariance matrix to estimate the Wishart degrees of freedom
    • Testing Q/q against Hotelling's $T^2$ distribution
  • The T2Z is best in two regards
    • It is (almost) always level-alpha
    • It is more powerful than any of the other level-alpha tests (i.e., its Type-I error rates are always closer to nominal)
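In practice, a test against Hotelling's $T^2$ is usually carried out through its standard relationship to the F distribution: if $Q$ follows a $T^2$ distribution with dimension $q$ and $\eta$ degrees of freedom, then $\frac{\eta - q + 1}{\eta q} Q$ follows $F(q, \eta - q + 1)$. The sketch below applies only that conversion. The symbol $\eta$ (`eta`) for the estimated Wishart degrees of freedom and the function name are assumptions, and the estimation of $\eta$ itself is not shown.

```python
def hotelling_to_f(Q, q, eta):
    """Convert a Wald statistic Q ~ T^2(q, eta) into an F statistic and its degrees of freedom."""
    if eta <= q - 1:
        raise ValueError("eta must exceed q - 1 for the F reference distribution to exist")
    F = (eta - q + 1) / (eta * q) * Q
    return F, q, eta - q + 1

# Hypothetical values: Q = 6 for a q = 2 hypothesis with eta = 11.
F, df1, df2 = hotelling_to_f(6.0, 2, 11)
```

As `eta` grows, the rescaling factor approaches $1/q$ and the statistic approaches the naive $Q / q$, so the correction matters most when the estimated degrees of freedom are small.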

Simulation results: T2Z

[Figure: simulated Type-I error rates of the T2Z test]

Example: Wilson et al. (2011)

  • Wilson, Lipsey, Tanner-Smith, Huang, & Steinka-Fry (2011) synthesis of effects of dropout prevention/intervention programs.
    • Primary outcomes: school completion, school dropout
  • m=152 studies, containing 385 effect size estimates
    • Some studies included effect sizes for multiple outcomes, measured on the same sample
    • Some studies included effect sizes from multiple samples
  • Meta-regression model including several categorical moderators
    • Study design: 3 levels (non-experimental, matched groups, randomized experiment)
    • Outcome measure: 4 levels (school enrollment, dropout, graduation, graduation or GED)
    • Evaluator independence: 4 levels (involved in delivery, involved in planning, indirect involvement, independent)
    • Implementation quality: 3 levels (clear problems, possible problems, no apparent problems)
    • Program format: 4 levels (community-based, classroom-based, school-based, multiple formats)
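Testing a categorical moderator with k levels amounts to testing that its k − 1 dummy-coded coefficients are all zero, so $q = k - 1$ and $c = 0$. The sketch below builds the corresponding contrast matrices for a hypothetical column layout (an intercept followed by the dummies for the five moderators listed above). The layout, the dictionary keys, and the function name are assumptions; the model in the talk has more coefficients than this (m − p = 130 with m = 152 implies p = 22), so the column positions are illustrative only.

```python
import numpy as np

def moderator_contrast(p, dummy_columns):
    """q x p contrast matrix C that selects one moderator's dummy-coded coefficients."""
    C = np.zeros((len(dummy_columns), p))
    for row, col in enumerate(dummy_columns):
        C[row, col] = 1.0
    return C

# Number of levels for each moderator, as listed on the slide.
levels = {
    "study_design": 3,
    "outcome_measure": 4,
    "evaluator_independence": 4,
    "implementation_quality": 3,
    "program_format": 4,
}

# Hypothetical layout: column 0 is the intercept, then k - 1 dummies per moderator.
columns = {}
next_col = 1
for name, k in levels.items():
    columns[name] = list(range(next_col, next_col + k - 1))
    next_col += k - 1
p = next_col

contrasts = {name: moderator_contrast(p, cols) for name, cols in columns.items()}
q = {name: C.shape[0] for name, C in contrasts.items()}
```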

Wilson et al. (2011) test results

Moderator                 q   Naive F   p-value (naive)   T-squared Z   d.f. (T2Z)   p-value (T2Z)
Study design              2      0.23             0.796          0.22           43           0.800
Outcome measure           3      0.91             0.436          0.84           22           0.488
Evaluator independence    3      3.11             0.029          2.78           17           0.073
Implementation quality    2     14.15            <0.001         13.78           37          <0.001
Program format            3      3.85             0.011          3.65           38           0.021

The naive F-test uses $m - p = 130$ degrees of freedom.

Conclusions and future work

  • As Tipton (in press) found for the small-sample t-test:
    • The performance of the large-sample test depends on features of the covariates.
    • Consequently, it is hard to know a priori what constitutes a "big enough" sample.
  • We therefore recommend always using small-sample corrections in practice.
  • We provide prototype software in R (upon request) and are working on implementing it fully in the robumeta package (for R and as a Stata macro) and in the metafor R package (Viechtbauer, 2010).
  • Future work
    • Investigate power of tests based on RVE versus model-based methods.
    • Investigate other areas of application beyond meta-analysis, including
      • Hierarchical linear models
      • Econometric panel data models

References

Cai, L., & Hayes, A. F. (2008). A new test of linear hypotheses in OLS regression under heteroscedasticity of unknown form. Journal of Educational and Behavioral Statistics, 33(1), 21-40.

Fai, A. H.-T., & Cornelius, P. (1996). Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments. Journal of Statistical Computation and Simulation, 54(4), 363-378.

Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.

McCaffrey, D. F., Bell, R. M., & Botts, C. H. (2001). Generalizations of biased reduced linearization. In Proceedings of the Annual Meeting of the American Statistical Association.

Pan, W., & Wall, M. M. (2002). Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Statistics in Medicine, 21(10), 1429-1441.

Tipton, E. (in press). Small sample adjustments for robust variance estimation with meta-regression. Psychological Methods.

References (continued)

Wilson, S. J., Lipsey, M. W., Tanner-Smith, E., Huang, C. H., & Steinka-Fry, K. T. (2011). Dropout prevention and intervention programs: Effects on school completion and dropout among school-aged children and youth: A systematic review. Campbell Systematic Reviews, 7(8).

Zhang, J.-T. (2012). An approximate Hotelling $T^2$-test for heteroscedastic one-way MANOVA. Open Journal of Statistics, 2, 1-11.

Zhang, J.-T. (2013). Tests of linear hypotheses in the ANOVA under heteroscedasticity. International Journal of Advanced Statistics and Probability, 1(2), 9-24.

Additional results

Simulation results: EDT test

[Figure: simulated Type-I error rates of the EDT test]

Comparison of small-sample corrections

[Figure: comparison of small-sample corrections]