## Code

```
library(tidyverse)
data(corrdat, package = "robumeta")
# sort by study
<- arrange(corrdat, studyid, esid) corrdat
```

effect size

meta-analysis

dependent effect sizes

Author

James E. Pustejovsky

Published

July 2, 2019

In meta-analyses of psychology, education, and other social science research, it is very common that some of the included studies report more than one relevant effect size. For example, in a meta-analysis of intervention effects on reading outcomes, some studies may have used multiple measures of reading outcomes (each of which meets inclusion criteria), or may have measured outcomes at multiple follow-up times; some studies might have also investigated more than one version of an intervention, and it might be of interest to include effect sizes comparing each version to the no-intervention control condition; and it’s even possible that some studies may have *all* of these features, potentially contributing *lots* of effect size estimates.

These situations create a technical challenge for conducting a meta-analysis. Because effect size estimates from the same study are correlated, it’s not usually reasonable to use methods that are premised on each effect size estimate being independent (i.e., univariate methods). Instead, the analyst needs to apply methods that take into account the dependencies among estimates coming from the same study. It used to be common to use ad hoc approaches for handling dependence, such as averaging the estimates together or selecting one estimate per study and then using univariate methods (cf. Becker, 2000). More sophisticated, multivariate meta-analysis (MVMA) models that directly account for correlations among the effect size estimates had been developed (Kalaian & Raudenbush, 1996) but were challenging to implement and so rarely used (at least, that’s my impression). More recently, techniques such as multi-level meta-analysis (MLMA, Van den Noortgate et al., 2013, 2015) and robust variance estimation (RVE, Hedges et al., 2010) have emerged, which account for dependencies while using all available effect size estimates and still being feasible to implement. These new techniques of MLMA and RVE are starting to be more widely adopted in practice, and it is not implausible that they will become the standard approach in psychological and educational meta-analysis within a few years.

Given the extent of interest in MLMA and RVE, one might wonder: are the older ad hoc approaches *ever* reasonable or appropriate? I think that some are, under certain circumstances. In this post I’ll highlight one such circumstance, where aggregating effect size estimates is not only reasonable but leads to *exactly the same results* as a multivariate model. This occurs when two conditions are met:

- We are not interested in within-study heterogeneity of effects and
- Any predictors included in the model vary between studies but not within a given study (i.e., effect sizes from the same study all have the same values of the predictors).

In short, if all we care about is understanding between-study variation in effect sizes, then it is fine to aggregate them up to the study level.

To make this argument precise, let me lay out a model where it applies. For full generality, I’ll consider a meta-regression model for a collection of \(K\) studies, where study \(k\) contributes \(J_k \geq 1\) effect size estimates. Let \(T_{jk}\) denote effect size estimate \(j\) in study \(k\), with sampling variance \(S_{jk}^2\). Effect size estimates from study \(k\) maybe be correlated at the sampling level, with correlation \(\rho_{ijk}\) between effect size estimates \(i\) and \(j\) from study \(k\). I will assume that the correlations are known, although in practice one might need to just take a guess about the degree of correlation, such as by assuming \(\rho_{ijk} = 0.7\) for all pairs of estimates from each included study. Let \(\mathbf{x}_k\) be a row vector of predictor variables for study \(k\). Note that the predictors do not have a subscript \(j\) because I’m assuming here that they are constant within a study.

A multivariate meta-regression model for these data might be: \[
T_{jk} = \mathbf{x}_k \boldsymbol\beta + u_k + e_{jk},
\] where \(u_k\) is a between-study random effect with variance \(\tau^2\) and \(e_{jk}\) is the sampling error for effect size \(j\) from study \(k\), assumed to have known variance \(S_{jk}^2\). Errors from the same study are correlated, so \(\text{Cov}(e_{ik}, e_{jk}) = \rho_{ijk} S_{ik} S_{jk}\). This is a commonly considered model for dependent effect size estimates. In the paper that introduced RVE, Hedges et al. (2010) termed it the “correlated effects” model (implemented in `robumeta`

as `model = "CORR"`

, which is the default). Note that it also satisfies the conditions I outlined above: no within-study random effects, predictors that vary only between study. We can fit it using the `rma.mv()`

function in the `metafor`

package, as I will demonstrate below.

An alternative to this multivariate model would be to first average the effects within each study, then fit a univariate random effects model. Just how we do the averaging will matter: we’ll need to use inverse-variance weighting. Let \(\mathbf{T}_k\) be the \(J_k \times 1\) vector of effect size estimates from study \(k\). Let \(\mathbf{S}_k\) be the \(J_k \times J_k\) sampling covariance matrix for \(\mathbf{T}_k\), and let \(\mathbf{1}_k\) be a \(J_k \times 1\) vector of 1s. The inverse-variance weighted average of the effects from study k can then be written as \[
\bar{T}_k = V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{T}_k,
\] where \(V_k = 1 / (\mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{1}_k)\). The quantity \(V_k\) is also the sampling variance of \(\bar{T}_k\).^{1}

A conventional, univariate random effects model for the averaged effect sizes is \[
\bar{T}_k = \mathbf{x}_k \boldsymbol\beta + u_k + \bar{e}_k,
\] where \(\text{Var}(u_k) = \tau^2\) and \(\text{Var}(\bar{e}_k) = V_k\). This model can be fit using `rma.uni`

from `metafor`

. In fact, doing so will yield the same estimates of model parameters as fitting the multivariate model—for all intents and purposes, they are equivalent models. There are at several different ways to see that this equivalence holds. I’ll offer three, from most practical to most theoretical. (If you’d rather just take my word that this claim is true, feel free to skip down to the last section, where I comment on implications.)

One good way to check the equivalence of the univariate and multivariate models is to apply both to a dataset. I’ll use the data from a stylized example described in Tanner-Smith & Tipton (2013), looking at the effects of alcohol abuse interventions on alcohol consumption among adolescents and young adults. (The data are simulated for teaching purposes, so don’t infer anything about real life from the results below!) The data are included in the `robumeta`

package:

The data consist of 172 effect sizes from 39 studies. Some studies report effects at multiple follow-up times and/or for multiple programs compared to a common control condition, leading to dependent effect size estimates.The data also include variables encoding a variety of sample and study characteristics, such as whether the study was conducted with a college student sample and the gender composition of the sample:

```
esid studyid effectsize var binge followup males college
3 4006 1 0.2086383 0.03246468 1 51.42857 67 0
1 4016 1 0.2244635 0.03244931 1 51.42857 67 0
2 4026 1 0.3151743 0.03278697 1 51.42857 67 0
5 3513 2 0.2220929 0.01972874 0 17.14286 81 1
9 3514 2 -0.1922628 0.02031393 0 17.14286 86 1
8 3556 2 0.3273109 0.01987042 0 17.14286 81 1
```

Suppose that we are interested in estimating the differences in average effects by type of sample (college versus adolescent), controlling for the proportion of males in the study. For some reason, there is within-study variation in the percentage of males, so I’ll take the study-level average for this covariate:

We can then fit this model using a multi-variate meta-regression in metafor.

In order to estimate the model, we’ll first need to create a variance-covariance matrix for the effect size estimates in each study, which can be accomplished using `impute_covariance_matrix`

from `clubSandwich`

(further details here). I’ll assume a correlation of 0.6 between pairs of effect sizes within a given study:

```
Multivariate Meta-Analysis Model (k = 172; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2 0.0590 0.2429 39 no studyid
Test for Residual Heterogeneity:
QE(df = 169) = 815.2448, p-val < .0001
Test of Moderators (coefficients 2:3):
QM(df = 2) = 9.9016, p-val = 0.0071
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt 0.6466 0.2693 2.4007 0.0164 0.1187 1.1744 *
college 0.3703 0.1317 2.8123 0.0049 0.1122 0.6283 **
males -0.0076 0.0038 -1.9832 0.0473 -0.0152 -0.0001 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Alternately, we could aggregate the effects up to the study level and then fit a univariate meta-regression using the same moderators. Here is a function to calculate the aggregated effect size estimates and variances:

Here’s the data-munging:

```
# A tibble: 6 × 5
studyid es var males college
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.249 0.0239 67 0
2 2 -0.0210 0.0129 81 1
3 3 0.726 0.0819 76.2 0
4 4 0.370 0.0431 80 1
5 5 -0.0911 0.0281 79 0
6 6 -0.416 0.0111 74 0
```

And here’s the meta-regression:

```
Mixed-Effects Model (k = 39; tau^2 estimator: REML)
tau^2 (estimated amount of residual heterogeneity): 0.0590 (SE = 0.0242)
tau (square root of estimated tau^2 value): 0.2429
I^2 (residual heterogeneity / unaccounted variability): 61.42%
H^2 (unaccounted variability / sampling variability): 2.59
R^2 (amount of heterogeneity accounted for): 19.12%
Test for Residual Heterogeneity:
QE(df = 36) = 96.7794, p-val < .0001
Test of Moderators (coefficients 2:3):
QM(df = 2) = 9.9016, p-val = 0.0071
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt 0.6466 0.2693 2.4007 0.0164 0.1187 1.1744 *
college 0.3703 0.1317 2.8123 0.0049 0.1122 0.6283 **
males -0.0076 0.0038 -1.9832 0.0473 -0.0152 -0.0001 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The heterogeneity estimates are nearly equal (the difference is due to using numerical optimization):

And the meta-regression coefficient estimates are identical to six decimal places:

```
intrcpt college males
0.646561371 0.370274721 -0.007633517
```

```
intrcpt college males
0.646561352 0.370274307 -0.007633519
```

`[1] "Mean relative difference: 4.243578e-07"`

For this example we arrive at the same results using either multivariate meta-analysis or univariate meta-analysis of aggregated effect size estimates.^{2} The main limitation of this illustration is generality—how can we be sure that these results aren’t just a quirk of this particular dataset? Would we get the same results for *any* dataset?

Here’s another, somewhat more general perspective on the relationship between the models: the univariate model can be *derived* directly from the multivariate one. Start with the multivariate model in matrix form: \[
\mathbf{T}_k = \mathbf{x}_k \boldsymbol\beta \mathbf{1}_k + u_k \mathbf{1}_k + \mathbf{e}_k,
\] where \(\mathbf{e}_k\) is the vector of sampling errors for study \(k\), with \(\text{Var}(\mathbf{e}_k) = \mathbf{S}_k\). Pre-multiply both sides by \(V_k \mathbf{1}_k’ \mathbf{S}_k^{-1}\) to get \[
\begin{aligned}
V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{T}_k &= V_k \left(\mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{1}_k\right) \mathbf{x}_k \boldsymbol\beta + u_k V_k \left(\mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{1}_k\right) + V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{e}_k \\
\bar{T}_k &= \mathbf{x}_k \boldsymbol\beta + u_k + \bar{e}_k,
\end{aligned}
\] where \(\text{Var}(\bar{e}_k) = V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{S}_k \mathbf{S}_k^{-1} \mathbf{1}_k V_k = V_k\), just as in the univariate model.

This demonstrates that the parameters of the two models are the same quantities—that is, both models are estimating the same thing. But that would also hold if we used *any* weighted average of \(\mathbf{T}_k\)—it needn’t be inverse-variance. The only thing that would be different is \(\text{Var}(\bar{e}_k)\). To fully establish the equivalence of the two models, I’ll examine the likelihoods of each model.

Multivariate meta-analysis models are typically estimated by full maximum likelihood (FML) or restricted maximum likelihood methods. FML and RML are also commonly used for univariate meta-analysis. With these methods, estimates are obtained as the parameter values that maximize the log likelihood of the model, given the data (or the restricted likelihood for RML). Therefore, we can establish the exact equivalence of parameter estimates by showing that the log likelihood of the univariate and multivariate models differ by a constant value (so that the location of the maxima are identical).

For the univariate model, the log-likelihood contribution of study \(k\): \[ l^{U}_k\left(\boldsymbol\beta, \tau^2\right) = -\frac{1}{2} \log\left(\tau^2 + V_k\right) - \frac{1}{2} \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k}. \] For the multivariate model, the log-likelihood contribution of study \(k\) is: \[ l^{MV}_k\left(\boldsymbol\beta, \tau^2\right) = -\frac{1}{2} A -\frac{1}{2} B \] where \[ A = \log\left|\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right| \] and \[ B = \left(\mathbf{T}_k - \mathbf{x}_k \boldsymbol\beta \mathbf{1}_k\right)' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \left(\mathbf{T}_k - \mathbf{x}_k \boldsymbol\beta \mathbf{1}_k\right). \] The term \(A\) can be rearranged as \[ A = \log\left|\left(\tau^2\mathbf{1}_k\mathbf{1}_k'\mathbf{S}_k^{-1} + \mathbf{I}_k\right) \mathbf{S}_k\right| \] where \(\mathbf{I}_k\) is a \(J_k \times J_k\) identity matrix. One of the properties of determinants is that the determinant of a product of two matrices is equal to the product of the determinants. Another is that, for two vectors \(\mathbf{u}\) and \(\mathbf{v}\), \(\left|\mathbf{I} + \mathbf{u}\mathbf{v}'\right| = 1 + \mathbf{v}'\mathbf{u}\). Applying both of these properties, it follows that \[ \begin{aligned} A &= \log\left|\left(\tau^2\mathbf{1}_k\mathbf{1}_k'\mathbf{S}_k^{-1} + \mathbf{I}_k\right) \mathbf{S}_k\right| \\ &= \log \left( \left|\mathbf{I}_k + \tau^2\mathbf{1}_k\mathbf{1}_k'\mathbf{S}_k^{-1}\right| \left|\mathbf{S}_k\right|\right) \\ &= \log \left(1 + \frac{\tau^2}{V_k}\right) + \log \left|\mathbf{S}_k\right| \\ &= \log(\tau^2 + V_k) - \log(V_k) + \log \left|\mathbf{S}_k\right|. \end{aligned} \] The \(B\) term takes a little more work. From the Sherman-Morrison identity, we have that: \[ \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} = \mathbf{S}_k^{-1} - \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}, \tag{1}\] by which it follows that \[ \mathbf{1}_k'\left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1}\mathbf{1}_k = \frac{1}{\tau^2 + V_k}. \tag{2}\] Now, rearrange the \(B\) term to get \[ \begin{aligned} B &= \left[\mathbf{T}_k - \bar{T}_k \mathbf{1}_k + \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \mathbf{1}_k\right]' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \left[\mathbf{T}_k - \bar{T}_k \mathbf{1}_k + \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \mathbf{1}_k\right] \\ &= B_1 + 2 B_2 + B_3 \end{aligned} \] where \[ \begin{aligned} B_1 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) \\ B_2 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \\ B_3 &= \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \mathbf{1}_k' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \end{aligned} \] Applying Equation 1 to \(B_1\), \[ \begin{aligned} B_1 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left[\mathbf{S}_k^{-1} - \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\right] \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) \\ &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) - \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) \\ &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right). \end{aligned} \] The second term drops out because \(\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1} \mathbf{1}_k = \bar{T}_k / V_k - \bar{T}_k / V_k = 0\). Along similar lines, \[ \begin{aligned} B_2 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left[\mathbf{S}_k^{-1} - \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\right] \mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \\ &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) - \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \\ &= 0. \end{aligned} \] Finally, the third term simplifies using Equation 2: \[ B_3 = \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k}. \] Thus, the full \(B\) term reduces to \[ B = \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) + \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k} \] and the multivariate log likelihood contribution is \[ \begin{aligned} l^{MV}_k\left(\boldsymbol\beta, \tau^2\right) &= -\frac{1}{2} \log(\tau^2 + V_k) + \frac{1}{2} \log(V_k) - \frac{1}{2}\log \left|\mathbf{S}_k\right| - \frac{1}{2} \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) -\frac{1}{2} \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k} \\ &= l^U_k\left(\boldsymbol\beta, \tau^2\right) + \frac{1}{2} \log(V_k) - \frac{1}{2}\log \left|\mathbf{S}_k\right| - \frac{1}{2} \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right). \end{aligned} \] The last three terms depend on the data (\(\mathbf{T}_k\) and \(\mathbf{S}_k\)) but not on the parameters \(\boldsymbol\beta\) or \(\tau^2\). Therefore, the univariate and multivariate likelihoods will be maximized at the same parameter values, i.e., the FML estimators are identical.

In practice, it is more common to use RML estimation rather than FML. The RML estimators maximize a different objective function that includes the full likelihood, plus an additional term. The RML objective function for the univariate model is \[ \sum_{k=1}^K l^U_k(\boldsymbol\beta, \tau^2) - \frac{1}{2} R^U(\tau^2) \] where \[ R^U(\tau^2) = \log \left|\sum_{k=1}^k\frac{\mathbf{x}_k' \mathbf{x}_k}{\tau^2 + V_k} \right|. \] For the multivariate model, the RML objective is \[ \sum_{k=1}^K l^{MV}_k(\boldsymbol\beta, \tau^2) - \frac{1}{2} R^{MV}(\tau^2). \] where \[ \begin{aligned} R^{MV}(\tau^2) &= \log \left|\sum_{k=1}^k \mathbf{x}_k'\mathbf{1}_k'\left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1}\mathbf{1}_k \mathbf{x}_k \right|\\ &= \log \left|\sum_{k=1}^k\frac{\mathbf{x}_k' \mathbf{x}_k}{\tau^2 + V_k} \right| \\ &= R^U(\tau^2) \end{aligned} \] because of Equation 2. Thus, the univariate and multivariate models also have the same RML estimators.

Beyond being a good excuse to write a bunch of matrix algebra, why does any of this matter? I think there are two main implications. First, it is useful to recognize the equivalence of these models in order to understand when the multivariate model is *necessary*. If both of the conditions that I’ve described hold, then it is entirely acceptable to use aggregation rather than the more complicated multivariate model.^{3} Using the simpler univariate model might be desirable in practice because it makes the analysis easier to follow, because it makes it easier to run diagnostics or create illustrations of the results, or because of software limitations. Conversely, if either of the conditions does not hold, then there may be differences between the two approaches and the analyst will need to think carefully about which method better addresses their research questions.

A second implication is computational: because it gives the same results, the univariate model could be used as a short-cut for fitting the multivariate model. Compare the differences in computational time:

```
Unit: milliseconds
expr min lq mean median uq max neval
uni 6.2403 6.8699 8.666934 7.5398 9.6606 18.2778 100
multi 55.9095 62.5514 72.086832 68.8157 73.4212 208.4049 100
```

If the aggregation is done in advance, it is *way* quicker to fit the univariate model. The short-cut would be useful if we needed to estimate *lots* of multi-variate meta-regressions (as long as the equivalence conditions hold). For example, if we needed to bootstrap the multivariate model, we could pre-compute the aggregated effects and then just bootstrap the much simpler, much quicker univariate model.

I suspect that the results I’ve presented here can be further generalized, but this will need a bit of further investigation. For one, there are also equivalences between variance estimators: using the CR2 cluster-robust variance estimator for the multivariate model is equivalent to using the HC2 heteroskedasticity-robust variance estimator for the univariate model with aggregated effects.^{4} For another, the same sort of equivalence relationships hold even if there are additional random effects in the model, so long as the random effects are at the study level or higher levels of aggregation (e.g., lab effects, where labs are nested within studies). I’ll leave these generalizations as exercises for a future rainy day.

Becker, B. J. (2000). Multivariate meta-analysis. In S. D. Brown & H. E. A. Tinsley (Eds.), *Handbook of applied multivariate statistics and mathematical modeling* (pp. 499–525). Academic Press. https://doi.org/10.1016/B978-012691360-6/50018-5

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). *Introduction to Meta-Analysis*. John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470743386

Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. *Research Synthesis Methods*, *1*(1), 39–65. https://doi.org/10.1002/jrsm.5

Kalaian, H. a., & Raudenbush, S. W. (1996). A multivariate mixed linear model for meta-analysis. *Psychological Methods*, *1*(3), 227–235. https://doi.org/10.1037/1082-989X.1.3.227

Tanner-Smith, E. E., & Tipton, E. (2013). Robust variance estimation with dependent effect sizes: Practical considerations including a software tutorial in Stata and SPSS. *Research Synthesis Methods*, *5*(1), 1–34. https://doi.org/10.1002/jrsm.1091

Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent effect sizes. *Behavior Research Methods*, *45*(2), 576–594. https://doi.org/10.3758/s13428-012-0261-6

Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2015). Meta-analysis of multiple outcomes: A multilevel approach. *Behavior Research Methods*, *47*(4), 1274–1294. https://doi.org/10.3758/s13428-014-0527-2

A common special case is that the sampling variances for effect sizes within a given study \(k\) are

*all equal*, so that \(S_{ik} = s_{jk} = S_k\) for \(i,j = 1,...,J_ik\) and \(k = 1,...,K\). We might further posit that there is a constant sampling correlation between every pair of effect sizes within a given study, so that \(\rho_{ijk} = \rho_k\) for \(i,j = 1,...,J_ik\) and \(k = 1,...,K\). If both of these conditions hold, then the inverse-variance weighted average effect size simplifies to the arithmetic average \[ \bar{T}_k = \frac{1}{J_k} \sum_{j=1}^{J_k} T_{jk} \] with sampling variance \[ V_k = \frac{(J_k - 1)\rho_k + 1}{J} \times S_k^2 \] (cf. Borenstein et al., 2009, Eq. (24.6), p. 230).↩︎The same thing holds if we use FML rather than RML estimation—try it for yourself and see!↩︎

As RVE and MLMA become more wide-spread, I could imagine it happening that a meta-analyst who uses aggregation and a univariate model might get push-back from a reviewer, who uncritically recommends using a “more advanced” method to handle dependence. The results in this post provide a way for the meta-analyst to establish that doing so would be unnecessary.↩︎

Here’s verification with the computational example from above:

`Coef. Estimate SE t-stat d.f. (Satt) p-val (Satt) Sig. intrcpt 0.64656 0.17647 3.66 11.5 0.00345 ** college 0.37027 0.18648 1.99 11.9 0.07053 . males -0.00763 0.00287 -2.66 14.5 0.01826 *`

`Coef. Estimate SE t-stat d.f. (Satt) p-val (Satt) Sig. intrcpt 0.64656 0.17622 3.67 11.5 0.00342 ** college 0.37027 0.18597 1.99 11.9 0.06985 . males -0.00763 0.00287 -2.66 14.5 0.01808 *`

```
---
title: Sometimes, aggregating effect sizes is fine
date: '2019-07-02'
bibliography: "../meta-references.bib"
csl: "../apa.csl"
link-citations: true
categories:
- effect size
- meta-analysis
- dependent effect sizes
code-tools: true
toc: true
toc-title: Contents
---
In meta-analyses of psychology, education, and other social science research, it is very common that some of the included studies report more than one relevant effect size.
For example, in a meta-analysis of intervention effects on reading outcomes, some studies may have used multiple measures of reading outcomes (each of which meets inclusion criteria), or may have measured outcomes at multiple follow-up times; some studies might have also investigated more than one version of an intervention, and it might be of interest to include effect sizes comparing each version to the no-intervention control condition;
and it's even possible that some studies may have _all_ of these features, potentially contributing _lots_ of effect size estimates.
These situations create a technical challenge for conducting a meta-analysis.
Because effect size estimates from the same study are correlated, it's not usually reasonable to use methods that are premised on each effect size estimate being independent (i.e., univariate methods).
Instead, the analyst needs to apply methods that take into account the dependencies among estimates coming from the same study.
It used to be common to use ad hoc approaches for handling dependence, such as averaging the estimates together or selecting one estimate per study and then using univariate methods [cf. @Becker2000multivariate].
More sophisticated, multivariate meta-analysis (MVMA) models that directly account for correlations among the effect size estimates had been developed [@Kalaian1996multivariate] but were challenging to implement and so rarely used (at least, that's my impression).
More recently, techniques such as multi-level meta-analysis [MLMA, @VandenNoortgate2013threelevel; @VandenNoortgate2015metaanalysis] and robust variance estimation [RVE, @Hedges2010robust] have emerged, which account for dependencies while using all available effect size estimates and still being feasible to implement.
These new techniques of MLMA and RVE are starting to be more widely adopted in practice, and it is not implausible that they will become the standard approach in psychological and educational meta-analysis within a few years.
Given the extent of interest in MLMA and RVE, one might wonder: are the older ad hoc approaches _ever_ reasonable or appropriate?
I think that some are, under certain circumstances.
In this post I'll highlight one such circumstance, where aggregating effect size estimates is not only reasonable but leads to _exactly the same results_ as a multivariate model. This occurs when two conditions are met:
1. We are not interested in within-study heterogeneity of effects and
2. Any predictors included in the model vary between studies but not within a given study (i.e., effect sizes from the same study all have the same values of the predictors).
In short, if all we care about is understanding between-study variation in effect sizes, then it is fine to aggregate them up to the study level.
# A model that's okay to average
To make this argument precise, let me lay out a model where it applies.
For full generality, I'll consider a meta-regression model for a collection of $K$ studies, where study $k$ contributes $J_k \geq 1$ effect size estimates.
Let $T_{jk}$ denote effect size estimate $j$ in study $k$, with sampling variance $S_{jk}^2$.
Effect size estimates from study $k$ maybe be correlated at the sampling level, with correlation $\rho_{ijk}$ between effect size estimates $i$ and $j$ from study $k$.
I will assume that the correlations are known, although in practice one might need to just take a guess about the degree of correlation, such as by assuming $\rho_{ijk} = 0.7$ for all pairs of estimates from each included study.
Let $\mathbf{x}_k$ be a row vector of predictor variables for study $k$.
Note that the predictors do not have a subscript $j$ because I'm assuming here that they are constant within a study.
A multivariate meta-regression model for these data might be:
$$
T_{jk} = \mathbf{x}_k \boldsymbol\beta + u_k + e_{jk},
$$
where $u_k$ is a between-study random effect with variance $\tau^2$ and $e_{jk}$ is the sampling error for effect size $j$ from study $k$, assumed to have known variance $S_{jk}^2$.
Errors from the same study are correlated, so $\text{Cov}(e_{ik}, e_{jk}) = \rho_{ijk} S_{ik} S_{jk}$.
This is a commonly considered model for dependent effect size estimates.
In the paper that introduced RVE, @Hedges2010robust termed it the "correlated effects" model (implemented in `robumeta` as `model = "CORR"`, which is the default).
Note that it also satisfies the conditions I outlined above: no within-study random effects, predictors that vary only between study.
We can fit it using the `rma.mv()` function in the `metafor` package, as I will demonstrate below.
An alternative to this multivariate model would be to first average the effects within each study, then fit a univariate random effects model.
Just how we do the averaging will matter: we'll need to use inverse-variance weighting.
Let $\mathbf{T}_k$ be the $J_k \times 1$ vector of effect size estimates from study $k$. Let $\mathbf{S}_k$ be the $J_k \times J_k$ sampling covariance matrix for $\mathbf{T}_k$, and let $\mathbf{1}_k$ be a $J_k \times 1$ vector of 1s. The inverse-variance weighted average of the effects from study k can then be written as
$$
\bar{T}_k = V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{T}_k,
$$
where $V_k = 1 / (\mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{1}_k)$. The quantity $V_k$ is also the sampling variance of $\bar{T}_k$.[^specialcase]
[^specialcase]: A common special case is that the sampling variances for effect sizes within a given study $k$ are _all equal_, so that $S_{ik} = s_{jk} = S_k$ for $i,j = 1,...,J_ik$ and $k = 1,...,K$. We might further posit that there is a constant sampling correlation between every pair of effect sizes within a given study, so that $\rho_{ijk} = \rho_k$ for $i,j = 1,...,J_ik$ and $k = 1,...,K$. If both of these conditions hold, then the inverse-variance weighted average effect size simplifies to the arithmetic average
$$
\bar{T}_k = \frac{1}{J_k} \sum_{j=1}^{J_k} T_{jk}
$$
with sampling variance
$$
V_k = \frac{(J_k - 1)\rho_k + 1}{J} \times S_k^2
$$
[cf. @borenstein2009introduction, Eq. (24.6), p. 230].
A conventional, univariate random effects model for the averaged effect sizes is
$$
\bar{T}_k = \mathbf{x}_k \boldsymbol\beta + u_k + \bar{e}_k,
$$
where $\text{Var}(u_k) = \tau^2$ and $\text{Var}(\bar{e}_k) = V_k$.
This model can be fit using `rma.uni` from `metafor`.
In fact, doing so will yield the same estimates of model parameters as fitting the multivariate model---for all intents and purposes, they are equivalent models.
There are at several different ways to see that this equivalence holds.
I'll offer three, from most practical to most theoretical.
(If you'd rather just take my word that this claim is true, feel free to skip down to the [last section](#so-what), where I comment on implications.)
# Computational equivalence
One good way to check the equivalence of the univariate and multivariate models is to apply both to a dataset. I'll use the data from a stylized example described in @TannerSmith2013robust, looking at the effects of alcohol abuse interventions on alcohol consumption among adolescents and young adults. (The data are simulated for teaching purposes, so don't infer anything about real life from the results below!) The data are included in the `robumeta` package:
```{r, warning = FALSE, message = FALSE}
library(tidyverse)
data(corrdat, package = "robumeta")
# sort by study
corrdat <- arrange(corrdat, studyid, esid)
```
The data consist of `{r} nrow(corrdat)` effect sizes from `{r} length(unique(corrdat$studyid))` studies. Some studies report effects at multiple follow-up times and/or for multiple programs compared to a common control condition, leading to dependent effect size estimates.The data also include variables encoding a variety of sample and study characteristics, such as whether the study was conducted with a college student sample and the gender composition of the sample:
```{r}
head(corrdat)
```
Suppose that we are interested in estimating the differences in average effects by type of sample (college versus adolescent), controlling for the proportion of males in the study. For some reason, there is within-study variation in the percentage of males, so I'll take the study-level average for this covariate:
```{r}
corrdat <-
corrdat %>%
group_by(studyid) %>%
mutate(males = mean(males))
```
We can then fit this model using a multi-variate meta-regression in metafor.
In order to estimate the model, we'll first need to create a variance-covariance matrix for the effect size estimates in each study, which can be accomplished using `impute_covariance_matrix` from `clubSandwich` ([further details here](/posts/imputing-covariance-matrices-for-multi-variate-meta-analysis/)). I'll assume a correlation of 0.6 between pairs of effect sizes within a given study:
```{r, warning = FALSE, message = FALSE}
library(clubSandwich)
library(metafor)
V_list <- impute_covariance_matrix(vi = corrdat$var, cluster = corrdat$studyid, r = 0.6)
MV_fit <- rma.mv(effectsize ~ college + males, V = V_list,
random = ~ 1 | studyid,
data = corrdat, method = "REML")
MV_fit
```
Alternately, we could aggregate the effects up to the study level and then fit a univariate meta-regression using the same moderators. Here is a function to calculate the aggregated effect size estimates and variances:
```{r}
agg_effects <- function(yi, vi, r = 0.6) {
corr_mat <- r + diag(1 - r, nrow = length(vi))
sd_mat <- tcrossprod(sqrt(vi))
V_inv_mat <- chol2inv(chol(sd_mat * corr_mat))
V <- 1 / sum(V_inv_mat)
data.frame(es = V * sum(yi * V_inv_mat), var = V)
}
```
Here's the data-munging:
```{r}
corrdat_agg <-
corrdat %>%
group_by(studyid) %>%
summarise(
es = list(agg_effects(yi = effectsize, vi = var, r = 0.6)),
males = mean(males),
college = mean(college)
) %>%
unnest(es)
head(corrdat_agg)
```
And here's the meta-regression:
```{r}
uni_fit <- rma.uni(es ~ college + males, vi = var,
data = corrdat_agg, method = "REML")
uni_fit
```
The heterogeneity estimates are nearly equal (the difference is due to using numerical optimization):
```{r}
MV_fit$sigma2
uni_fit$tau2
```
And the meta-regression coefficient estimates are identical to six decimal places:
```{r}
coef(MV_fit)
coef(uni_fit)
all.equal(coef(MV_fit), coef(uni_fit))
```
For this example we arrive at the same results using either multivariate meta-analysis or univariate meta-analysis of aggregated effect size estimates.[^FML] The main limitation of this illustration is generality---how can we be sure that these results aren't just a quirk of this particular dataset? Would we get the same results for _any_ dataset?
[^FML]: The same thing holds if we use FML rather than RML estimation---try it for yourself and see!
# From multivariate to univariate model
Here's another, somewhat more general perspective on the relationship between the models: the univariate model can be _derived_ directly from the multivariate one. Start with the multivariate model in matrix form:
$$
\mathbf{T}_k = \mathbf{x}_k \boldsymbol\beta \mathbf{1}_k + u_k \mathbf{1}_k + \mathbf{e}_k,
$$
where $\mathbf{e}_k$ is the vector of sampling errors for study $k$, with $\text{Var}(\mathbf{e}_k) = \mathbf{S}_k$. Pre-multiply both sides by $V_k \mathbf{1}_k’ \mathbf{S}_k^{-1}$ to get
$$
\begin{aligned}
V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{T}_k &= V_k \left(\mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{1}_k\right) \mathbf{x}_k \boldsymbol\beta + u_k V_k \left(\mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{1}_k\right) + V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{e}_k \\
\bar{T}_k &= \mathbf{x}_k \boldsymbol\beta + u_k + \bar{e}_k,
\end{aligned}
$$
where $\text{Var}(\bar{e}_k) = V_k \mathbf{1}_k’ \mathbf{S}_k^{-1} \mathbf{S}_k \mathbf{S}_k^{-1} \mathbf{1}_k V_k = V_k$, just as in the univariate model.
This demonstrates that the parameters of the two models are the same quantities—that is, both models are estimating the same thing. But that would also hold if we used _any_ weighted average of $\mathbf{T}_k$---it needn't be inverse-variance. The only thing that would be different is $\text{Var}(\bar{e}_k)$. To fully establish the equivalence of the two models, I'll examine the likelihoods of each model.
# Equivalence of likelihoods
Multivariate meta-analysis models are typically estimated by full maximum likelihood (FML) or restricted maximum likelihood methods. FML and RML are also commonly used for univariate meta-analysis. With these methods, estimates are obtained as the parameter values that maximize the log likelihood of the model, given the data (or the restricted likelihood for RML). Therefore, we can establish the exact equivalence of parameter estimates by showing that the log likelihood of the univariate and multivariate models differ by a constant value (so that the location of the maxima are identical).
## Full likelihood
For the univariate model, the log-likelihood contribution of study $k$:
$$
l^{U}_k\left(\boldsymbol\beta, \tau^2\right) = -\frac{1}{2} \log\left(\tau^2 + V_k\right) - \frac{1}{2} \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k}.
$$
For the multivariate model, the log-likelihood contribution of study $k$ is:
$$
l^{MV}_k\left(\boldsymbol\beta, \tau^2\right) = -\frac{1}{2} A -\frac{1}{2} B
$$
where
$$
A = \log\left|\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right|
$$
and
$$
B = \left(\mathbf{T}_k - \mathbf{x}_k \boldsymbol\beta \mathbf{1}_k\right)' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \left(\mathbf{T}_k - \mathbf{x}_k \boldsymbol\beta \mathbf{1}_k\right).
$$
The term $A$ can be rearranged as
$$
A = \log\left|\left(\tau^2\mathbf{1}_k\mathbf{1}_k'\mathbf{S}_k^{-1} + \mathbf{I}_k\right) \mathbf{S}_k\right|
$$
where $\mathbf{I}_k$ is a $J_k \times J_k$ identity matrix. One of the properties of determinants is that the determinant of a product of two matrices is equal to the product of the determinants. Another is that, for two vectors $\mathbf{u}$ and $\mathbf{v}$, $\left|\mathbf{I} + \mathbf{u}\mathbf{v}'\right| = 1 + \mathbf{v}'\mathbf{u}$. Applying both of these properties, it follows that
$$
\begin{aligned}
A &= \log\left|\left(\tau^2\mathbf{1}_k\mathbf{1}_k'\mathbf{S}_k^{-1} + \mathbf{I}_k\right) \mathbf{S}_k\right| \\
&= \log \left( \left|\mathbf{I}_k + \tau^2\mathbf{1}_k\mathbf{1}_k'\mathbf{S}_k^{-1}\right| \left|\mathbf{S}_k\right|\right) \\
&= \log \left(1 + \frac{\tau^2}{V_k}\right) + \log \left|\mathbf{S}_k\right| \\
&= \log(\tau^2 + V_k) - \log(V_k) + \log \left|\mathbf{S}_k\right|.
\end{aligned}
$$
The $B$ term takes a little more work.
From [the Sherman-Morrison identity](https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula), we have that:
$$
\left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} = \mathbf{S}_k^{-1} - \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1},
$$ {#eq-Sherman}
by which it follows that
$$
\mathbf{1}_k'\left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1}\mathbf{1}_k = \frac{1}{\tau^2 + V_k}.
$$ {#eq-inversevariance}
Now, rearrange the $B$ term to get
$$
\begin{aligned}
B &= \left[\mathbf{T}_k - \bar{T}_k \mathbf{1}_k + \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \mathbf{1}_k\right]' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \left[\mathbf{T}_k - \bar{T}_k \mathbf{1}_k + \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \mathbf{1}_k\right] \\
&= B_1 + 2 B_2 + B_3
\end{aligned}
$$
where
$$
\begin{aligned}
B_1 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) \\
B_2 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \\
B_3 &= \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \mathbf{1}_k' \left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1} \mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)
\end{aligned}
$$
Applying @eq-Sherman to $B_1$,
$$
\begin{aligned}
B_1 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left[\mathbf{S}_k^{-1} - \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\right] \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) \\
&= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) - \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) \\
&= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right).
\end{aligned}
$$
The second term drops out because $\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1} \mathbf{1}_k = \bar{T}_k / V_k - \bar{T}_k / V_k = 0$. Along similar lines,
$$
\begin{aligned}
B_2 &= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \left[\mathbf{S}_k^{-1} - \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\right] \mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \\
&= \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) - \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1} \mathbf{1}_k \left(\frac{1}{\tau^2} + \frac{1}{V_k}\right)^{-1} \mathbf{1}_k'\mathbf{S}_k^{-1}\mathbf{1}_k \left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right) \\
&= 0.
\end{aligned}
$$
Finally, the third term simplifies using @eq-inversevariance:
$$
B_3 = \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k}.
$$
Thus, the full $B$ term reduces to
$$
B = \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) + \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k}
$$
and the multivariate log likelihood contribution is
$$
\begin{aligned}
l^{MV}_k\left(\boldsymbol\beta, \tau^2\right) &= -\frac{1}{2} \log(\tau^2 + V_k) + \frac{1}{2} \log(V_k) - \frac{1}{2}\log \left|\mathbf{S}_k\right| - \frac{1}{2} \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right) -\frac{1}{2} \frac{\left(\bar{T}_k - \mathbf{x}_k \boldsymbol\beta\right)^2}{\tau^2 + V_k} \\
&= l^U_k\left(\boldsymbol\beta, \tau^2\right) + \frac{1}{2} \log(V_k) - \frac{1}{2}\log \left|\mathbf{S}_k\right| - \frac{1}{2} \left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right)' \mathbf{S}_k^{-1}\left(\mathbf{T}_k - \bar{T}_k \mathbf{1}_k\right).
\end{aligned}
$$
The last three terms depend on the data ($\mathbf{T}_k$ and $\mathbf{S}_k$) but not on the parameters $\boldsymbol\beta$ or $\tau^2$. Therefore, the univariate and multivariate likelihoods will be maximized at the same parameter values, i.e., the FML estimators are identical.
## Restricted likelihood
In practice, it is more common to use RML estimation rather than FML.
The RML estimators maximize a different objective function that includes the full likelihood, plus an additional term. The RML objective function for the univariate model is
$$
\sum_{k=1}^K l^U_k(\boldsymbol\beta, \tau^2) - \frac{1}{2} R^U(\tau^2)
$$
where
$$
R^U(\tau^2) = \log \left|\sum_{k=1}^k\frac{\mathbf{x}_k' \mathbf{x}_k}{\tau^2 + V_k} \right|.
$$
For the multivariate model, the RML objective is
$$
\sum_{k=1}^K l^{MV}_k(\boldsymbol\beta, \tau^2) - \frac{1}{2} R^{MV}(\tau^2).
$$
where
$$
\begin{aligned}
R^{MV}(\tau^2) &= \log \left|\sum_{k=1}^k \mathbf{x}_k'\mathbf{1}_k'\left(\tau^2\mathbf{1}_k\mathbf{1}_k' + \mathbf{S}_k\right)^{-1}\mathbf{1}_k \mathbf{x}_k \right|\\
&= \log \left|\sum_{k=1}^k\frac{\mathbf{x}_k' \mathbf{x}_k}{\tau^2 + V_k} \right| \\
&= R^U(\tau^2)
\end{aligned}
$$
because of @eq-inversevariance. Thus, the univariate and multivariate models also have the same RML estimators.
# So what?
Beyond being a good excuse to write a bunch of matrix algebra, why does any of this matter? I think there are two main implications. First, it is useful to recognize the equivalence of these models in order to understand when the multivariate model is _necessary_. If both of the conditions that I've described hold, then it is entirely acceptable to use aggregation rather than the more complicated multivariate model.[^reviewer2] Using the simpler univariate model might be desirable in practice because it makes the analysis easier to follow, because it makes it easier to run diagnostics or create illustrations of the results, or because of software limitations. Conversely, if either of the conditions does not hold, then there may be differences between the two approaches and the analyst will need to think carefully about which method better addresses their research questions.
[^reviewer2]: As RVE and MLMA become more wide-spread, I could imagine it happening that a meta-analyst who uses aggregation and a univariate model might get push-back from a reviewer, who uncritically recommends using a "more advanced" method to handle dependence. The results in this post provide a way for the meta-analyst to establish that doing so would be unnecessary.
A second implication is computational: because it gives the same results, the univariate model could be used as a short-cut for fitting the multivariate model. Compare the differences in computational time:
```{r}
library(microbenchmark)
microbenchmark(
uni = rma.uni(es ~ college + males, vi = var,
data = corrdat_agg, method = "REML"),
multi = rma.mv(effectsize ~ college + males, V = V_list,
random = ~ 1 | studyid,
data = corrdat, method = "REML")
)
```
If the aggregation is done in advance, it is _way_ quicker to fit the univariate model. The short-cut would be useful if we needed to estimate _lots_ of multi-variate meta-regressions (as long as the equivalence conditions hold). For example, if we needed to bootstrap the multivariate model, we could pre-compute the aggregated effects and then just bootstrap the much simpler, much quicker univariate model.
I suspect that the results I've presented here can be further generalized, but this will need a bit of further investigation. For one, there are also equivalences between variance estimators: using the CR2 cluster-robust variance estimator for the multivariate model is equivalent to using the HC2 heteroskedasticity-robust variance estimator for the univariate model with aggregated effects.[^RVE]
For another, the same sort of equivalence relationships hold even if there are additional random effects in the model, so long as the random effects are at the study level or higher levels of aggregation (e.g., lab effects, where labs are nested within studies).
I'll leave these generalizations as exercises for a future rainy day.
[^RVE]: Here's verification with the computational example from above:
```{r}
# multivariate CR2
coef_test(MV_fit, vcov = "CR2")
# univariate HC2
coef_test(uni_fit, vcov = "CR2", cluster = corrdat_agg$studyid)
```
# References
```