With complex data structures in meta-analysis, we sometimes want to aggregate effect sizes to simplify the analysis. As James Pustejovsky puts it, “sometimes, aggregating effect sizes is fine”. His blog post shows formally why, under specific conditions, aggregating or not brings exactly the same results, with the additional benefit of computational speed. Crucially, this is true only under those specific conditions.
In this post I’ll highlight one such circumstance, where aggregating effect size estimates is not only reasonable but leads to exactly the same results as a multivariate model. This occurs when two conditions are met:

- We are not interested in within-study heterogeneity of effects.
- Any predictors included in the model vary between studies but not within a given study (i.e., effect sizes from the same study all have the same values of the predictors).

In short, if all we care about is understanding between-study variation in effect sizes, then it is fine to aggregate them up to the study level.
The example refers to a situation where I have multiple effects collected on the same pool of subjects, nested within papers. Thus, if I have predictors at the level of the effect, or if I am interested in modelling variability within papers, aggregating is not appropriate.
In cases where I have no moderators, or I am interested only in between-papers variability, aggregating is totally fine.
We can extend this to situations with more nesting levels, e.g., outcomes nested within experiments nested within papers. If we are not interested in the effect-size level (i.e., the outcome level), aggregating is fine.
Example
Here I simulate a dataset with \(K\) papers. Each paper can have a maximum of \(J\) experiments, and each experiment can have 1 (e.g., ACC or RT) or 2 (e.g., ACC and RT) outcomes collected on the same pool of subjects.
library(metafor)
Loading required package: Matrix
Loading required package: metadat
Loading required package: numDeriv
Loading the 'metafor' package (version 4.0-0). For an
introduction to the package please type: help(metafor)
We start by simulating the data structure. The number of experiments and outcomes is random:
K <- 100                                 # number of papers
J <- sample(1:3, K, replace = TRUE)      # number of studies within each paper
Z <- sample(1:2, sum(J), replace = TRUE) # number of outcomes per study/paper

dat <- data.frame(
  paper = rep(rep(1:K, J), Z),
  exp = rep(seqw(J), Z),
  effect = seqw(Z)
)

head(dat)
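The seqw() function is a custom helper, not part of metafor and not shown in this snippet. Judging from how it is used above, it builds a within-cluster sequence (1, 2, …, n) for each element of its argument. A minimal sketch, assuming that behaviour:

# assumed definition of the seqw() helper used above:
# for each count in x, return the sequence 1:count, concatenated
seqw <- function(x) unlist(lapply(x, seq_len))

seqw(c(2, 1, 3))
# 1 2 1 1 2 3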
Then we set the simulation parameters. We have 3 heterogeneity components: \(\tau^2\), \(\omega^2\) and \(\zeta^2\). I simulate data from a meta-regression model, because I want a difference between the effects of the two outcomes, but I am not interested in fitting a meta-regression (it is just a shortcut to create that difference).
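The parameter-setting and data-generation code is not shown in this snippet. Purely as an illustration of the structure described above (the values and the names b0, b1, n, rho and exp_id are my own placeholders, not necessarily those used in the post), the simulation could look something like this:

# illustrative parameter values (not necessarily those of the original post)
tau2   <- 0.3   # between-papers heterogeneity
omega2 <- 0.1   # between-experiments (within-paper) heterogeneity
zeta2  <- 0.05  # between-outcomes (within-experiment) heterogeneity
b0     <- 0     # average effect for the first outcome
b1     <- 0.3   # difference between the two outcomes
n      <- 30    # subjects per group in each experiment
rho    <- 0.7   # correlation of sampling errors for outcomes on the same subjects

# random effects at each nesting level
exp_id  <- rep(seq_along(Z), Z)                 # unique experiment id per row
u_paper <- rnorm(K, 0, sqrt(tau2))[dat$paper]
u_exp   <- rnorm(length(Z), 0, sqrt(omega2))[exp_id]
u_eff   <- rnorm(nrow(dat), 0, sqrt(zeta2))

# true effects: the second outcome differs from the first by b1
theta <- b0 + b1 * (dat$effect == 2) + u_paper + u_exp + u_eff

# sampling variances (constant, for simplicity) and sampling errors that are
# correlated only within an experiment; rows of the same experiment are
# contiguous in dat, so unlist() preserves the row order
dat$vi <- 2 / n
dat$yi <- theta + unlist(lapply(seq_along(Z), function(i) {
  S <- (2 / n) * (diag(Z[i]) * (1 - rho) + rho)  # compound-symmetric block
  MASS::mvrnorm(1, rep(0, Z[i]), S)              # requires the MASS package
}))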
Now the crucial part: we need to create the block variance-covariance matrix of the sampling errors. Crucially, sampling errors are correlated ONLY when we have multiple outcomes measured on the same subjects; experiments within the same paper use different subjects, so their sampling errors are independent. The matrix should reflect this feature.
# create block-matrix
V <- vcalc(vi,
           cluster = paper,  # 1st level cluster
           subgroup = exp,   # independent experiments
           obs = effect,     # correlated effects
           rho = 0.7,        # correlation
           data = dat)

# splitting the matrix just for reference
Vb <- blsplit(V, dat$paper, fun = round, 3)
For example, the second paper has 3 experiments. The first experiment has two outcomes while the second and the third have only one outcome.
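As a quick check (the exact block depends on the simulated structure), we can print that block and convert it to a correlation matrix: only the outcomes coming from the same experiment share the 0.7 correlation, while effects from different experiments are independent.

Vb[[2]]                    # variance-covariance block of the second paper
round(cov2cor(Vb[[2]]), 2) # 0.7 only for outcomes within the same experiment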
To compare the models with and without aggregation, the trick is to fit a model without the lowest nesting level in the random-effects structure. Normally, the random part for the simulated dataset above would be specified as ~ 1 | paper/exp/effect. Here we drop the effect term:
fit0 <- rma.mv(yi, V, random = ~ 1 | paper/exp, data = dat, sparse = TRUE)
summary(fit0)
Multivariate Meta-Analysis Model (k = 293; method: REML)
logLik Deviance AIC BIC AICc
-463.2780 926.5559 932.5559 943.5862 932.6393
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.3506 0.5921 100 no paper
sigma^2.2 0.1651 0.4063 193 no paper/exp
Test for Heterogeneity:
Q(df = 292) = 2486.0170, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.0010 0.0699 0.0147 0.9883 -0.1360 0.1381
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the V matrix, we fix the correlation between multiple outcomes (i.e., at the lowest nesting level) to 0.7.
Model with aggregation
To aggregate, we use the Borenstein method implemented in metafor's aggregate() method for escalc objects:
# with aggregation
dat <- escalc(yi = yi, vi = vi, data = dat)
datl <- split(dat, dat$paper)
datl <- lapply(datl, function(x) aggregate(x, cluster = exp, rho = 0.7))
datagg <- do.call(rbind, datl)
head(datagg)
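From here, the natural next step (sketched below; the object name fit1 is mine) is to refit the model on the aggregated data, which now has one row per paper/experiment, and compare it with fit0; following the argument above, the two fits should return the same estimates and variance components.

# model on the aggregated data: the effect level no longer exists,
# so the random structure is the same as in fit0
fit1 <- rma.mv(yi, vi, random = ~ 1 | paper/exp, data = datagg, sparse = TRUE)
summary(fit1)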