library(tidyverse)
library(metafor)Multilevel and Multivariate Meta-Analysis
Notes from the Advanced MA Course by Wolfgang Viechtbauer
mtheme <- function(size = 15){
theme_minimal(size)
}
theme_set(mtheme())These notes summarises some of the content of the Advanced Meta-Analysis course by Wolfgang Viechtbauer with some personal additions.
Quick recap about linear models
Standard linear models have some assumptions and can be formalized using the usual equation:
\[ y_i = \beta_0 + \beta_{1}x_{1_i} + \beta_{2}x_{2_i} \dots \beta_{p}x_{p_i} + \epsilon_i \]
Then \(\epsilon_i\) is the crucial part. The assumption of the linear model is the following:
\[ \epsilon_i \sim \mathcal{N}(0, \sigma^2) \]
More compactly:
\[ y_i \sim \mathcal{N}(X_i\boldsymbol{\beta}, \sigma^2) \]
Or in matrix form (relevant for other models):
\[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}) \]
Or:
\[ \underset{n \times 1}{\mathbf{y}} = \underset{n \times p}{\mathbf{X}} \;\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\epsilon}} \]
With:
\[ \boldsymbol{\epsilon} \sim \mathcal{N}\!\left( \boldsymbol{0}, \; \sigma^{2}\, \underset{n \times n}{\mathbf{I}} \right) \]
Where \(\mathbf{I}\) is an identity matrix of dimension \(n \times n\). Basically a scalar multiplied by an identity matrix gives a matrix with diagonal elements equals to the scalar.
n <- 5 # number of observations
s2 <- 10 # residual variance
I <- diag(n)
I [,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 0 1 0 0 0
[3,] 0 0 1 0 0
[4,] 0 0 0 1 0
[5,] 0 0 0 0 1
s2*I [,1] [,2] [,3] [,4] [,5]
[1,] 10 0 0 0 0
[2,] 0 10 0 0 0
[3,] 0 0 10 0 0
[4,] 0 0 0 10 0
[5,] 0 0 0 0 10
If you sample from a multivariate normal distribution with this matrix, is the same as sampling from an univariate normal distribution:
n <- 1e3
s2 <- 10 # residual variance
I <- diag(n)
ei_1 <- rnorm(n, 0, sqrt(s2))
# or also
ei_2 <- rnorm(n, 0, sqrt(diag(s2*I)))Thus the standard linear model makes the critical assumption of homogeneity of variances because each observation is expected on average to vary \(\sigma^2\) around the true value.
Equal and random-effects models
But what is the point? Let’s now move to the equal-effect meta-analysis model (Now \(k\) being the studies):
\[ y_k = \beta_0 + \epsilon_k \]
Each \(y_k\) is the effect size of a certain study but we have also \(\sigma^2_k\) that is the sampling variance. Each study has it’s own sampling variance thus a study-specific expected residual value (\(\epsilon_k\)). The assumption of the previous model is no longer relevant otherwise we are assuming that the precision (\(\frac{1}{\sigma^2_k}\)) of each study being the same (could be a sort of special case).
\[ \epsilon_k \sim \mathcal{N}(0, \sigma^2_k) \]
Or in matrix form:
\[ \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{V}^2) \]
Where \(V^2\) is the variance-covariance matrix of sampling errors:
k <- 5
(vi <- round(runif(k, 0.01, 0.5), 2))[1] 0.49 0.34 0.35 0.28 0.33
V <- diag(sqrt(vi)) %*% diag(k) %*% diag(sqrt(vi))
V [,1] [,2] [,3] [,4] [,5]
[1,] 0.49 0.00 0.00 0.00 0.00
[2,] 0.00 0.34 0.00 0.00 0.00
[3,] 0.00 0.00 0.35 0.00 0.00
[4,] 0.00 0.00 0.00 0.28 0.00
[5,] 0.00 0.00 0.00 0.00 0.33
Of course the matrix formulation is not necessary here because we have a vector of variances (no covariances between studies). In fact, we can fit the same model using V or the vector.
library(metafor)
# from ?metafor::rma
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
head(dat)
trial author year tpos tneg cpos cneg ablat alloc yi
1 1 Aronson 1948 4 119 11 128 44 random -0.8893
2 2 Ferguson & Simes 1949 6 300 29 274 55 random -1.5854
3 3 Rosenthal et al 1960 3 228 11 209 42 random -1.3481
4 4 Hart & Sutherland 1977 62 13536 248 12619 52 random -1.4416
5 5 Frimodt-Moller et al 1973 33 5036 47 5761 13 alternate -0.2175
6 6 Stein & Aronson 1953 180 1361 372 1079 44 alternate -0.7861
vi
1 0.3256
2 0.1946
3 0.4154
4 0.0200
5 0.0512
6 0.0069
fit_ee <- rma(yi, vi, data = dat, method = "EE")
V <- vcalc(vi, cluster = trial, rho = 0, data = dat)
round(dat$vi[1:5], 2)[1] 0.33 0.19 0.42 0.02 0.05
round(V[1:5, 1:5], 2) [,1] [,2] [,3] [,4] [,5]
[1,] 0.33 0.00 0.00 0.00 0.00
[2,] 0.00 0.19 0.00 0.00 0.00
[3,] 0.00 0.00 0.42 0.00 0.00
[4,] 0.00 0.00 0.00 0.02 0.00
[5,] 0.00 0.00 0.00 0.00 0.05
fit_mv_ee <- rma.mv(yi, V, data = dat)
fit_ee
Equal-Effects Model (k = 13)
I^2 (total heterogeneity / total variability): 92.12%
H^2 (total variability / sampling variability): 12.69
Test for Heterogeneity:
Q(df = 12) = 152.2330, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
-0.4303 0.0405 -10.6247 <.0001 -0.5097 -0.3509 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit_mv_ee
Multivariate Meta-Analysis Model (k = 13; method: REML)
Variance Components: none
Test for Heterogeneity:
Q(df = 12) = 152.2330, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
-0.4303 0.0405 -10.6247 <.0001 -0.5097 -0.3509 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This part about the sampling errors is the same regardless fitting a random or equal effects model. In the random-effects model we no longer have a single true effect \(\mu\) but the average of \(k\) true effects (\(\theta_k\)) that is \(\mu_{\theta}\). With \(\theta_k = \mu_\theta + \delta_k\).
\[ y_k = \mu_\theta + \delta_k + \epsilon_k \]
The additional part \(\delta_k\) is assumed to be normally distributed as:
\[ \delta_k \sim \mathcal{N}(0, \tau^2) \]
With \(\tau^2\) being the variance of true effects \(\theta_k \sim \mathcal{N}(\mu_\theta, \tau^2)\).
More compactly we can write a full equation as:
\[ y_k \sim \mathcal{N}(\mu_\theta, \tau^2 + \sigma^2_k) \]
Also for \(\delta_k\) we can do the same matrix-formulation as \(\epsilon_k\).
So more generally and following Cheung (2015) we can write the model as:
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{\delta} + \mathbf{\epsilon} \]
\[ \underbrace{ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} }_{\mathbf{y}} = \underbrace{ \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} }_{X} \underbrace{ \begin{bmatrix} \beta_0 \end{bmatrix} }_{\beta} + \underbrace{ \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_k \end{bmatrix} }_{\delta} + \underbrace{ \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_k \end{bmatrix} }_{\epsilon} \,, \]
Multilevel and Multivariate models
There is some confusion about multilevel and multivariate in meta-analysis (see https://jepusto.com/posts/what-does-multivariate-mean/).
My suggestion is to think about the \(\mathbf{V}^2\) matrix. If the \(\mathbf{V}^2\) is diagonal then the model can be a two-level model (the usual meta-analysis model) or a \(n\)-level model (for example the three-level model).
If the \(\mathbf{V}^2\) matrix is not diagonal, this means that there are correlations between sampling errors thus the same participants contribute to one or more effect sizes computation.
Multilevel
The classical situation is when there are multiple independent experiments within a paper. Effects are correlated because they share the same paper and not because participants overlapping in effect sizes computation.
Let’s start from the konstantopoulos example by metafor:
dat <- dat.konstantopoulos2011
head(dat, 15)
district school study year yi vi
1 11 1 1 1976 -0.180 0.118
2 11 2 2 1976 -0.220 0.118
3 11 3 3 1976 0.230 0.144
4 11 4 4 1976 -0.300 0.144
5 12 1 5 1989 0.130 0.014
6 12 2 6 1989 -0.260 0.014
7 12 3 7 1989 0.190 0.015
8 12 4 8 1989 0.320 0.024
9 18 1 9 1994 0.450 0.023
10 18 2 10 1994 0.380 0.043
11 18 3 11 1994 0.290 0.012
12 27 1 12 1976 0.160 0.020
13 27 2 13 1976 0.650 0.004
14 27 3 14 1976 0.360 0.004
15 27 4 15 1976 0.600 0.007
Here district is the paper in the example above but the idea is the same. This is clearly a multilevel model and we can express it as:
\[ y_{kj} = \mu + \delta_k + \zeta_{kj} + \epsilon_{kj} \]
\[ \delta_i \sim \mathcal{N}(0, \tau^2_3) \]
\[ \zeta_{ij} \sim \mathcal{N}(0, \tau^2_2) \]
We can display the idea of this model as:
Code
k <- 5
j <- 5
tau2_3 <- 0.1
tau2_2 <- 0.05
nj <- sample(1:j, k)
paper <- rep(1:k, nj)
study <- unlist(sapply(nj, function(x) 1:x))
dat <- data.frame(
paper, study
)
deltak <- rnorm(k, 0, sqrt(tau2_3))
zetakj <- rnorm(nrow(dat), 0, sqrt(tau2_2))
dat$deltak <- deltak[dat$paper]
dat$theta <- with(dat, deltak + zetakj)
ggplot(dat, aes(x = theta, y = paper)) +
geom_point(position = position_jitter(height = 0.4, seed = 2025)) +
geom_point(aes(x = deltak), col = "firebrick", size = 5) +
geom_segment(aes(x = theta, xend = deltak, y = paper, yend = paper),
position = position_jitter(height = 0.4, seed = 2025)) +
theme_bw(20) +
xlab(latex2exp::TeX("$\\theta$")) +
ylab("Paper") +
#xlim(c(-1, 1)) +
geom_vline(xintercept = 0, lty = "dotted") +
geom_segment(aes(x = deltak, xend = 0, y = paper, yend = paper),
col = "firebrick")The red dots/lines are \(\delta_k\), black dots/lines are \(\zeta_{kj}\). The average length of the black lines depends on \(\tau^2_2\) while the average lenght of the red lines depends on \(\tau^2_3\).
The interplay between \(\tau^2_2\) and \(\tau^2_3\) is something very familiar in multilevel modeling i.e. the Intraclass Correlation Coefficient (ICC):
\[ \mbox{ICC} = \frac{\tau^2_3}{\tau^2_2 + \tau^2_3} \]
Can be considered as the proportion of the total true variance explained by the clustering (i.e., the paper membership in this case). When the ICC is close to 1 the clustering is really important. When the ICC is close to 0 belonging to a paper is not informative.
In matrix form, the \(T^2\) matrix is no longer diagonal (as in the standard two-level model) but is a combination of between and within variance.
Let’s fit the model using rma.mv instead of rma:
dat <- dat.konstantopoulos2011
fit <- rma.mv(yi, vi, random = ~ 1|district/study, data = dat)
fit
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0651 0.2551 11 no district
sigma^2.2 0.0327 0.1809 56 no district/study
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1847 0.0846 2.1845 0.0289 0.0190 0.3504 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the Variance Components section there are the two estimated variances: between and within district. We can also manually calculate the ICC:
# the order is the same as the displayed one, the first is district and second is study within district
fit$sigma2
## [1] 0.06506194 0.03273652
fit$sigma2 / sum(fit$sigma2)
## [1] 0.6652655 0.3347345We can also visualize the \(T^2\) matrix using a little custom function:
Code
Tcalc <- function(k, j, tau2_3 = 0, tau2_2 = 0){
cc <- split(j, k)
T2diag <- diag(nrow = length(j))
T2k <- tau2_3 * T2diag
blockT2j <- lapply(cc, function(x) matrix(tau2_2, ncol = length(x), nrow = length(x)))
fullT2j <- bdiag(blockT2j)
T2 <- T2k + fullT2j
list(
T2 = bdiag(T2),
T2k = bdiag(T2k),
T2j = fullT2j
)
}The first matrix is the combination of the between (the second) and within (the third) matrices.
TT <- Tcalc(dat$district, dat$study, fit$sigma2[1], fit$sigma2[2])
TT$T2[1:10, 1:10]10 x 10 sparse Matrix of class "dsCMatrix"
[1,] 0.09779846 0.03273652 0.03273652 0.03273652 . .
[2,] 0.03273652 0.09779846 0.03273652 0.03273652 . .
[3,] 0.03273652 0.03273652 0.09779846 0.03273652 . .
[4,] 0.03273652 0.03273652 0.03273652 0.09779846 . .
[5,] . . . . 0.09779846 0.03273652
[6,] . . . . 0.03273652 0.09779846
[7,] . . . . 0.03273652 0.03273652
[8,] . . . . 0.03273652 0.03273652
[9,] . . . . . .
[10,] . . . . . .
[1,] . . . .
[2,] . . . .
[3,] . . . .
[4,] . . . .
[5,] 0.03273652 0.03273652 . .
[6,] 0.03273652 0.03273652 . .
[7,] 0.09779846 0.03273652 . .
[8,] 0.03273652 0.09779846 . .
[9,] . . 0.09779846 0.03273652
[10,] . . 0.03273652 0.09779846
TT$T2k[1:10, 1:10]10 x 10 sparse Matrix of class "dsCMatrix"
[1,] 0.06506194 . . . . .
[2,] . 0.06506194 . . . .
[3,] . . 0.06506194 . . .
[4,] . . . 0.06506194 . .
[5,] . . . . 0.06506194 .
[6,] . . . . . 0.06506194
[7,] . . . . . .
[8,] . . . . . .
[9,] . . . . . .
[10,] . . . . . .
[1,] . . . .
[2,] . . . .
[3,] . . . .
[4,] . . . .
[5,] . . . .
[6,] . . . .
[7,] 0.06506194 . . .
[8,] . 0.06506194 . .
[9,] . . 0.06506194 .
[10,] . . . 0.06506194
TT$T2j[1:10, 1:10]10 x 10 sparse Matrix of class "dsCMatrix"
[1,] 0.03273652 0.03273652 0.03273652 0.03273652 . .
[2,] 0.03273652 0.03273652 0.03273652 0.03273652 . .
[3,] 0.03273652 0.03273652 0.03273652 0.03273652 . .
[4,] 0.03273652 0.03273652 0.03273652 0.03273652 . .
[5,] . . . . 0.03273652 0.03273652
[6,] . . . . 0.03273652 0.03273652
[7,] . . . . 0.03273652 0.03273652
[8,] . . . . 0.03273652 0.03273652
[9,] . . . . . .
[10,] . . . . . .
[1,] . . . .
[2,] . . . .
[3,] . . . .
[4,] . . . .
[5,] 0.03273652 0.03273652 . .
[6,] 0.03273652 0.03273652 . .
[7,] 0.03273652 0.03273652 . .
[8,] 0.03273652 0.03273652 . .
[9,] . . 0.03273652 0.03273652
[10,] . . 0.03273652 0.03273652
The \(V^2\) matrix i.e., the sampling errors is diagonal because effect sizes are calculated on different participants. We can also visualize the matrix:
dat$vi[1:10] [1] 0.118 0.118 0.144 0.144 0.014 0.014 0.015 0.024 0.023 0.043
fit$V[1:10, 1:10] [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[2,] 0.000 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[3,] 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[4,] 0.000 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000
[5,] 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000 0.000
[6,] 0.000 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000
[7,] 0.000 0.000 0.000 0.000 0.000 0.000 0.015 0.000 0.000 0.000
[8,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.024 0.000 0.000
[9,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.023 0.000
[10,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.043
The \(V^2\) matrix can be also constructed and used within the model. This is not usually done with multilevel models but always with multivariate models. There is the vcalc function:
# notice the rho = 0, independent sampling errors
V <- vcalc(vi, cluster = district, obs = study, rho = 0, data = dat)
V[1:10, 1:10] [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[2,] 0.000 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[3,] 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[4,] 0.000 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000
[5,] 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000 0.000
[6,] 0.000 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000
[7,] 0.000 0.000 0.000 0.000 0.000 0.000 0.015 0.000 0.000 0.000
[8,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.024 0.000 0.000
[9,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.023 0.000
[10,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.043
all.equal(V, fit$V)[1] TRUE
# same as
# rma.mv(yi, V, random = ~ 1|district/study, data = dat)here models omitting the random
cov(zetai, zetai) = tau2_2
There is another interesting feature in metafor. We can fit the same three-level model with a different parametrization:
fit_mv <- rma.mv(yi, vi, random = ~ study|district, data = dat)
fit_mv
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
outer factor: district (nlvls = 11)
inner factor: study (nlvls = 56)
estim sqrt fixed
tau^2 0.0978 0.3127 no
rho 0.6653 no
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1847 0.0846 2.1845 0.0289 0.0190 0.3504 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This is called multivariate parametrization1 (and will be clear later). The model is exactly the same but:
- We estimate directly the ICC (
rho) - We estimate the total variance \(\tau^2_3 + \tau^2_2\)
# fit was the standard parametrization
sum(fit$sigma2) # same as tau^2
## [1] 0.09779846
fit$sigma2[1] / sum(fit$sigma2) # same as rho
## [1] 0.6652655The three-level model is the more appropriate here. What could happen if we fit the wrong or less appropriate models?
The first option is ignoring the nested structure and fitting a standard two-level model:
fit3l <- fit # better name
fit2l <- rma(yi, vi, data = dat) # ignoring nesting
fit2l
Random-Effects Model (k = 56; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0884 (SE = 0.0202)
tau (square root of estimated tau^2 value): 0.2974
I^2 (total heterogeneity / total variability): 94.70%
H^2 (total variability / sampling variability): 18.89
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1279 0.0439 2.9161 0.0035 0.0419 0.2139 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A third option is to aggregate thus reducing each cluster (district) to a single effect size. The metafor::aggregate() function can do it very easily. Notice that is not simply taking the average but more like doing a smaller meta-analysis for each cluster:
# rho = 0 because sampling errors are independent
dat_agg <- aggregate(dat, cluster = district, rho = 0)
dat_agg
district school study year yi vi
1 11 2.5 2.5 1976.0 -0.126 0.032
2 12 2.5 6.5 1989.0 0.067 0.004
3 18 2.0 10.0 1994.0 0.350 0.007
4 27 2.5 13.5 1976.0 0.500 0.001
5 56 2.5 17.5 1997.0 0.051 0.002
6 58 6.0 25.0 1976.0 -0.042 0.001
7 71 2.0 32.0 1997.0 0.886 0.004
8 86 4.5 37.5 1997.0 -0.029 0.000
9 91 3.5 44.5 2000.0 0.250 0.002
10 108 3.0 50.0 2000.0 0.015 0.006
11 644 2.5 54.5 1994.5 0.162 0.019
fit2l_agg <- rma(yi, vi, data = dat_agg)
fit2l_agg
Random-Effects Model (k = 11; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0828 (SE = 0.0397)
tau (square root of estimated tau^2 value): 0.2878
I^2 (total heterogeneity / total variability): 98.10%
H^2 (total variability / sampling variability): 52.60
Test for Heterogeneity:
Q(df = 10) = 415.9203, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1960 0.0900 2.1785 0.0294 0.0197 0.3724 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A final model could be including only the highest level and not the effect size level:
fit2l_3l <- rma.mv(yi, vi, random = ~1|district, data = dat)
fit2l_3l
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2 0.0828 0.2878 11 no district
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1960 0.0900 2.1785 0.0294 0.0197 0.3724 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s compare the estimated effects and standard errors (the main info for inference):
ll <- list(fit2l = fit2l, fit3l = fit3l, fit2l_3l = fit2l_3l, fit2l_agg = fit2l_agg)
data.frame(
mu = sapply(ll, function(x) x$b),
se = sapply(ll, function(x) x$se)
) mu se
fit2l 0.1279321 0.04387028
fit3l 0.1847132 0.08455592
fit2l_3l 0.1960299 0.08998342
fit2l_agg 0.1960299 0.08998338
Notice that the last model is the same as assuming \(\tau^2_2 = 0\). We have a small impact on the results only because the ICC is quite high thus most of the variance is due to the level that we considered.
metafor can fix all the values to zero for testing, sensitivity analysis but also model comparison:
fit3l_0 <- rma.mv(yi, vi, random = ~1|district/study, data = dat, sigma2 = c(NA, 0))
# the two models are the sameler
fit3l_0
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0828 0.2878 11 no district
sigma^2.2 0.0000 0.0000 56 yes district/study
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1960 0.0900 2.1785 0.0294 0.0197 0.3724 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit2l_3l
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2 0.0828 0.2878 11 no district
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1960 0.0900 2.1785 0.0294 0.0197 0.3724 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# model with tau2 fixed to 0 vs estimated
anova(fit3l_0, fit3l)
df AIC BIC AICc logLik LRT pval QE
Full 3 21.9174 27.9394 22.3880 -7.9587 578.8640
Reduced 2 68.4330 72.4476 68.6637 -32.2165 48.5155 <.0001 578.8640
This means that ignoring the nesting (fit2l) is like assuming that \(\tau^2_3 = 0\) that is far from reality. This is why it is the worst model.
James Pustejovsky wrote a blog post https://jepusto.com/posts/Sometimes-aggregating-effect-sizes-is-fine/ showing when aggregating is fine.
It also show a model-based way to aggregate instead of using the aggregate function. Basically, we can drop the lowest random effect and provide an appropriate \(V^2\) matrix.
V <- vcalc(vi, cluster = district, obs = study, rho = 0, data = dat)
# dropping the /study
fit_agg_jep <- rma.mv(yi, V, random = ~ 1|district, data = dat)
fit_agg_jep
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2 0.0828 0.2878 11 no district
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1960 0.0900 2.1785 0.0294 0.0197 0.3724 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# EXACTLY the same as fit2l_agg
fit2l_agg
Random-Effects Model (k = 11; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0828 (SE = 0.0397)
tau (square root of estimated tau^2 value): 0.2878
I^2 (total heterogeneity / total variability): 98.10%
H^2 (total variability / sampling variability): 52.60
Test for Heterogeneity:
Q(df = 10) = 415.9203, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1960 0.0900 2.1785 0.0294 0.0197 0.3724 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multivariate
The main point of a multivariate model is that the same participants contribute to more than one effect size. Some examples:
- multiple outcomes of a treatment: anxiety, depression, etc.
- multiple measures of the same outcome
- multiple groups compared to the same control group
- …
Let’s imagine a simple situation where we have two outcomes of a certain treatment and we compare two groups: control and experimental. The effect size is the difference between the two group means for each outcome.
dat <- data.frame(
paper = 1,
m_ctrl = c(0.1, 0),
m_exp = c(0.5, 0.7),
sd_ctrl = 1,
sd_exp = 1,
n_exp = 50,
n_ctrl = 50,
outcome = c("anxiety", "depression")
)
dat paper m_ctrl m_exp sd_ctrl sd_exp n_exp n_ctrl outcome
1 1 0.1 0.5 1 1 50 50 anxiety
2 1 0.0 0.7 1 1 50 50 depression
# calculate effect size
dat <- escalc("SMD", m1i = m_exp, m2i = m_ctrl, sd1i = sd_exp, sd2i = sd_ctrl, n1i = n_exp, n2i = n_ctrl, data = dat)
dat
paper m_ctrl m_exp sd_ctrl sd_exp n_exp n_ctrl outcome yi vi
1 1 0.1 0.5 1 1 50 50 anxiety 0.3969 0.0408
2 1 0.0 0.7 1 1 50 50 depression 0.6946 0.0424
In this example, the same people answered to the depression and the anxiety questionnaires thus the effect are correlated. What is the problem? While the vi (\(\sigma^2\)) can be easily calculated (usually), the correlation between anxiety and depression is not usually (never) reported.
However, let’s imagine create the \(V^2\) matrix for this paper. Usually the \(V^2\) matrix is diagonal (also in the multilevel model):
# this would the the usual matrix, assuming independence
V <- vcalc(vi, cluster = paper, obs = outcome, rho = 0, data = dat)
V
## [,1] [,2]
## [1,] 0.04078777 0.00000000
## [2,] 0.00000000 0.04241253
# this is the actual matrix, assuming to know rho = 0.7
V <- vcalc(vi, cluster = paper, obs = outcome, rho = 0.7, data = dat)
V
## [,1] [,2]
## [1,] 0.04078777 0.02911455
## [2,] 0.02911455 0.04241253The diagonal is the same but the off-diagonal is the covariance. The covariance is \(\sigma_{x, y} = \rho_{xy} \sqrt{\sigma^2_x} \sqrt{\sigma^2_y}\) thus:
V[upper.tri(V)]
## [1] 0.02911455
prod(sqrt(dat$vi)) * 0.7
## [1] 0.02911455Thus the \(V^2\) matrix for more complex models is surely non-diagonal. A diagonal matrix in this case is like assuming that sampling errors are independent (no correlation).
As an example, let’s use the dat.berkey1998 dataset from metafor (see https://www.metafor-project.org/doku.php/analyses:berkey1998 for the full example):
dat <- dat.berkey1998
dat
trial author year ni outcome yi vi v1i v2i
1 1 Pihlstrom et al. 1983 14 PD 0.4700 0.0075 0.0075 0.0030
2 1 Pihlstrom et al. 1983 14 AL -0.3200 0.0077 0.0030 0.0077
3 2 Lindhe et al. 1982 15 PD 0.2000 0.0057 0.0057 0.0009
4 2 Lindhe et al. 1982 15 AL -0.6000 0.0008 0.0009 0.0008
5 3 Knowles et al. 1979 78 PD 0.4000 0.0021 0.0021 0.0007
6 3 Knowles et al. 1979 78 AL -0.1200 0.0014 0.0007 0.0014
7 4 Ramfjord et al. 1987 89 PD 0.2600 0.0029 0.0029 0.0009
8 4 Ramfjord et al. 1987 89 AL -0.3100 0.0015 0.0009 0.0015
9 5 Becker et al. 1988 16 PD 0.5600 0.0148 0.0148 0.0072
10 5 Becker et al. 1988 16 AL -0.3900 0.0304 0.0072 0.0304
This is a common example but honestly not so common because not only variances vi are reported but also the covariances (or correlations). Basically is like having the correlation between anxiety and depression for each study. But, for the moment, we can ignore the covariance.
Now, we have now multiple rows for each paper and the rows are also correlated (same participants, multiple outcomes). Thus the multilevel model is not entirely appropriate:
dat$effect <- 1:nrow(dat)
fit3l <- rma.mv(yi, vi, random = ~ 1|trial/effect, data = dat)
fit3l
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0000 0.0000 5 no trial
sigma^2.2 0.1609 0.4011 10 no trial/effect
Test for Heterogeneity:
Q(df = 9) = 583.6005, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.0155 0.1296 0.1198 0.9046 -0.2385 0.2695
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What is missing? basically we are estimating an average effect across outcomes (PD and AL). Furthermore we would probably estimate the correlation between PD and AL. We are estimating the correlation of true effects, not the correlation between sampling errors. But we know how to parametrize the model (remember the multivariate parametrization?) using the random = ~ inner|outer format where inner is the variable varying within paper (as the effect size index in the three-level model) and outer is the cluster:
fitmv <- rma.mv(yi, vi, random = ~ outcome|trial, data = dat)
fitmv
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
outer factor: trial (nlvls = 5)
inner factor: outcome (nlvls = 2)
estim sqrt fixed
tau^2 0.1540 0.3924 no
rho -0.7674 no
Test for Heterogeneity:
Q(df = 9) = 583.6005, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.0104 0.0649 0.1602 0.8727 -0.1168 0.1376
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now we have the correlation between outcomes rho and the total heterogeneity tau^2. What is missing? The fixed-effect of the outcome. We want an estimate of PD and AL:
fitmv2 <- rma.mv(yi, vi, mods = ~ 0 + outcome, random = ~ outcome|trial, data = dat)
fitmv2
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
outer factor: trial (nlvls = 5)
inner factor: outcome (nlvls = 2)
estim sqrt fixed
tau^2 0.0245 0.1567 no
rho 0.5831 no
Test for Residual Heterogeneity:
QE(df = 8) = 124.9021, p-val < .0001
Test of Moderators (coefficients 1:2):
QM(df = 2) = 78.0792, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
outcomeAL -0.3323 0.0772 -4.3041 <.0001 -0.4836 -0.1810 ***
outcomePD 0.3594 0.0780 4.6056 <.0001 0.2064 0.5123 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What is missing now? Remember that we are assuming that the \(V^2\) matrix is diagonal thus we are not taking into account the correlation. We are only estimating the correlation of true effects and considering sampling errors as independent.
Let’s assume to have \(\rho = 0.7\):
V <- vcalc(vi, cluster = trial, obs = outcome, data = dat, rho = 0.7)
# now we use V not the vector
fitmv3 <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ outcome|trial, data = dat)
fitmv3
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
outer factor: trial (nlvls = 5)
inner factor: outcome (nlvls = 2)
estim sqrt fixed
tau^2 0.0259 0.1608 no
rho 0.4593 no
Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001
Test of Moderators (coefficients 1:2):
QM(df = 2) = 79.5040, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
outcomeAL -0.3455 0.0795 -4.3481 <.0001 -0.5013 -0.1898 ***
outcomePD 0.3703 0.0796 4.6515 <.0001 0.2143 0.5263 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We are still making an implicit assumption that is about the \(T^2\) matrix. The random effects matrix is now a 2x2 matrix (because we have two outcomes):
\[ \begin{align} \begin{gathered} \mathrm{T}^2 = \begin{bmatrix} \tau_1^2 & & \\ \rho_{21}\tau_2\tau_1 & \tau_2^2 & \\ \end{bmatrix} \end{gathered} \end{align} \]
But in the model we have one \(\tau^2\) (not two) and a single correlation (correct). In fact, rma.mv by default has a struct = "CS" argument (see ?rma.mv).
CS stands for compound symmetry and the idea is to estimate a single \(\tau^2\) and a single \(\rho\) regardless the number of outcomes. Is an assumption (that can be tested) as dropping the random effect in the multilevel model.
Let’s switch to the struct = "UN" to estimate the unstructured matrix:
fitmv4 <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ outcome|trial, struct = "UN", data = dat)
fitmv4
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
outer factor: trial (nlvls = 5)
inner factor: outcome (nlvls = 2)
estim sqrt k.lvl fixed level
tau^2.1 0.0331 0.1819 5 no AL
tau^2.2 0.0144 0.1199 5 no PD
rho.AL rho.PD AL PD
AL 1 - 5
PD 0.4536 1 no -
Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001
Test of Moderators (coefficients 1:2):
QM(df = 2) = 92.7518, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
outcomeAL -0.3527 0.0884 -3.9892 <.0001 -0.5261 -0.1794 ***
outcomePD 0.3636 0.0633 5.7425 <.0001 0.2395 0.4877 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# and compare with LRT
anova(fitmv3, fitmv4)
df AIC BIC AICc logLik LRT pval QE
Full 5 3.0552 3.4524 33.0552 3.4724 176.9217
Reduced 4 1.5524 1.8702 14.8857 3.2238 0.4972 0.4807 176.9217
Extra for multilevel models
Van den Noortgate et al. (2013) describe how, in some conditions, the multilevel model (thus assuming independent sampling errors) provides good inference even when effects are correlated.
Some simulated data (see below):
Code
sim_multi_meta <- function(k,
p,
mus,
tau2s,
ro = 0,
rs = 0,
n = NULL,
sample_p = FALSE,
sample_n = NULL,
ranef = FALSE){
RT <- ro + diag(1 - ro, p)
RS <- rs + diag(1 - rs, p)
S <- diag(sqrt(tau2s)) %*% RT %*% diag(sqrt(tau2s))
TT <- MASS::mvrnorm(k, rep(0, p), S)
yi <- vi <- deltai <- vvi <- vector(mode = "list", length = k)
for(i in 1:k){
X0 <- MASS::mvrnorm(n, rep(0, p), RS)
X1 <- MASS::mvrnorm(n, mus + TT[i, ], RS)
mm0 <- apply(X0, 2, mean)
mm1 <- apply(X1, 2, mean)
vv01 <- cov(X0)/n + cov(X1)/n
yi[[i]] <- mm1 - mm0
vi[[i]] <- diag(vv01)
vvi[[i]] <- vv01
deltai[[i]] <- TT[i, ]
}
if(sample_p){
pi <- sample(1:p, k, replace = TRUE)
} else{
pi <- rep(p, k)
}
study <- rep(1:k, each = p)
outcome <- rep(1:p, k)
sim <- data.frame(
study,
outcome
)
sim$yi <- unlist(yi)
sim$vi <- unlist(vi)
vvi <- do.call(rbind, vvi)
vvi <- data.frame(vvi)
names(vvi) <- sprintf("v%si", 1:ncol(vvi))
sim <- cbind(sim, vvi)
if(ranef){
sim$deltai <- unlist(deltai)
}
# selecting studies
siml <- split(sim, sim$study)
for(i in 1:k){
siml[[i]] <- siml[[i]][1:pi[i], ]
}
sim <- do.call(rbind, siml)
sim$outcome <- factor(paste0("o", sim$outcome))
rownames(sim) <- NULL
return(sim)
}dat <- sim_multi_meta(k = 100, # 10 studies
p = 3, # 3 outcomes e.g. depression, anxiety, etc.
mus = c(0, 0, 0), # true outcomes means
tau2s = c(0.1, 0.1, 0.1), # true outcomes tau2s
ro = 0.5, # correlation between outcomes
rs = 0.8, # correlation between sampling errors
n = 50) # sample size in each study
dat <- dat[, c("study", "outcome", "yi", "vi")]
head(dat) study outcome yi vi
1 1 o1 0.6166690 0.03086006
2 1 o2 0.3173710 0.04357880
3 1 o3 0.4087180 0.03725503
4 2 o1 -0.6009339 0.03318603
5 2 o2 -0.2818165 0.03504323
6 2 o3 -0.1316034 0.03821046
Now we know that outcomes are likely to be correlated but we don’t want to guess a value and we have any information. Van den Noortgate et al. (2013) suggested (and demonstrated) that using a three-level model is still valid. Let’s try it:
fit_3l <- rma.mv(yi, vi, mods = ~ 0 + outcome, random = ~ 1|study/outcome, data = dat, sparse = TRUE)
fit_3l
Multivariate Meta-Analysis Model (k = 300; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0805 0.2837 100 no study
sigma^2.2 0.0147 0.1212 300 no study/outcome
Test for Residual Heterogeneity:
QE(df = 297) = 1026.6152, p-val < .0001
Test of Moderators (coefficients 1:3):
QM(df = 3) = 2.8032, p-val = 0.4230
Model Results:
estimate se zval pval ci.lb ci.ub
outcomeo1 -0.0249 0.0367 -0.6785 0.4974 -0.0969 0.0471
outcomeo2 0.0196 0.0367 0.5332 0.5939 -0.0523 0.0914
outcomeo3 -0.0288 0.0367 -0.7853 0.4323 -0.1008 0.0432
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now let’s fit the correct model:
V <- vcalc(vi, cluster = study, obs = outcome, rho = 0.8, data = dat)
fit_mv <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ 1|study/outcome, data = dat, sparse = TRUE)
fit_mv
Multivariate Meta-Analysis Model (k = 300; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0499 0.2234 100 no study
sigma^2.2 0.0455 0.2133 300 no study/outcome
Test for Residual Heterogeneity:
QE(df = 297) = 1638.3953, p-val < .0001
Test of Moderators (coefficients 1:3):
QM(df = 3) = 2.8674, p-val = 0.4125
Model Results:
estimate se zval pval ci.lb ci.ub
outcomeo1 -0.0261 0.0368 -0.7106 0.4774 -0.0982 0.0460
outcomeo2 0.0186 0.0367 0.5075 0.6118 -0.0533 0.0906
outcomeo3 -0.0297 0.0368 -0.8084 0.4188 -0.1019 0.0424
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Suggested workflow
This post by Wolfgang https://wviechtb.github.io/metafor/reference/misc-recs.html is about some best practices for complex models. Let’s unpack important parts.
Profile Likelihood
A quick and dirty intuition about the profile likelihood is the following. Assuming to have a vector of numbers x and i want to fit a linear model finding the mean and standard deviation using the maximum likelihood:
- Define the log-likelihood function. This is implemented in
dnorm()or any otherd*distribution.
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
- Choose a value for \(\mu\) and \(\sigma^2\), find the likelihood of observed data
- Repeat for other values and find the combination of \(\mu\) and \(\sigma^2\) that maximise the previous equation.
You can do it manually with a grid-search or using some optimizers but the idea is the same. Furthemore, for some models we already know that the MLE estimator for the mean of a gaussian distribution is the sample mean mean(x).
Example with a grid-search:
# simulate some data with true parameters
set.seed(5075)
mu <- 10
s <- 5
n <- 50
x <- rnorm(n, mu, s)
# now, to find mu and sigma, we could do a two-dimensional
# optimization. Roughly speaking with a grid search we try
# different value of mu and s finding the maximum likelihood
# combination
# around true value just for testing
mui <- seq(8, 12, length.out = 100)
si <- seq(3, 7, length.out = 100)
lld <- expand.grid(
mui = mui,
si = si
)
lld <- expand.grid(mui = mui, si = si)
lld$ll <- NA_real_
for (i in 1:nrow(lld)) {
lld$ll[i] <- prod(dnorm(x, lld$mui[i], lld$si[i]))
}
ggplot(lld, aes(x = si, y = mui, z = ll)) +
stat_contour_filled(show.legend = FALSE) +
geom_vline(xintercept = sd(x)) +
geom_hline(yintercept = mean(x))For a fancier but useful 3D plot:
Code
library(plotly)
# convert to wide format: rows = mui, columns = si, values = ll
zmat <- tidyr::pivot_wider(lld, names_from = si, values_from = ll)
zz <- as.matrix(zmat[,-1]) # the ll values as matrix
xx <- unique(lld$si) # x axis
yy <- unique(lld$mui) # y axis
plot_ly(
x = xx,
y = yy,
z = zz,
type = "surface",
colorscale = "Viridis",
showscale = FALSE
) |>
layout(
showlegend = FALSE,
scene = list(
xaxis = list(title = "σ"),
yaxis = list(title = "μ"),
zaxis = list(title = "log-likelihood")
)
)What is the point? If you move the plot having the mountain in front of you, you clearly see that for each value of x (\(\sigma^2\)) in this case, there is always a \(\mu\) in the peak. In other terms you can slice the mountain at the highest \(\mu\) for a given vector of \(\sigma^2\). This is the profile likelihood.
In practical terms, for a given \(\sigma^2\), the MLE estimator is always mean(x). Thus you can fix \(\mu = \bar x\) and optimize only over \(\sigma^2\). This reduces the 2D optimization to a 1D optimization.
# the mle estimator for mu is mean(x)
# thus for any fixed value of sigma, mean(x) is the MLE estimator
mle_mu <- mean(x)
mle_mu[1] 9.951154
# now i can optimize the log likelihood fixing mu
si [1] 3.000000 3.040404 3.080808 3.121212 3.161616 3.202020 3.242424 3.282828
[9] 3.323232 3.363636 3.404040 3.444444 3.484848 3.525253 3.565657 3.606061
[17] 3.646465 3.686869 3.727273 3.767677 3.808081 3.848485 3.888889 3.929293
[25] 3.969697 4.010101 4.050505 4.090909 4.131313 4.171717 4.212121 4.252525
[33] 4.292929 4.333333 4.373737 4.414141 4.454545 4.494949 4.535354 4.575758
[41] 4.616162 4.656566 4.696970 4.737374 4.777778 4.818182 4.858586 4.898990
[49] 4.939394 4.979798 5.020202 5.060606 5.101010 5.141414 5.181818 5.222222
[57] 5.262626 5.303030 5.343434 5.383838 5.424242 5.464646 5.505051 5.545455
[65] 5.585859 5.626263 5.666667 5.707071 5.747475 5.787879 5.828283 5.868687
[73] 5.909091 5.949495 5.989899 6.030303 6.070707 6.111111 6.151515 6.191919
[81] 6.232323 6.272727 6.313131 6.353535 6.393939 6.434343 6.474747 6.515152
[89] 6.555556 6.595960 6.636364 6.676768 6.717172 6.757576 6.797980 6.838384
[97] 6.878788 6.919192 6.959596 7.000000
ll <- sapply(si, function(sii) prod(dnorm(x, mle_mu, sii)))
plot(si, ll, type = "l", xlab = "sigma")
points(si[which.max(ll)], max(ll), col = "firebrick", cex = 2, pch = 19)
abline(v = sd(x))
text(si[which.max(ll)] - 0.1, y = min(ll), labels = "sd(x)")The same idea is applied to more complex models such as meta-analyis and mixed-models. The idea is to re-write the log-likelihood in a way that you optimize over parameters that you care profiling out nuisance parameters.
What happens with meta-analysis is the same, \(\mu\) or \(\boldsymbol{\beta}\) are profiled out optimizing over \(\tau^2\)(s).
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
fit <- rma(yi, vi, data = dat)
profile(fit)Robust Variance Estimation
The formal definition of cluster robust is out of my expertise but the idea is the following:
- With complex data structure (multilevel and/or multivariate) the \(V^2\) matrix is most of the time constructed without data but using plausible guess, previous literature, etc.
- When we specify a model in
metaforusing different random structures and/or differentVmatrices we are creating a working model that can be a very bad or very good (or in the middle) approximation of the true correlation matrix - With a misspecified \(V^2\) matrix, the regression coefficients are still unbiased
- the RVE approach calculate standard errors that provide valid inference even with a misspecified working model. If the model is misspecified estimates are less precise but valid. If the model is more accurate (I have raw data, I have plausible guesses, etc.) standard errors are more precise.
Some papers
Packages
clubSandwichpackagemetafor::robust()(basically usesclubSandwich)
Some videos
https://www.youtube.com/watch?v=sJUc39vneWk https://www.youtube.com/watch?v=XsyUzaZHs5o https://www.youtube.com/watch?v=j3Sxbkd9iIs
Examples
Example with previous models:
# multivariate
dat <- dat.berkey1998
V <- vcalc(vi, cluster = trial, obs = outcome, data = dat, rho = 0.7)
fit_mv <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ outcome | trial, struct="UN", data=dat)
fit_mv
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
outer factor: trial (nlvls = 5)
inner factor: outcome (nlvls = 2)
estim sqrt k.lvl fixed level
tau^2.1 0.0331 0.1819 5 no AL
tau^2.2 0.0144 0.1199 5 no PD
rho.AL rho.PD AL PD
AL 1 - 5
PD 0.4536 1 no -
Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001
Test of Moderators (coefficients 1:2):
QM(df = 2) = 92.7518, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
outcomeAL -0.3527 0.0884 -3.9892 <.0001 -0.5261 -0.1794 ***
outcomePD 0.3636 0.0633 5.7425 <.0001 0.2395 0.4877 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
robust(fit_mv, cluster = trial, adjust = TRUE, clubSandwich = TRUE)
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
outer factor: trial (nlvls = 5)
inner factor: outcome (nlvls = 2)
estim sqrt k.lvl fixed level
tau^2.1 0.0331 0.1819 5 no AL
tau^2.2 0.0144 0.1199 5 no PD
rho.AL rho.PD AL PD
AL 1 - 5
PD 0.4536 1 no -
Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001
Number of estimates: 10
Number of clusters: 5
Estimates per cluster: 2
Test of Moderators (coefficients 1:2):¹
F(df1 = 2, df2 = 2.86) = 34.8727, p-val = 0.0098
Model Results:
estimate se¹ tval¹ df¹ pval¹ ci.lb¹ ci.ub¹
outcomeAL -0.3527 0.0896 -3.9355 3.81 0.0187 -0.6066 -0.0989 *
outcomePD 0.3636 0.0630 5.7754 3.84 0.0050 0.1860 0.5412 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1) results based on cluster-robust inference (var-cov estimator: CR2,
approx t/F-tests and confidence intervals, df: Satterthwaite approx)
# multilevel
dat <- dat.konstantopoulos2011
fit_ml <- rma.mv(yi, vi, random = ~ 1|district/study, data=dat)
fit_ml
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0651 0.2551 11 no district
sigma^2.2 0.0327 0.1809 56 no district/study
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.1847 0.0846 2.1845 0.0289 0.0190 0.3504 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
robust(fit_ml, cluster = district, adjust = TRUE, clubSandwich = TRUE)
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0651 0.2551 11 no district
sigma^2.2 0.0327 0.1809 56 no district/study
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Number of estimates: 56
Number of clusters: 11
Estimates per cluster: 3-11 (mean: 5.09, median: 4)
Model Results:
estimate se¹ tval¹ df¹ pval¹ ci.lb¹ ci.ub¹
0.1847 0.0845 2.1870 9.86 0.0540 -0.0038 0.3733 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1) results based on cluster-robust inference (var-cov estimator: CR2,
approx t-test and confidence interval, df: Satterthwaite approx)
References
Footnotes
https://www.metafor-project.org/doku.php/analyses:konstantopoulos2011↩︎