Multilevel and Multivariate Meta-Analysis

Notes from the Advanced MA Course by Wolfgang Viechtbauer

#meta-analysis

Author

Filippo Gambarota

Published

November 14, 2025

Modified

December 1, 2025

library(tidyverse)
library(metafor)

mtheme <- function(size = 15){
    theme_minimal(size)
}

theme_set(mtheme())

These notes summarises some of the content of the Advanced Meta-Analysis course by Wolfgang Viechtbauer with some personal additions.

Quick recap about linear models

Standard linear models have some assumptions and can be formalized using the usual equation:

\[ y_i = \beta_0 + \beta_{1}x_{1_i} + \beta_{2}x_{2_i} \dots \beta_{p}x_{p_i} + \epsilon_i \]

Then \(\epsilon_i\) is the crucial part. The assumption of the linear model is the following:

\[ \epsilon_i \sim \mathcal{N}(0, \sigma^2) \]

More compactly:

\[ y_i \sim \mathcal{N}(X_i\boldsymbol{\beta}, \sigma^2) \]

Or in matrix form (relevant for other models):

\[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}) \]

Or:

\[ \underset{n \times 1}{\mathbf{y}} = \underset{n \times p}{\mathbf{X}} \;\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\epsilon}} \]

With:

\[ \boldsymbol{\epsilon} \sim \mathcal{N}\!\left( \boldsymbol{0}, \; \sigma^{2}\, \underset{n \times n}{\mathbf{I}} \right) \]

Where \(\mathbf{I}\) is an identity matrix of dimension \(n \times n\). Basically a scalar multiplied by an identity matrix gives a matrix with diagonal elements equals to the scalar.

n <- 5 # number of observations
s2 <- 10 # residual variance
I <- diag(n)
I

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1

s2*I

     [,1] [,2] [,3] [,4] [,5]
[1,]   10    0    0    0    0
[2,]    0   10    0    0    0
[3,]    0    0   10    0    0
[4,]    0    0    0   10    0
[5,]    0    0    0    0   10

If you sample from a multivariate normal distribution with this matrix, is the same as sampling from an univariate normal distribution:

n <- 1e3
s2 <- 10 # residual variance
I <- diag(n)
ei_1 <- rnorm(n, 0, sqrt(s2))

# or also
ei_2 <- rnorm(n, 0, sqrt(diag(s2*I)))

Thus the standard linear model makes the critical assumption of homogeneity of variances because each observation is expected on average to vary \(\sigma^2\) around the true value.

Equal and random-effects models

But what is the point? Let’s now move to the equal-effect meta-analysis model (Now \(k\) being the studies):

\[ y_k = \beta_0 + \epsilon_k \]

Each \(y_k\) is the effect size of a certain study but we have also \(\sigma^2_k\) that is the sampling variance. Each study has it’s own sampling variance thus a study-specific expected residual value (\(\epsilon_k\)). The assumption of the previous model is no longer relevant otherwise we are assuming that the precision (\(\frac{1}{\sigma^2_k}\)) of each study being the same (could be a sort of special case).

\[ \epsilon_k \sim \mathcal{N}(0, \sigma^2_k) \]

Or in matrix form:

\[ \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{V}^2) \]

Where \(V^2\) is the variance-covariance matrix of sampling errors:

k <- 5
(vi <- round(runif(k, 0.01, 0.5), 2))

[1] 0.49 0.34 0.35 0.28 0.33

V <- diag(sqrt(vi)) %*% diag(k) %*% diag(sqrt(vi))
V

     [,1] [,2] [,3] [,4] [,5]
[1,] 0.49 0.00 0.00 0.00 0.00
[2,] 0.00 0.34 0.00 0.00 0.00
[3,] 0.00 0.00 0.35 0.00 0.00
[4,] 0.00 0.00 0.00 0.28 0.00
[5,] 0.00 0.00 0.00 0.00 0.33

Of course the matrix formulation is not necessary here because we have a vector of variances (no covariances between studies). In fact, we can fit the same model using V or the vector.

library(metafor)

# from ?metafor::rma
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

head(dat)


  trial               author year tpos  tneg cpos  cneg ablat     alloc      yi 
1     1              Aronson 1948    4   119   11   128    44    random -0.8893 
2     2     Ferguson & Simes 1949    6   300   29   274    55    random -1.5854 
3     3      Rosenthal et al 1960    3   228   11   209    42    random -1.3481 
4     4    Hart & Sutherland 1977   62 13536  248 12619    52    random -1.4416 
5     5 Frimodt-Moller et al 1973   33  5036   47  5761    13 alternate -0.2175 
6     6      Stein & Aronson 1953  180  1361  372  1079    44 alternate -0.7861 
      vi 
1 0.3256 
2 0.1946 
3 0.4154 
4 0.0200 
5 0.0512 
6 0.0069

fit_ee <- rma(yi, vi, data = dat, method = "EE")

V <- vcalc(vi, cluster = trial, rho = 0, data = dat)
round(dat$vi[1:5], 2)

[1] 0.33 0.19 0.42 0.02 0.05

round(V[1:5, 1:5], 2)

     [,1] [,2] [,3] [,4] [,5]
[1,] 0.33 0.00 0.00 0.00 0.00
[2,] 0.00 0.19 0.00 0.00 0.00
[3,] 0.00 0.00 0.42 0.00 0.00
[4,] 0.00 0.00 0.00 0.02 0.00
[5,] 0.00 0.00 0.00 0.00 0.05

fit_mv_ee <- rma.mv(yi, V, data = dat)

fit_ee


Equal-Effects Model (k = 13)

I^2 (total heterogeneity / total variability):   92.12%
H^2 (total variability / sampling variability):  12.69

Test for Heterogeneity:
Q(df = 12) = 152.2330, p-val < .0001

Model Results:

estimate      se      zval    pval    ci.lb    ci.ub      
 -0.4303  0.0405  -10.6247  <.0001  -0.5097  -0.3509  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

fit_mv_ee


Multivariate Meta-Analysis Model (k = 13; method: REML)

Variance Components: none

Test for Heterogeneity:
Q(df = 12) = 152.2330, p-val < .0001

Model Results:

estimate      se      zval    pval    ci.lb    ci.ub      
 -0.4303  0.0405  -10.6247  <.0001  -0.5097  -0.3509  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This part about the sampling errors is the same regardless fitting a random or equal effects model. In the random-effects model we no longer have a single true effect \(\mu\) but the average of \(k\) true effects (\(\theta_k\)) that is \(\mu_{\theta}\). With \(\theta_k = \mu_\theta + \delta_k\).

\[ y_k = \mu_\theta + \delta_k + \epsilon_k \]

The additional part \(\delta_k\) is assumed to be normally distributed as:

\[ \delta_k \sim \mathcal{N}(0, \tau^2) \]

With \(\tau^2\) being the variance of true effects \(\theta_k \sim \mathcal{N}(\mu_\theta, \tau^2)\).

Tip

More compactly we can write a full equation as:

\[ y_k \sim \mathcal{N}(\mu_\theta, \tau^2 + \sigma^2_k) \]

Also for \(\delta_k\) we can do the same matrix-formulation as \(\epsilon_k\).

So more generally and following Cheung (2015) we can write the model as:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{\delta} + \mathbf{\epsilon} \]

\[ \underbrace{ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} }_{\mathbf{y}} = \underbrace{ \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} }_{X} \underbrace{ \begin{bmatrix} \beta_0 \end{bmatrix} }_{\beta} + \underbrace{ \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_k \end{bmatrix} }_{\delta} + \underbrace{ \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_k \end{bmatrix} }_{\epsilon} \,, \]

Multilevel and Multivariate models

There is some confusion about multilevel and multivariate in meta-analysis (see https://jepusto.com/posts/what-does-multivariate-mean/).

My suggestion is to think about the \(\mathbf{V}^2\) matrix. If the \(\mathbf{V}^2\) is diagonal then the model can be a two-level model (the usual meta-analysis model) or a \(n\)-level model (for example the three-level model).

If the \(\mathbf{V}^2\) matrix is not diagonal, this means that there are correlations between sampling errors thus the same participants contribute to one or more effect sizes computation.

Multilevel

The classical situation is when there are multiple independent experiments within a paper. Effects are correlated because they share the same paper and not because participants overlapping in effect sizes computation.

Let’s start from the konstantopoulos example by metafor:

dat <- dat.konstantopoulos2011
head(dat, 15)


   district school study year     yi    vi 
1        11      1     1 1976 -0.180 0.118 
2        11      2     2 1976 -0.220 0.118 
3        11      3     3 1976  0.230 0.144 
4        11      4     4 1976 -0.300 0.144 
5        12      1     5 1989  0.130 0.014 
6        12      2     6 1989 -0.260 0.014 
7        12      3     7 1989  0.190 0.015 
8        12      4     8 1989  0.320 0.024 
9        18      1     9 1994  0.450 0.023 
10       18      2    10 1994  0.380 0.043 
11       18      3    11 1994  0.290 0.012 
12       27      1    12 1976  0.160 0.020 
13       27      2    13 1976  0.650 0.004 
14       27      3    14 1976  0.360 0.004 
15       27      4    15 1976  0.600 0.007

Here district is the paper in the example above but the idea is the same. This is clearly a multilevel model and we can express it as:

\[ y_{kj} = \mu + \delta_k + \zeta_{kj} + \epsilon_{kj} \]

\[ \delta_i \sim \mathcal{N}(0, \tau^2_3) \]

\[ \zeta_{ij} \sim \mathcal{N}(0, \tau^2_2) \]

We can display the idea of this model as:

Code

k <- 5
j <- 5

tau2_3 <- 0.1
tau2_2 <- 0.05


nj <- sample(1:j, k)

paper <- rep(1:k, nj)
study <- unlist(sapply(nj, function(x) 1:x))

dat <- data.frame(
    paper, study
)

deltak <- rnorm(k, 0, sqrt(tau2_3))
zetakj <- rnorm(nrow(dat), 0, sqrt(tau2_2))

dat$deltak <- deltak[dat$paper]
dat$theta <- with(dat, deltak + zetakj)

ggplot(dat, aes(x = theta, y = paper)) +
    geom_point(position = position_jitter(height = 0.4, seed = 2025)) +
    geom_point(aes(x = deltak), col = "firebrick", size = 5) +
    geom_segment(aes(x = theta, xend = deltak, y = paper, yend = paper),
                 position = position_jitter(height = 0.4, seed = 2025)) +
    theme_bw(20) +
    xlab(latex2exp::TeX("$\\theta$")) +
    ylab("Paper") +
    #xlim(c(-1, 1)) +
    geom_vline(xintercept = 0, lty = "dotted") +
    geom_segment(aes(x = deltak, xend = 0, y = paper, yend = paper),
                 col = "firebrick")

The red dots/lines are \(\delta_k\), black dots/lines are \(\zeta_{kj}\). The average length of the black lines depends on \(\tau^2_2\) while the average lenght of the red lines depends on \(\tau^2_3\).

The interplay between \(\tau^2_2\) and \(\tau^2_3\) is something very familiar in multilevel modeling i.e. the Intraclass Correlation Coefficient (ICC):

\[ \mbox{ICC} = \frac{\tau^2_3}{\tau^2_2 + \tau^2_3} \]

Can be considered as the proportion of the total true variance explained by the clustering (i.e., the paper membership in this case). When the ICC is close to 1 the clustering is really important. When the ICC is close to 0 belonging to a paper is not informative.

In matrix form, the \(T^2\) matrix is no longer diagonal (as in the standard two-level model) but is a combination of between and within variance.

Let’s fit the model using rma.mv instead of rma:

dat <- dat.konstantopoulos2011
fit <- rma.mv(yi, vi, random = ~ 1|district/study, data = dat)
fit


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed          factor 
sigma^2.1  0.0651  0.2551     11     no        district 
sigma^2.2  0.0327  0.1809     56     no  district/study 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1847  0.0846  2.1845  0.0289  0.0190  0.3504  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the Variance Components section there are the two estimated variances: between and within district. We can also manually calculate the ICC:

# the order is the same as the displayed one, the first is district and second is study within district
fit$sigma2
## [1] 0.06506194 0.03273652
fit$sigma2 / sum(fit$sigma2)
## [1] 0.6652655 0.3347345

We can also visualize the \(T^2\) matrix using a little custom function:

Code

Tcalc <- function(k, j, tau2_3 = 0, tau2_2 = 0){
    cc <- split(j, k)
    T2diag <- diag(nrow = length(j))
    T2k <- tau2_3 * T2diag
    blockT2j <- lapply(cc, function(x) matrix(tau2_2, ncol = length(x), nrow = length(x)))
    fullT2j <- bdiag(blockT2j)
    T2 <- T2k + fullT2j
    list(
        T2 = bdiag(T2),
        T2k = bdiag(T2k),
        T2j = fullT2j
    )
}

The first matrix is the combination of the between (the second) and within (the third) matrices.

TT <- Tcalc(dat$district, dat$study, fit$sigma2[1], fit$sigma2[2])
TT$T2[1:10, 1:10]

10 x 10 sparse Matrix of class "dsCMatrix"
                                                                       
 [1,] 0.09779846 0.03273652 0.03273652 0.03273652 .          .         
 [2,] 0.03273652 0.09779846 0.03273652 0.03273652 .          .         
 [3,] 0.03273652 0.03273652 0.09779846 0.03273652 .          .         
 [4,] 0.03273652 0.03273652 0.03273652 0.09779846 .          .         
 [5,] .          .          .          .          0.09779846 0.03273652
 [6,] .          .          .          .          0.03273652 0.09779846
 [7,] .          .          .          .          0.03273652 0.03273652
 [8,] .          .          .          .          0.03273652 0.03273652
 [9,] .          .          .          .          .          .         
[10,] .          .          .          .          .          .         
                                                 
 [1,] .          .          .          .         
 [2,] .          .          .          .         
 [3,] .          .          .          .         
 [4,] .          .          .          .         
 [5,] 0.03273652 0.03273652 .          .         
 [6,] 0.03273652 0.03273652 .          .         
 [7,] 0.09779846 0.03273652 .          .         
 [8,] 0.03273652 0.09779846 .          .         
 [9,] .          .          0.09779846 0.03273652
[10,] .          .          0.03273652 0.09779846

TT$T2k[1:10, 1:10]

10 x 10 sparse Matrix of class "dsCMatrix"
                                                                       
 [1,] 0.06506194 .          .          .          .          .         
 [2,] .          0.06506194 .          .          .          .         
 [3,] .          .          0.06506194 .          .          .         
 [4,] .          .          .          0.06506194 .          .         
 [5,] .          .          .          .          0.06506194 .         
 [6,] .          .          .          .          .          0.06506194
 [7,] .          .          .          .          .          .         
 [8,] .          .          .          .          .          .         
 [9,] .          .          .          .          .          .         
[10,] .          .          .          .          .          .         
                                                 
 [1,] .          .          .          .         
 [2,] .          .          .          .         
 [3,] .          .          .          .         
 [4,] .          .          .          .         
 [5,] .          .          .          .         
 [6,] .          .          .          .         
 [7,] 0.06506194 .          .          .         
 [8,] .          0.06506194 .          .         
 [9,] .          .          0.06506194 .         
[10,] .          .          .          0.06506194

TT$T2j[1:10, 1:10]

10 x 10 sparse Matrix of class "dsCMatrix"
                                                                       
 [1,] 0.03273652 0.03273652 0.03273652 0.03273652 .          .         
 [2,] 0.03273652 0.03273652 0.03273652 0.03273652 .          .         
 [3,] 0.03273652 0.03273652 0.03273652 0.03273652 .          .         
 [4,] 0.03273652 0.03273652 0.03273652 0.03273652 .          .         
 [5,] .          .          .          .          0.03273652 0.03273652
 [6,] .          .          .          .          0.03273652 0.03273652
 [7,] .          .          .          .          0.03273652 0.03273652
 [8,] .          .          .          .          0.03273652 0.03273652
 [9,] .          .          .          .          .          .         
[10,] .          .          .          .          .          .         
                                                 
 [1,] .          .          .          .         
 [2,] .          .          .          .         
 [3,] .          .          .          .         
 [4,] .          .          .          .         
 [5,] 0.03273652 0.03273652 .          .         
 [6,] 0.03273652 0.03273652 .          .         
 [7,] 0.03273652 0.03273652 .          .         
 [8,] 0.03273652 0.03273652 .          .         
 [9,] .          .          0.03273652 0.03273652
[10,] .          .          0.03273652 0.03273652

The \(V^2\) matrix i.e., the sampling errors is diagonal because effect sizes are calculated on different participants. We can also visualize the matrix:

dat$vi[1:10]

 [1] 0.118 0.118 0.144 0.144 0.014 0.014 0.015 0.024 0.023 0.043

fit$V[1:10, 1:10]

       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [2,] 0.000 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [3,] 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [4,] 0.000 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000
 [5,] 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000 0.000
 [6,] 0.000 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000
 [7,] 0.000 0.000 0.000 0.000 0.000 0.000 0.015 0.000 0.000 0.000
 [8,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.024 0.000 0.000
 [9,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.023 0.000
[10,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.043

The \(V^2\) matrix can be also constructed and used within the model. This is not usually done with multilevel models but always with multivariate models. There is the vcalc function:

# notice the rho = 0, independent sampling errors
V <- vcalc(vi, cluster = district, obs = study, rho = 0, data = dat)

V[1:10, 1:10]

       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [2,] 0.000 0.118 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [3,] 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000 0.000
 [4,] 0.000 0.000 0.000 0.144 0.000 0.000 0.000 0.000 0.000 0.000
 [5,] 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000 0.000
 [6,] 0.000 0.000 0.000 0.000 0.000 0.014 0.000 0.000 0.000 0.000
 [7,] 0.000 0.000 0.000 0.000 0.000 0.000 0.015 0.000 0.000 0.000
 [8,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.024 0.000 0.000
 [9,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.023 0.000
[10,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.043

all.equal(V, fit$V)

[1] TRUE

# same as
# rma.mv(yi, V, random = ~ 1|district/study, data = dat)

here models omitting the random

cov(zetai, zetai) = tau2_2

There is another interesting feature in metafor. We can fit the same three-level model with a different parametrization:

fit_mv <- rma.mv(yi, vi, random = ~ study|district, data = dat)
fit_mv


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

outer factor: district (nlvls = 11)
inner factor: study    (nlvls = 56)

            estim    sqrt  fixed 
tau^2      0.0978  0.3127     no 
rho        0.6653             no 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1847  0.0846  2.1845  0.0289  0.0190  0.3504  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is called multivariate parametrization¹ (and will be clear later). The model is exactly the same but:

We estimate directly the ICC (rho)
We estimate the total variance \(\tau^2_3 + \tau^2_2\)

# fit was the standard parametrization
sum(fit$sigma2) # same as tau^2
## [1] 0.09779846
fit$sigma2[1] / sum(fit$sigma2) # same as rho
## [1] 0.6652655

The three-level model is the more appropriate here. What could happen if we fit the wrong or less appropriate models?

The first option is ignoring the nested structure and fitting a standard two-level model:

fit3l <- fit # better name
fit2l <- rma(yi, vi, data = dat) # ignoring nesting

fit2l


Random-Effects Model (k = 56; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0884 (SE = 0.0202)
tau (square root of estimated tau^2 value):      0.2974
I^2 (total heterogeneity / total variability):   94.70%
H^2 (total variability / sampling variability):  18.89

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub     
  0.1279  0.0439  2.9161  0.0035  0.0419  0.2139  ** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A third option is to aggregate thus reducing each cluster (district) to a single effect size. The metafor::aggregate() function can do it very easily. Notice that is not simply taking the average but more like doing a smaller meta-analysis for each cluster:

# rho = 0 because sampling errors are independent
dat_agg <- aggregate(dat, cluster = district, rho = 0)
dat_agg


   district school study   year     yi    vi 
1        11    2.5   2.5 1976.0 -0.126 0.032 
2        12    2.5   6.5 1989.0  0.067 0.004 
3        18    2.0  10.0 1994.0  0.350 0.007 
4        27    2.5  13.5 1976.0  0.500 0.001 
5        56    2.5  17.5 1997.0  0.051 0.002 
6        58    6.0  25.0 1976.0 -0.042 0.001 
7        71    2.0  32.0 1997.0  0.886 0.004 
8        86    4.5  37.5 1997.0 -0.029 0.000 
9        91    3.5  44.5 2000.0  0.250 0.002 
10      108    3.0  50.0 2000.0  0.015 0.006 
11      644    2.5  54.5 1994.5  0.162 0.019

fit2l_agg <- rma(yi, vi, data = dat_agg)
fit2l_agg


Random-Effects Model (k = 11; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0828 (SE = 0.0397)
tau (square root of estimated tau^2 value):      0.2878
I^2 (total heterogeneity / total variability):   98.10%
H^2 (total variability / sampling variability):  52.60

Test for Heterogeneity:
Q(df = 10) = 415.9203, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1960  0.0900  2.1785  0.0294  0.0197  0.3724  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A final model could be including only the highest level and not the effect size level:

fit2l_3l <- rma.mv(yi, vi, random = ~1|district, data = dat)
fit2l_3l


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2    0.0828  0.2878     11     no  district 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1960  0.0900  2.1785  0.0294  0.0197  0.3724  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s compare the estimated effects and standard errors (the main info for inference):

ll <- list(fit2l = fit2l, fit3l = fit3l, fit2l_3l = fit2l_3l, fit2l_agg = fit2l_agg)
data.frame(
    mu = sapply(ll, function(x) x$b),
    se = sapply(ll, function(x) x$se)
)

                 mu         se
fit2l     0.1279321 0.04387028
fit3l     0.1847132 0.08455592
fit2l_3l  0.1960299 0.08998342
fit2l_agg 0.1960299 0.08998338

Notice that the last model is the same as assuming \(\tau^2_2 = 0\). We have a small impact on the results only because the ICC is quite high thus most of the variance is due to the level that we considered.

metafor can fix all the values to zero for testing, sensitivity analysis but also model comparison:

fit3l_0 <- rma.mv(yi, vi, random = ~1|district/study, data = dat, sigma2 = c(NA, 0))

# the two models are the sameler
fit3l_0


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed          factor 
sigma^2.1  0.0828  0.2878     11     no        district 
sigma^2.2  0.0000  0.0000     56    yes  district/study 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1960  0.0900  2.1785  0.0294  0.0197  0.3724  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

fit2l_3l


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2    0.0828  0.2878     11     no  district 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1960  0.0900  2.1785  0.0294  0.0197  0.3724  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# model with tau2 fixed to 0 vs estimated
anova(fit3l_0, fit3l)


        df     AIC     BIC    AICc   logLik     LRT   pval       QE 
Full     3 21.9174 27.9394 22.3880  -7.9587                578.8640 
Reduced  2 68.4330 72.4476 68.6637 -32.2165 48.5155 <.0001 578.8640

This means that ignoring the nesting (fit2l) is like assuming that \(\tau^2_3 = 0\) that is far from reality. This is why it is the worst model.

Tip

James Pustejovsky wrote a blog post https://jepusto.com/posts/Sometimes-aggregating-effect-sizes-is-fine/ showing when aggregating is fine.

It also show a model-based way to aggregate instead of using the aggregate function. Basically, we can drop the lowest random effect and provide an appropriate \(V^2\) matrix.

V <- vcalc(vi, cluster = district, obs = study, rho = 0, data = dat)

# dropping the /study
fit_agg_jep <- rma.mv(yi, V, random = ~ 1|district, data = dat)
fit_agg_jep


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2    0.0828  0.2878     11     no  district 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1960  0.0900  2.1785  0.0294  0.0197  0.3724  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# EXACTLY the same as fit2l_agg
fit2l_agg


Random-Effects Model (k = 11; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0828 (SE = 0.0397)
tau (square root of estimated tau^2 value):      0.2878
I^2 (total heterogeneity / total variability):   98.10%
H^2 (total variability / sampling variability):  52.60

Test for Heterogeneity:
Q(df = 10) = 415.9203, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1960  0.0900  2.1785  0.0294  0.0197  0.3724  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multivariate

The main point of a multivariate model is that the same participants contribute to more than one effect size. Some examples:

multiple outcomes of a treatment: anxiety, depression, etc.
multiple measures of the same outcome
multiple groups compared to the same control group
…

Let’s imagine a simple situation where we have two outcomes of a certain treatment and we compare two groups: control and experimental. The effect size is the difference between the two group means for each outcome.

dat <- data.frame(
    paper = 1,
    m_ctrl = c(0.1, 0),
    m_exp = c(0.5, 0.7),
    sd_ctrl = 1,
    sd_exp = 1,
    n_exp = 50,
    n_ctrl = 50,
    outcome = c("anxiety", "depression")
)

dat

  paper m_ctrl m_exp sd_ctrl sd_exp n_exp n_ctrl    outcome
1     1    0.1   0.5       1      1    50     50    anxiety
2     1    0.0   0.7       1      1    50     50 depression

# calculate effect size
dat <- escalc("SMD", m1i = m_exp, m2i = m_ctrl, sd1i = sd_exp, sd2i = sd_ctrl, n1i = n_exp, n2i = n_ctrl, data = dat)
dat


  paper m_ctrl m_exp sd_ctrl sd_exp n_exp n_ctrl    outcome     yi     vi 
1     1    0.1   0.5       1      1    50     50    anxiety 0.3969 0.0408 
2     1    0.0   0.7       1      1    50     50 depression 0.6946 0.0424

In this example, the same people answered to the depression and the anxiety questionnaires thus the effect are correlated. What is the problem? While the vi (\(\sigma^2\)) can be easily calculated (usually), the correlation between anxiety and depression is not usually (never) reported.

However, let’s imagine create the \(V^2\) matrix for this paper. Usually the \(V^2\) matrix is diagonal (also in the multilevel model):

# this would the the usual matrix, assuming independence
V <- vcalc(vi, cluster = paper, obs = outcome, rho = 0, data = dat)
V
##            [,1]       [,2]
## [1,] 0.04078777 0.00000000
## [2,] 0.00000000 0.04241253

# this is the actual matrix, assuming to know rho = 0.7
V <- vcalc(vi, cluster = paper, obs = outcome, rho = 0.7, data = dat)
V
##            [,1]       [,2]
## [1,] 0.04078777 0.02911455
## [2,] 0.02911455 0.04241253

The diagonal is the same but the off-diagonal is the covariance. The covariance is \(\sigma_{x, y} = \rho_{xy} \sqrt{\sigma^2_x} \sqrt{\sigma^2_y}\) thus:

V[upper.tri(V)]
## [1] 0.02911455
prod(sqrt(dat$vi)) * 0.7
## [1] 0.02911455

Thus the \(V^2\) matrix for more complex models is surely non-diagonal. A diagonal matrix in this case is like assuming that sampling errors are independent (no correlation).

As an example, let’s use the dat.berkey1998 dataset from metafor (see https://www.metafor-project.org/doku.php/analyses:berkey1998 for the full example):

dat <- dat.berkey1998
dat


   trial           author year ni outcome      yi     vi    v1i    v2i 
1      1 Pihlstrom et al. 1983 14      PD  0.4700 0.0075 0.0075 0.0030 
2      1 Pihlstrom et al. 1983 14      AL -0.3200 0.0077 0.0030 0.0077 
3      2    Lindhe et al. 1982 15      PD  0.2000 0.0057 0.0057 0.0009 
4      2    Lindhe et al. 1982 15      AL -0.6000 0.0008 0.0009 0.0008 
5      3   Knowles et al. 1979 78      PD  0.4000 0.0021 0.0021 0.0007 
6      3   Knowles et al. 1979 78      AL -0.1200 0.0014 0.0007 0.0014 
7      4  Ramfjord et al. 1987 89      PD  0.2600 0.0029 0.0029 0.0009 
8      4  Ramfjord et al. 1987 89      AL -0.3100 0.0015 0.0009 0.0015 
9      5    Becker et al. 1988 16      PD  0.5600 0.0148 0.0148 0.0072 
10     5    Becker et al. 1988 16      AL -0.3900 0.0304 0.0072 0.0304

This is a common example but honestly not so common because not only variances vi are reported but also the covariances (or correlations). Basically is like having the correlation between anxiety and depression for each study. But, for the moment, we can ignore the covariance.

Now, we have now multiple rows for each paper and the rows are also correlated (same participants, multiple outcomes). Thus the multilevel model is not entirely appropriate:

dat$effect <- 1:nrow(dat)
fit3l <- rma.mv(yi, vi, random = ~ 1|trial/effect, data = dat)
fit3l


Multivariate Meta-Analysis Model (k = 10; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed        factor 
sigma^2.1  0.0000  0.0000      5     no         trial 
sigma^2.2  0.1609  0.4011     10     no  trial/effect 

Test for Heterogeneity:
Q(df = 9) = 583.6005, p-val < .0001

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub    
  0.0155  0.1296  0.1198  0.9046  -0.2385  0.2695    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What is missing? basically we are estimating an average effect across outcomes (PD and AL). Furthermore we would probably estimate the correlation between PD and AL. We are estimating the correlation of true effects, not the correlation between sampling errors. But we know how to parametrize the model (remember the multivariate parametrization?) using the random = ~ inner|outer format where inner is the variable varying within paper (as the effect size index in the three-level model) and outer is the cluster:

fitmv <- rma.mv(yi, vi, random = ~ outcome|trial, data = dat)
fitmv


Multivariate Meta-Analysis Model (k = 10; method: REML)

Variance Components:

outer factor: trial   (nlvls = 5)
inner factor: outcome (nlvls = 2)

             estim    sqrt  fixed 
tau^2       0.1540  0.3924     no 
rho        -0.7674             no 

Test for Heterogeneity:
Q(df = 9) = 583.6005, p-val < .0001

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub    
  0.0104  0.0649  0.1602  0.8727  -0.1168  0.1376    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we have the correlation between outcomes rho and the total heterogeneity tau^2. What is missing? The fixed-effect of the outcome. We want an estimate of PD and AL:

fitmv2 <- rma.mv(yi, vi, mods = ~ 0 + outcome, random = ~ outcome|trial, data = dat)
fitmv2


Multivariate Meta-Analysis Model (k = 10; method: REML)

Variance Components:

outer factor: trial   (nlvls = 5)
inner factor: outcome (nlvls = 2)

            estim    sqrt  fixed 
tau^2      0.0245  0.1567     no 
rho        0.5831             no 

Test for Residual Heterogeneity:
QE(df = 8) = 124.9021, p-val < .0001

Test of Moderators (coefficients 1:2):
QM(df = 2) = 78.0792, p-val < .0001

Model Results:

           estimate      se     zval    pval    ci.lb    ci.ub      
outcomeAL   -0.3323  0.0772  -4.3041  <.0001  -0.4836  -0.1810  *** 
outcomePD    0.3594  0.0780   4.6056  <.0001   0.2064   0.5123  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What is missing now? Remember that we are assuming that the \(V^2\) matrix is diagonal thus we are not taking into account the correlation. We are only estimating the correlation of true effects and considering sampling errors as independent.

Let’s assume to have \(\rho = 0.7\):

V <- vcalc(vi, cluster = trial, obs = outcome, data = dat, rho = 0.7)

# now we use V not the vector
fitmv3 <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ outcome|trial, data = dat)
fitmv3


Multivariate Meta-Analysis Model (k = 10; method: REML)

Variance Components:

outer factor: trial   (nlvls = 5)
inner factor: outcome (nlvls = 2)

            estim    sqrt  fixed 
tau^2      0.0259  0.1608     no 
rho        0.4593             no 

Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001

Test of Moderators (coefficients 1:2):
QM(df = 2) = 79.5040, p-val < .0001

Model Results:

           estimate      se     zval    pval    ci.lb    ci.ub      
outcomeAL   -0.3455  0.0795  -4.3481  <.0001  -0.5013  -0.1898  *** 
outcomePD    0.3703  0.0796   4.6515  <.0001   0.2143   0.5263  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We are still making an implicit assumption that is about the \(T^2\) matrix. The random effects matrix is now a 2x2 matrix (because we have two outcomes):

\[ \begin{align} \begin{gathered} \mathrm{T}^2 = \begin{bmatrix} \tau_1^2 & & \\ \rho_{21}\tau_2\tau_1 & \tau_2^2 & \\ \end{bmatrix} \end{gathered} \end{align} \]

But in the model we have one \(\tau^2\) (not two) and a single correlation (correct). In fact, rma.mv by default has a struct = "CS" argument (see ?rma.mv).

CS stands for compound symmetry and the idea is to estimate a single \(\tau^2\) and a single \(\rho\) regardless the number of outcomes. Is an assumption (that can be tested) as dropping the random effect in the multilevel model.

Let’s switch to the struct = "UN" to estimate the unstructured matrix:

fitmv4 <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ outcome|trial, struct = "UN", data = dat)
fitmv4


Multivariate Meta-Analysis Model (k = 10; method: REML)

Variance Components:

outer factor: trial   (nlvls = 5)
inner factor: outcome (nlvls = 2)

            estim    sqrt  k.lvl  fixed  level 
tau^2.1    0.0331  0.1819      5     no     AL 
tau^2.2    0.0144  0.1199      5     no     PD 

    rho.AL  rho.PD    AL  PD 
AL       1             -   5 
PD  0.4536       1    no   - 

Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001

Test of Moderators (coefficients 1:2):
QM(df = 2) = 92.7518, p-val < .0001

Model Results:

           estimate      se     zval    pval    ci.lb    ci.ub      
outcomeAL   -0.3527  0.0884  -3.9892  <.0001  -0.5261  -0.1794  *** 
outcomePD    0.3636  0.0633   5.7425  <.0001   0.2395   0.4877  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# and compare with LRT
anova(fitmv3, fitmv4)


        df    AIC    BIC    AICc logLik    LRT   pval       QE 
Full     5 3.0552 3.4524 33.0552 3.4724               176.9217 
Reduced  4 1.5524 1.8702 14.8857 3.2238 0.4972 0.4807 176.9217

Extra for multilevel models

Van den Noortgate et al. (2013) describe how, in some conditions, the multilevel model (thus assuming independent sampling errors) provides good inference even when effects are correlated.

Some simulated data (see below):

Code

sim_multi_meta <- function(k, 
                           p,
                           mus,
                           tau2s,
                           ro = 0,
                           rs = 0,
                           n = NULL,
                           sample_p = FALSE,
                           sample_n = NULL,
                           ranef = FALSE){
    
    RT <- ro + diag(1 - ro, p)
    RS <- rs + diag(1 - rs, p)
    S <- diag(sqrt(tau2s)) %*% RT %*% diag(sqrt(tau2s))
    
    TT <- MASS::mvrnorm(k, rep(0, p), S)
    
    yi <- vi <- deltai <- vvi <- vector(mode = "list", length = k)
    
    for(i in 1:k){
        X0 <- MASS::mvrnorm(n, rep(0, p), RS)
        X1 <- MASS::mvrnorm(n, mus + TT[i, ], RS)
        mm0 <- apply(X0, 2, mean)
        mm1 <- apply(X1, 2, mean)
        
        vv01 <- cov(X0)/n + cov(X1)/n
        
        yi[[i]] <- mm1 - mm0
        vi[[i]] <- diag(vv01)
        
        vvi[[i]] <- vv01
        deltai[[i]] <- TT[i, ]
    }
    
    if(sample_p){
        pi <- sample(1:p, k, replace = TRUE)
    } else{
        pi <- rep(p, k)
    }
    
    study <- rep(1:k, each = p)
    outcome <- rep(1:p, k)
    
    sim <- data.frame(
        study,
        outcome
    )
    
    sim$yi <- unlist(yi)
    sim$vi <- unlist(vi)
    
    vvi <- do.call(rbind, vvi)
    vvi <- data.frame(vvi)
    names(vvi) <- sprintf("v%si", 1:ncol(vvi))
    
    sim <- cbind(sim, vvi)
    
    if(ranef){
        sim$deltai <- unlist(deltai)
    }
    
    # selecting studies
    siml <- split(sim, sim$study)
    for(i in 1:k){
        siml[[i]] <- siml[[i]][1:pi[i], ]
    }
    sim <- do.call(rbind, siml)
    sim$outcome <- factor(paste0("o", sim$outcome))
    rownames(sim) <- NULL
    return(sim)
}

dat <- sim_multi_meta(k = 100, # 10 studies
                      p = 3, # 3 outcomes e.g. depression, anxiety, etc.
                      mus = c(0, 0, 0), # true outcomes means
                      tau2s = c(0.1, 0.1, 0.1), # true outcomes tau2s
                      ro = 0.5, # correlation between outcomes
                      rs = 0.8, # correlation between sampling errors
                      n = 50) # sample size in each study

dat <- dat[, c("study", "outcome", "yi", "vi")]
head(dat)

  study outcome         yi         vi
1     1      o1  0.6166690 0.03086006
2     1      o2  0.3173710 0.04357880
3     1      o3  0.4087180 0.03725503
4     2      o1 -0.6009339 0.03318603
5     2      o2 -0.2818165 0.03504323
6     2      o3 -0.1316034 0.03821046

Now we know that outcomes are likely to be correlated but we don’t want to guess a value and we have any information. Van den Noortgate et al. (2013) suggested (and demonstrated) that using a three-level model is still valid. Let’s try it:

fit_3l <- rma.mv(yi, vi, mods = ~ 0 + outcome, random = ~ 1|study/outcome, data = dat, sparse = TRUE)
fit_3l


Multivariate Meta-Analysis Model (k = 300; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed         factor 
sigma^2.1  0.0805  0.2837    100     no          study 
sigma^2.2  0.0147  0.1212    300     no  study/outcome 

Test for Residual Heterogeneity:
QE(df = 297) = 1026.6152, p-val < .0001

Test of Moderators (coefficients 1:3):
QM(df = 3) = 2.8032, p-val = 0.4230

Model Results:

           estimate      se     zval    pval    ci.lb   ci.ub    
outcomeo1   -0.0249  0.0367  -0.6785  0.4974  -0.0969  0.0471    
outcomeo2    0.0196  0.0367   0.5332  0.5939  -0.0523  0.0914    
outcomeo3   -0.0288  0.0367  -0.7853  0.4323  -0.1008  0.0432    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now let’s fit the correct model:

V <- vcalc(vi, cluster = study, obs = outcome, rho = 0.8, data = dat)
fit_mv <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ 1|study/outcome, data = dat, sparse = TRUE)

fit_mv


Multivariate Meta-Analysis Model (k = 300; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed         factor 
sigma^2.1  0.0499  0.2234    100     no          study 
sigma^2.2  0.0455  0.2133    300     no  study/outcome 

Test for Residual Heterogeneity:
QE(df = 297) = 1638.3953, p-val < .0001

Test of Moderators (coefficients 1:3):
QM(df = 3) = 2.8674, p-val = 0.4125

Model Results:

           estimate      se     zval    pval    ci.lb   ci.ub    
outcomeo1   -0.0261  0.0368  -0.7106  0.4774  -0.0982  0.0460    
outcomeo2    0.0186  0.0367   0.5075  0.6118  -0.0533  0.0906    
outcomeo3   -0.0297  0.0368  -0.8084  0.4188  -0.1019  0.0424    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Suggested workflow

This post by Wolfgang https://wviechtb.github.io/metafor/reference/misc-recs.html is about some best practices for complex models. Let’s unpack important parts.

Profile Likelihood

A quick and dirty intuition about the profile likelihood is the following. Assuming to have a vector of numbers x and i want to fit a linear model finding the mean and standard deviation using the maximum likelihood:

Define the log-likelihood function. This is implemented in dnorm() or any other d* distribution.

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

Choose a value for \(\mu\) and \(\sigma^2\), find the likelihood of observed data
Repeat for other values and find the combination of \(\mu\) and \(\sigma^2\) that maximise the previous equation.

You can do it manually with a grid-search or using some optimizers but the idea is the same. Furthemore, for some models we already know that the MLE estimator for the mean of a gaussian distribution is the sample mean mean(x).

Example with a grid-search:

# simulate some data with true parameters
set.seed(5075)

mu <- 10
s <- 5

n <- 50
x <- rnorm(n, mu, s)

# now, to find mu and sigma, we could do a two-dimensional
# optimization. Roughly speaking with a grid search we try
# different value of mu and s finding the maximum likelihood
# combination

# around true value just for testing
mui <- seq(8, 12, length.out = 100)
si  <- seq(3, 7,  length.out = 100)

lld <- expand.grid(
    mui = mui,
    si = si
)

lld <- expand.grid(mui = mui, si = si)
lld$ll <- NA_real_

for (i in 1:nrow(lld)) {
    lld$ll[i] <- prod(dnorm(x, lld$mui[i], lld$si[i]))
}

ggplot(lld, aes(x = si, y = mui, z = ll)) +
    stat_contour_filled(show.legend = FALSE) +
    geom_vline(xintercept = sd(x)) +
    geom_hline(yintercept = mean(x))

For a fancier but useful 3D plot:

Code

library(plotly)

# convert to wide format: rows = mui, columns = si, values = ll
zmat <- tidyr::pivot_wider(lld, names_from = si, values_from = ll)
zz <- as.matrix(zmat[,-1])          # the ll values as matrix
xx <- unique(lld$si)               # x axis
yy <- unique(lld$mui)                # y axis

plot_ly(
    x = xx,
    y = yy,
    z = zz,
    type = "surface",
    colorscale = "Viridis",
    showscale = FALSE
) |>
    layout(
        showlegend = FALSE,
        scene = list(
            xaxis = list(title = "σ"),
            yaxis = list(title = "μ"),
            zaxis = list(title = "log-likelihood")
        )
    )

What is the point? If you move the plot having the mountain in front of you, you clearly see that for each value of x (\(\sigma^2\)) in this case, there is always a \(\mu\) in the peak. In other terms you can slice the mountain at the highest \(\mu\) for a given vector of \(\sigma^2\). This is the profile likelihood.

In practical terms, for a given \(\sigma^2\), the MLE estimator is always mean(x). Thus you can fix \(\mu = \bar x\) and optimize only over \(\sigma^2\). This reduces the 2D optimization to a 1D optimization.

# the mle estimator for mu is mean(x)
# thus for any fixed value of sigma, mean(x) is the MLE estimator

mle_mu <- mean(x)
mle_mu

[1] 9.951154

# now i can optimize the log likelihood fixing mu

si

  [1] 3.000000 3.040404 3.080808 3.121212 3.161616 3.202020 3.242424 3.282828
  [9] 3.323232 3.363636 3.404040 3.444444 3.484848 3.525253 3.565657 3.606061
 [17] 3.646465 3.686869 3.727273 3.767677 3.808081 3.848485 3.888889 3.929293
 [25] 3.969697 4.010101 4.050505 4.090909 4.131313 4.171717 4.212121 4.252525
 [33] 4.292929 4.333333 4.373737 4.414141 4.454545 4.494949 4.535354 4.575758
 [41] 4.616162 4.656566 4.696970 4.737374 4.777778 4.818182 4.858586 4.898990
 [49] 4.939394 4.979798 5.020202 5.060606 5.101010 5.141414 5.181818 5.222222
 [57] 5.262626 5.303030 5.343434 5.383838 5.424242 5.464646 5.505051 5.545455
 [65] 5.585859 5.626263 5.666667 5.707071 5.747475 5.787879 5.828283 5.868687
 [73] 5.909091 5.949495 5.989899 6.030303 6.070707 6.111111 6.151515 6.191919
 [81] 6.232323 6.272727 6.313131 6.353535 6.393939 6.434343 6.474747 6.515152
 [89] 6.555556 6.595960 6.636364 6.676768 6.717172 6.757576 6.797980 6.838384
 [97] 6.878788 6.919192 6.959596 7.000000

ll <- sapply(si, function(sii) prod(dnorm(x, mle_mu, sii)))

plot(si, ll, type = "l", xlab = "sigma")
points(si[which.max(ll)], max(ll), col = "firebrick", cex = 2, pch = 19)
abline(v = sd(x))
text(si[which.max(ll)] - 0.1, y = min(ll), labels = "sd(x)")

The same idea is applied to more complex models such as meta-analyis and mixed-models. The idea is to re-write the log-likelihood in a way that you optimize over parameters that you care profiling out nuisance parameters.

What happens with meta-analysis is the same, \(\mu\) or \(\boldsymbol{\beta}\) are profiled out optimizing over \(\tau^2\)(s).

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
fit <- rma(yi, vi, data = dat)

profile(fit)

Robust Variance Estimation

The formal definition of cluster robust is out of my expertise but the idea is the following:

With complex data structure (multilevel and/or multivariate) the \(V^2\) matrix is most of the time constructed without data but using plausible guess, previous literature, etc.
When we specify a model in metafor using different random structures and/or different V matrices we are creating a working model that can be a very bad or very good (or in the middle) approximation of the true correlation matrix
With a misspecified \(V^2\) matrix, the regression coefficients are still unbiased
the RVE approach calculate standard errors that provide valid inference even with a misspecified working model. If the model is misspecified estimates are less precise but valid. If the model is more accurate (I have raw data, I have plausible guesses, etc.) standard errors are more precise.

Some papers

Pustejovsky & Tipton (2022)
Pustejovsky & Tipton (2018)
Tipton & Pustejovsky (2015)

Packages

clubSandwich package
metafor::robust()(basically uses clubSandwich)

Some videos

https://www.youtube.com/watch?v=sJUc39vneWk https://www.youtube.com/watch?v=XsyUzaZHs5o https://www.youtube.com/watch?v=j3Sxbkd9iIs

Examples

Example with previous models:

# multivariate
dat <- dat.berkey1998
V <- vcalc(vi, cluster = trial, obs = outcome, data = dat, rho = 0.7)
fit_mv <- rma.mv(yi, V, mods = ~ 0 + outcome, random = ~ outcome | trial, struct="UN", data=dat)

fit_mv


Multivariate Meta-Analysis Model (k = 10; method: REML)

Variance Components:

outer factor: trial   (nlvls = 5)
inner factor: outcome (nlvls = 2)

            estim    sqrt  k.lvl  fixed  level 
tau^2.1    0.0331  0.1819      5     no     AL 
tau^2.2    0.0144  0.1199      5     no     PD 

    rho.AL  rho.PD    AL  PD 
AL       1             -   5 
PD  0.4536       1    no   - 

Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001

Test of Moderators (coefficients 1:2):
QM(df = 2) = 92.7518, p-val < .0001

Model Results:

           estimate      se     zval    pval    ci.lb    ci.ub      
outcomeAL   -0.3527  0.0884  -3.9892  <.0001  -0.5261  -0.1794  *** 
outcomePD    0.3636  0.0633   5.7425  <.0001   0.2395   0.4877  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

robust(fit_mv, cluster = trial, adjust = TRUE, clubSandwich = TRUE)


Multivariate Meta-Analysis Model (k = 10; method: REML)

Variance Components:

outer factor: trial   (nlvls = 5)
inner factor: outcome (nlvls = 2)

            estim    sqrt  k.lvl  fixed  level 
tau^2.1    0.0331  0.1819      5     no     AL 
tau^2.2    0.0144  0.1199      5     no     PD 

    rho.AL  rho.PD    AL  PD 
AL       1             -   5 
PD  0.4536       1    no   - 

Test for Residual Heterogeneity:
QE(df = 8) = 176.9217, p-val < .0001

Number of estimates:   10
Number of clusters:    5
Estimates per cluster: 2

Test of Moderators (coefficients 1:2):¹
F(df1 = 2, df2 = 2.86) = 34.8727, p-val = 0.0098

Model Results:

           estimate      se¹     tval¹    df¹    pval¹    ci.lb¹    ci.ub¹     
outcomeAL   -0.3527  0.0896   -3.9355   3.81   0.0187   -0.6066   -0.0989    * 
outcomePD    0.3636  0.0630    5.7754   3.84   0.0050    0.1860    0.5412   ** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

1) results based on cluster-robust inference (var-cov estimator: CR2,
   approx t/F-tests and confidence intervals, df: Satterthwaite approx)

# multilevel
dat <- dat.konstantopoulos2011
fit_ml <- rma.mv(yi, vi, random = ~ 1|district/study, data=dat)

fit_ml


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed          factor 
sigma^2.1  0.0651  0.2551     11     no        district 
sigma^2.2  0.0327  0.1809     56     no  district/study 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub    
  0.1847  0.0846  2.1845  0.0289  0.0190  0.3504  * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

robust(fit_ml, cluster = district, adjust = TRUE, clubSandwich = TRUE)


Multivariate Meta-Analysis Model (k = 56; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed          factor 
sigma^2.1  0.0651  0.2551     11     no        district 
sigma^2.2  0.0327  0.1809     56     no  district/study 

Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001

Number of estimates:   56
Number of clusters:    11
Estimates per cluster: 3-11 (mean: 5.09, median: 4)

Model Results:

estimate      se¹    tval¹    df¹    pval¹    ci.lb¹   ci.ub¹    
  0.1847  0.0845   2.1870   9.86   0.0540   -0.0038   0.3733   . 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

1) results based on cluster-robust inference (var-cov estimator: CR2,
   approx t-test and confidence interval, df: Satterthwaite approx)

References

Pustejovsky, J. E., & Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics: A Publication of the American Statistical Association, 36, 672–683. https://doi.org/10.1080/07350015.2016.1247004

Pustejovsky, J. E., & Tipton, E. (2022). Meta-analysis with Robust Variance Estimation: Expanding the Range of Working Models. Prevention Science: The Official Journal of the Society for Prevention Research, 23, 425–438. https://doi.org/10.1007/s11121-021-01246-3

Tipton, E., & Pustejovsky, J. E. (2015). Small-sample adjustments for tests of moderators and model fit using robust variance estimation in meta-regression. Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association, 40, 604–634. https://doi.org/10.3102/1076998615606099

Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent effect sizes. Behavior Research Methods, 45, 576–594. https://doi.org/10.3758/s13428-012-0261-6

Footnotes

https://www.metafor-project.org/doku.php/analyses:konstantopoulos2011↩︎