
Abstract

Multivariate meta-analysis is used to synthesize estimates of multiple quantities (“effect sizes”), such as risk factors or treatment effects, accounting for correlation and typically also heterogeneity. In the most general case, estimation can be intractable if data are sparse (for example, many risk factors but few studies) because the number of model parameters that must be estimated scales quadratically with the number of effect sizes. This article presents a new command, smvmeta, that makes estimation tractable by modeling correlation and heterogeneity in a low-dimensional space via random projection. This reduces the number of model parameters to be linear in the number of effect sizes. smvmeta is demonstrated in a meta-analysis of 23 risk factors for pain after total knee arthroplasty. Validation experiments show that, compared with meta-regression (a reasonable alternative model that could be used when data are sparse), smvmeta can provide substantially more precise estimates (that is, narrower confidence intervals) at little cost in bias.

Keywords

st0749, smvmeta, multivariate meta-analysis, sparse data, dimensionality reduction, penalized maximum likelihood, risk factors

1 Introduction

Meta-analysis is used to synthesize multiple estimates, most often of one quantity. This is called univariate or pairwise meta-analysis (after pairs of treatments that are compared to estimate treatment effect). Multivariate meta-analysis is used to synthesize one or more estimates of each of multiple quantities. This article is about multivariate meta-analysis when data are sparse, which easily occurs when the number of quantities of interest is even moderately large. For consistency with the nomenclature used in meta (see [META] meta) and much of the meta-analysis literature, the remainder of this article uses “effect size” to refer to the value or an estimate of a quantity of interest, as should be clear from context.

The canonical application of multivariate meta-analysis is arguably diagnostic test accuracy, in which sensitivity and specificity are of interest (Riley, Thompson, and Abrams 2008; Riley 2009). Stata 16 introduced an excellent suite of built-in meta-analysis commands. Support for multivariate meta-analysis was added in Stata 17 (see [META] meta and [META] meta mvregress). Stata’s meta-analysis commands were preceded by third-party add-on commands, notably mvmeta (White 2009, 2011), which does multivariate meta-analysis.

Naïvely, multivariate meta-analysis can be posed as multiple independent univariate meta-analyses or meta-regressions. However, these approaches fail to model potentially important correlations between effect sizes or heterogeneity (differences in the included studies’ estimation targets). Such approaches are expected to yield excessively biased and imprecise estimates (Riley 2009). Multivariate models are recommended in meta-analyses of diagnostic test accuracy because sensitivity and specificity are interdependent, and it is therefore beneficial to account for the correlation between them.

Higher-dimensional problems, such as questions on risk factors, can be challenging because data are often sparse (informally, the number of effect sizes published by studies is less than the number of model parameters that must be estimated). A more detailed description of the sparsity problem is given in section 4. Existing multivariate meta-analysis methods can be applied when data are sparse, but results may be untrustworthy (White 2009) and performance may be poor (White 2011). Sparse multivariate meta-analysis has been addressed previously, for example, by Lin and Chu (2018).

This article describes a new command, smvmeta, that implements a multivariate random-effects meta-analysis model for sparse data. It is a frequentist version of the Bayesian model described by Rose et al. (2021), which in turn was motivated by the model presented by Lin and Chu (2018). Section 2 presents the syntax of smvmeta. Section 3 shows how to use the command, using a meta-analysis of risk factors for pain after total knee arthroplasty as an example. Section 4 provides mathematical details on the model and describes its implementation in smvmeta. Section 5 presents simulation-based validation experiments. The article closes with a discussion.

Figure 1 illustrates how smvmeta makes multivariate meta-analysis tractable when data are sparse. In the most general case in which correlations and heterogeneity are unknown for all effect sizes, the number of model parameters that must be estimated in existing models scales quadratically with the number of effect sizes because correlation and heterogeneity must be estimated for each unique pairing of effect sizes. smvmeta models correlation and heterogeneity in a low-dimensional space, such that the number of effect sizes to be estimated is the dominant term in the model’s complexity and the number of model parameters scales linearly instead of quadratically. This means that fewer studies and study estimates are needed to support estimation. This comes at a price: while the random-effects variance and covariance components are modeled (that is, accounted for), they are not directly estimated. This price may be worth paying because simulations suggest that smvmeta’s estimates are on average substantially more precise than those obtained using meta-regression (section 5).

Figure 1.

Multivariate random-effects meta-analysis can be intractable when there are many effect sizes (for example risk factors) but few studies. In the most general case, the number of model parameters that must be estimated scales quadratically with the number of effect sizes (the models of Riley, Thompson, and Abrams [2008] and Lin and Chu [2018]). smvmeta makes estimation tractable by modeling correlation and heterogeneity in a space of low dimension, q, such that the number of model parameters scales linearly with the number of effect sizes.

2 The smvmeta command

2.1 Syntax

2.1.1 Specify generic effect sizes, standard errors, and a factor variable identifying them

Specify the variables containing generic effect sizes, their standard errors, and a factor variable that identifies the distinct effect sizes to be estimated (for example, risk factors) using smvmeta set:

smvmeta set esvar sevar fvvar

where data are arranged with one estimate per row. esvar and sevar correspond to variables containing point estimates of effect sizes and their standard errors, respectively. fvvar specifies a factor variable identifying the distinct effect sizes to be estimated. The factor variable operator is not allowed.

esvar (and therefore sevar) must be specified in the metric closest to normality, such as log odds-ratios instead of odds ratios.

It is not necessary to specify which study provides a given estimate, and there is no studylabel() option as with Stata’s meta set. For the moment, assume that each study reports at most one estimate for a given effect size.

Data must be smvmeta set. You are free to subsequently change the data without needing to smvmeta set it again, provided you do not change the names of esvar, sevar, or fvvar. If you do change those names, you must smvmeta set the data again.

2.1.2 Sparse multivariate random-effects meta-analysis as declared with smvmeta set

Meta-analysis is performed using the following syntax:

The options est_opts and rep_opts are explained in section 2.2. by and therefore bysort are allowed for smvmeta estimate.

2.1.3 Make a forest plot showing estimates obtained using smvmeta estimate

Stored estimation results can be displayed as a forest plot using the following syntax:

Options rep_opts are not “remembered” from previous smvmeta commands (see section 2.2). The optional columns may be one or more of _fv, _plot, _es, _ci, _lb, _ub, _esci, _p, _k, _Pscore, and _I2:

2.1.4 Redisplay stored estimation results

Stored estimation results can be redisplayed using the following syntax:

Options rep_opts are not “remembered” from previous smvmeta commands (see section 2.2).

2.1.5 Display coefficient legend after estimation

The coefficient legend (how to specify coefficients in an expression) can be displayed after estimation using the following syntax:

2.2 Options

2.2.1 est_opts specifies how estimates are computed and stored

dimension(#) sets the number of dimensions in which to model correlation and heterogeneity. This corresponds to the symbol q in section 4, which explains the role of this option in detail. An error will result unless 2 ≤ q < p, where p is the number of distinct effect sizes to be estimated, that is, the number of levels of fvvar present in the sample selected by if, in, by, etc. By default, smvmeta will search for a suitable value of q. This option is provided because a priori domain knowledge of the underlying dimensionality of the problem may facilitate better modeling.

[no] log displays or suppresses the penalized log likelihood and other maximization information at the start of each iteration. Logging is enabled by default.

2.2.2 rep_opts specifies how stored estimates are reported

eform is a synonym for transform(exp).

transform(transf_name) reports transformed estimates. By default, results are displayed in the metric in which the problem was posed. transf_name affects how results are displayed, not how they are estimated and stored. transf_name is corr, efficacy, exp, invlogit, or tanh:

corr is a synonym for tanh.

efficacy transforms the effect sizes and CIs using the 1 − exp() function (or more precisely, the −expm1() function). This transformation is used, for example, when the effect sizes are log risk-ratios so that the transformed effect sizes can be interpreted as treatment efficacies, 1 − risk ratios.

exp exponentiates effect sizes and CIs. This transformation is used, for example, when the effect sizes are log risk-ratios, log odds-ratios, or log hazard-ratios so that the transformed effect sizes can be interpreted as risk ratios, odds ratios, or hazard ratios.

invlogit transforms effect sizes and CIs with the inverse-logit function, invlogit(). This transformation is used, for example, when the effect sizes are logit of proportions so that the transformed effect sizes can be interpreted as proportions.

tanh applies the hyperbolic tangent transformation, tanh(), to the effect sizes and CIs. This transformation is used, for example, when the effect sizes are Fisher’s z-values (compare Z-scores) so that the transformed effect sizes can be interpreted as correlations.
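The transforms listed above are simple elementwise functions. The following Python sketch is illustrative only and is not part of smvmeta; the function name `transform` is hypothetical. It applies each transform to a single value and checks the log risk-ratio example.

```python
import math

def transform(name, x):
    """Apply one of the reporting transforms named in the text to a value x.

    Hypothetical helper for illustration; smvmeta itself is a Stata command.
    """
    if name == "efficacy":
        return -math.expm1(x)               # 1 - exp(x), computed stably near 0
    if name == "exp":
        return math.exp(x)
    if name == "invlogit":
        return 1.0 / (1.0 + math.exp(-x))
    if name in ("tanh", "corr"):            # corr is a synonym for tanh
        return math.tanh(x)
    raise ValueError(f"unknown transform: {name}")

# A log risk-ratio of log(0.6) corresponds to a risk ratio of 0.6
# and to an efficacy of 1 - 0.6 = 0.4.
log_rr = math.log(0.6)
rr = transform("exp", log_rr)
eff = transform("efficacy", log_rr)
```

Because 1 − exp() is a decreasing function, the lower limit of a transformed efficacy interval comes from the upper limit of the untransformed interval, and vice versa.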

superior(supspec) computes P-scores using the definition of superiority specified by supspec. P-scores are explained in section 4.4. supspec is +inf (the default), -inf, big, or small:

+inf specifies that, of two effect sizes, the one closest to +∞ is superior. -inf specifies the opposite.

big specifies that, of two effect sizes, the one with the largest magnitude is superior. small specifies the opposite.

P-scores are computed in the metric of esvar; this is relevant to the efficacy transform, which reverses estimates’ signs.

sort(column [, ascending | descending]) sorts transformed results by column. By default, results are sorted in ascending order. column is one of _fv, _es, _p, _k, _Pscore, or _I2:

_fv sorts by the levels of fvvar and is the default.

_es sorts by estimated effect size.

_p sorts by p-value.

_k sorts by number of observations (estimates) included.

_Pscore sorts by P-score.

_I2 sorts by I².

level(#) specifies the confidence level as a percentage.

smvmeta does not “remember” rep_opts from previous smvmeta commands, so it is essential to specify how results should be reported when displaying stored estimation results (for example, when using smvmeta to redisplay stored results or using smvmeta forestplot to produce figures).

2.2.3 forest_opts facilitate some control over forest plots

if(exp) restricts the results included in the forest plot to those identified by the expression. This option uses the stored estimates; it does not result in reestimation. The same variable names may be used as for the sort() option.

cformat(%fmt) and pformat(%fmt) specify the formats of coefficients and their CIs and p-values, respectively. By default, c(cformat) and c(pformat) are used.

forest(meta_forest_opts) passes meta_forest_opts as they are as options to meta forestplot (see [META] meta forestplot), which provides some control over the forest plot. For example, if you want to title the effect-size column Correlation, use the option forest(columnopts(_es, title("Correlation"))). Similarly, the option forest(note("")) removes the note stating the model used. forest() cannot be repeated, so all options must be passed in meta_forest_opts. Other options provided by meta forestplot may be used, but smvmeta forestplot and meta forestplot may not always play nicely with one another. The help file for smvmeta provides more details on how forest plots are constructed, particularly how the CIs are constructed for graphical display.

2.3 Stored results

In addition to the above, the following are stored in r():

Note that results stored in r() are updated when the command is replayed and will be replaced when any r-class command is run after the estimation command.

3 Example: Risk factors for pain after knee replacement

3.1 Background

The development of smvmeta was motivated by a systematic review of pain and function after total knee arthroplasty for osteoarthritis (Olsen et al. 2020, 2022, 2023). Around 20% of patients experience pain and poor function after surgery. The identification of risk factors for pain and function could lead to better patient outcomes.

The review identified and extracted estimates of association between risk factors measured at baseline prior to surgery and pain measured 12 months after surgery. Because it is generally not possible to randomize patients to risk factors, the review was based on observational studies. To facilitate meta-analysis of estimates of association reported using various metrics (for example, risk ratios, odds ratios, mean differences, and correlations) and between risk factors and pain assessments measured using a mix of binomial and continuous variables, correlation coefficients were extracted or imputed and then Fisher z-transformed. The following example is illustrative and may differ from the published systematic review.

3.2 Overview of pain12.dta

We begin by loading pain12.dta and listing its first 10 rows:

There are four variables with one effect-size estimate per row, as required by smvmeta:

study: the studies that contribute estimates.

factor: the risk factors. The first few risk factors are better mental health, pain, and older age (all measured at baseline).

z: Fisher z-transformed correlation coefficients between presence of the factor (for binomial factors) or higher values of the factor (for risk factors measured on a continuum) and pain 12 months after surgery.

z_se: specifies standard errors on z.

Preprocessing would have been required had the data not been arranged in smvmeta’s preferred form. For example, if the data were arranged in wide rather than long form (as used by Stata’s meta mvregress, see [META] meta mvregress, and by mvmeta), it would be necessary to use Stata’s reshape (see [D] reshape) command.

3.3 Specify generic effect sizes, standard errors, and risk factors

The meta-analysis is specified using smvmeta set:

Similarly to Stata’s meta set (see [META] meta set), smvmeta set displays a summary of the data that will be used by subsequent smvmeta commands. We see that there are 51 observations (estimates of correlations with pain) and that the data are considered sparse (that is, amenable to analysis by smvmeta). If the sample is subsequently restricted for estimation (for example, using if or by), the restricted sample may not be considered sparse, but smvmeta will not report this. A summary of each of the three variables smvmeta needs to perform meta-analysis is displayed. The “types” of esvar and sevar are shown (“Generic” for effect sizes and “Standard error” for precisions). Types are not inferred from the data but displayed to tell you how smvmeta understands the data. Variable labels are displayed if they exist, along with variable names. The number of levels of the fvvar variable is shown, which tells us there are 23 risk factors. The number of missing values is shown for each variable. Missing values may exist in the dataset but cannot be included in the estimation sample. Finally, the model and estimation method are stated. Meta-analysis can now be performed.

3.4 Meta-analysis using smvmeta estimate

Before running the meta-analysis, we first set the random-number generator’s seed to ensure that we can reproduce results and define a local macro, rep_opts, containing reporting options that we will reuse in subsequent commands. We are estimating mean correlations, so we use transform(corr). We consider risk factors that are more correlated with pain to be superior (irrespective of the direction of correlation), so we define superiority using superior(big) and specify a sort() option that will present results in descending order by P-score (that is, the risk factor with greatest mean extent of superiority will appear at the top of the results).

We then use smvmeta estimate to meta-analyze the data, modeling correlations and heterogeneity between effect sizes in three dimensions. To eliminate superfluous information from this article, we specify the nolog option. If logging is not disabled, then information about the penalized log likelihood will be displayed for each iteration. If logging is not disabled and the dimension() option is not specified, smvmeta will search for a dimensionality in which the data can be modeled. This may result in a long report that may include scary-looking messages from Stata’s optimizer about “flat or discontinuous” regions of the parameter space. Such messages are generally not concerning unless the entire optimization fails (check e(converged)).

The header section of the results table summarizes the analysis, providing information such as the number of observations (effect-size estimates) included in the analysis and the number of dimensions used to model correlation and heterogeneity. Each row of the table shows results for a particular risk factor. The first result is for Rey–Osterrieth complex figure (ROCF) recall, which assesses functional decline on multiple cognitive dimensions. Working left to right, the mean correlation between this variable and pain is estimated to be 0.342 with an effective standard error of 0.126 (section 4.3.3). The p-value testing the hypothesis that the mean correlation is zero is 0.005 (section 4.3.3). The 95% CI for mean correlation is [0.089 to 0.554] (section 4.3.3). Only one observation (study result) of correlation was used in the meta-analysis (column k). The P-score assessing the mean extent of certainty that ROCF recall is superior to (has mean correlation with larger magnitude than) all the other risk factors is 92.4% (section 4.4). The I² heterogeneity statistic cannot be computed, because only one estimate (study result) of correlation was available. The footer tells us if and how the estimates have been transformed and what definition of superiority was used (see sections 2.2 and 4.4.2).

In principle, we can now use almost all of Stata’s postestimation commands. For example, we could test (see [R] test) if the correlation between urban or semiurban residency and pain is zero. To do this, we need to know how this variable is coded. We could use smvmeta, coeflegend to determine the code, but using a local macro is perhaps more readable:

This test gives a p-value of 0.045, suggesting that studies like those included in the meta-analysis generally estimate nonzero correlations between urban or semiurban residency and pain. However, the p-value provided by smvmeta (p = 0.125) is different and does not reject the hypothesis using the conventional p < 0.05 criterion. Section 4.3.3 explains the discrepancy. Let this example encourage you to be careful when using postestimation commands after smvmeta.

3.5 Present estimates graphically using smvmeta forestplot

The command below uses smvmeta forestplot to plot the stored estimates. smvmeta does not “remember” rep_opts, so one must specify how results should be presented, hence the definition of the rep_opts macro previously. For the same reason, we now define the local macro forest_opts, which specifies how estimates and p-values should be formatted. The forest() options specify a title for the column of effect-size estimates and what the directions of correlation mean.

Figure 2.

Forest plot showing estimates of mean correlation between risk factors and pain 12 months after total knee arthroplasty, the number of studies (k) contributing estimates, P-scores that measure the mean extent of certainty of superiority, and I² values that measure the percentage of heterogeneity not explained by sampling error

The forest plot is shown in figure 2. The three risk factors estimated to have the greatest mean extent of certainty of superiority (that is, the highest P-scores; see section 4.4) are cognitive in nature:

ROCF recall (see above; mean correlation 0.34; 95% CI [0.09 to 0.55]; P-score 92.4%). However, this finding is supported only by one study.

Pain catastrophizing (0.28; 95% CI [0.12 to 0.42]; P-score 89%; 2 studies; I² 59.9%).

Temporal summation, which—informally—assesses pain experienced with respect to the frequency of a controlled pain stimulus (0.21; 95% CI [0.05 to 0.36]; P-score 77.1%; 2 studies; I² 0.0%).

While it is useful to show estimates for all risk factors and all quantities that we can report, we might want to produce a simpler forest plot, for example, showing only a subset of the available columns, for “interesting” results. This can be done by specifying the forest plot columns of interest and an expression that restricts the results that should be presented (but does not change the estimates). We specify that we want to show columns for the names of the risk factors, the forest graph, effect-size point estimates (correlations, as transformed via the rep_opts macro), and the number of estimates supporting each risk factor. We use the if() option to restrict the results to estimates with (transformed) point estimates at least 0.15 in magnitude and that have p-values less than 0.05:

Figure 3.

A simplified presentation of the results, restricted to correlations with point estimates of at least 0.15 in magnitude and p-values less than 0.05

The forest plot is shown in figure 3. In addition to the three cognitive risk factors discussed above, the plot shows estimates for Kellgren–Lawrence grade (a tool for assessing radiological evidence of osteoarthritis) and number of symptomatic joints.

4 The estimator and its implementation

4.1 Assumptions and nomenclature

Let y be an n-vector of effect-size estimates (that is, esvar) and Λ be an n × n diagonal matrix of sampling variances (that is, the squares of sevar), such that Λ _i,i is the sampling variance for y_i. Let X be an n × p design matrix whose (i, j)th element is unity if effect size i corresponds to factor level j and is zero otherwise. We make the following assumptions:

1. The n estimates of the p effect sizes are exchangeable across study.

2. There are no nonrandom study-level effects.

3. There is nonnegligible correlation and heterogeneity between effect sizes and studies that is well approximated in a low-dimensional space using a q × q covariance matrix Σ, where 2 ≤ q < p.

4. No further information is available about the structure of the correlations and heterogeneity (for example, studies do not publish correlations between estimates).

Assumptions 1 and 2 explain why it is not necessary to specify to smvmeta the studies from which estimates arise. If more is known about the structure of the correlations and heterogeneity, then the problem may not in fact be sparse, and smvmeta may not be a good model choice. Obvious alternatives are Stata’s multivariate meta-analysis commands and mvmeta.

4.2 Sparsity and smvmeta’s multivariate random-effects model

In the most general case, there may be correlation and heterogeneity between each pair of effect sizes. This gives p mean effect sizes, p(p − 1)/2 correlations, and p(p + 1)/2 covariances that model heterogeneity, for a total of p + p² model parameters. If p is even moderately large, then p + p² is likely to be substantially larger than n, the number of estimates available to support estimation. smvmeta considers data to be sparse if n < p + p².

smvmeta makes the simplifying assumption that correlations and heterogeneity do not need to be modeled separately, so effect-size estimates y can be modeled as multivariate normal with mean Xβ and covariance matrix Λ+ XΨX^⊤, where Λ and X are as defined in section 4.1, β is a p-vector of mean effect sizes, and Ψ is a p × p covariance matrix that models correlation and heterogeneity.

Assuming unstructured Ψ, it appears necessary to estimate p + p(p + 1)/2 model parameters (that is, the p parameters of β plus one of the triangles of Ψ). This is still quadratic in p. However, if we can assume that Ψ is anisotropic, then it may be possible to approximate Ψ in a space of dimension q < p. This can be achieved by defining a random projection R between ℝ^p and ℝ^q and estimating a q × q covariance matrix Σ, where RΣR^⊤ ≈ Ψ. This reduces the number of parameters that must be estimated to p + q(q + 1)/2, which is linear in p and quadratic in q rather than p. This gives the following model:

y \sim N (X β, Λ + X R Σ R^{⊤} X^{⊤}) \qquad (1)
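The parameter counts in this section can be tabulated directly. The following Python sketch is illustrative only (the function names are hypothetical and not part of smvmeta); it evaluates the counts for the example of section 3, with p = 23 risk factors, n = 51 estimates, and q = 3 dimensions.

```python
def n_params_general(p):
    # p means + p(p-1)/2 correlations + p(p+1)/2 heterogeneity covariances
    return p + p * (p - 1) // 2 + p * (p + 1) // 2    # equals p + p**2

def n_params_reduced(p, q):
    # p means + one triangle of the q x q covariance matrix Sigma
    return p + q * (q + 1) // 2

def is_sparse(n, p):
    # Sparsity criterion stated in the text: n < p + p^2
    return n < p + p ** 2

p, n, q = 23, 51, 3
general = n_params_general(p)      # parameters in the most general model
reduced = n_params_reduced(p, q)   # parameters after dimensionality reduction
sparse = is_sparse(n, p)
```

For these values, the general model would require far more parameters than there are estimates, whereas the reduced model requires fewer parameters than estimates.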

A related dimensionality-reduction problem is principal component analysis (Pearson 1901; Jolliffe 2005). In principal component analysis, a linear transformation into a low-dimensional space can be defined via eigendecomposition of a known or estimated covariance matrix. A matrix of the eigenvectors with largest eigenvalues defines a transform that preserves at least a chosen proportion of total variation. This approach requires that the covariance matrix be known or estimable, so it cannot be used to choose q here because Ψ is unknown.

The term “random projection” is typically used in the context of mapping point sets from high- to low-dimensional spaces in a way that approximately preserves some useful aspect of the data, particularly relative distances between points. smvmeta uses random projection to establish an arbitrary low-dimensional orthonormal basis in which an approximation to Ψ can be estimated.

For some meta-analyses, it may be possible to use a priori domain knowledge to choose q. For example, perhaps it is reasonable to consider some effect sizes as facets of a common construct for which distinct estimates are desired, preventing pooling at the level of the construct, but facilitating dimensionality reduction. In general, however, domain knowledge may not be available to choose q. If q is not specified via the dimension() option, smvmeta will attempt to find a suitable value. Section 4.3 describes how this is done. As for principal components analysis, good results can be achieved with surprisingly small q.

4.3 Estimation

4.3.1 Penalized maximum likelihood

The primary estimation target is β, though Σ is unknown and must also be addressed. White (2009) summarizes estimation methods applicable to the multivariate setting and chooses likelihood-based methods for their generality and optimality properties. There is little support for estimating Σ when data are sparse. The log-likelihood function corresponding to (1) can be trivially maximized by setting Σ close to the zero matrix, which violates assumption 3. This is addressed using penalized maximum likelihood (Cole, Chu, and Greenland 2014).

The log density corresponding to Λ + XRΣR^⊤X^⊤ ∼ W⁻¹(I, p), an inverse-Wishart distribution, is added to the log likelihood for (1) to penalize values of Σ that are incompatible with assumption 3 while otherwise assuming as little as possible. This results in the following penalized log-likelihood problem. smvmeta uses Mata’s [M-5] optimize( ) to find $\hat{β} = β (\hat{θ})$ with

\begin{array}{l} \hat{θ} = \underset{θ}{\arg \max} \frac{1}{4} [- 2 z {(θ)}^{⊤} Q {(θ)}^{- 1} z (θ) - 2 \log tr (Q {(θ)}^{- 1}) \\ - 4 (p + 1) \log | Q (θ) | + 4 \log Γ_{p} (\frac{p}{2}) \\ - 2 n \log 2 π + p^{2} \log π - p^{2} \log 4 - p \log π] \end{array}

where Q(θ) = Λ + XRΣ(θ)R^⊤X^⊤; z(θ) = y − Xβ(θ); β(θ) and Σ(θ) are the p-dimensional mean vector and q × q covariance matrix formed from θ, respectively; and Γ_p is the multivariate gamma function. The matrix products XR and R^⊤X^⊤ are independent of θ and can be precomputed. Symmetry and positive semidefiniteness of Σ(θ) are ensured by constructing the covariance matrix from Cholesky factors. The random projection matrix R is formed by sampling elements independently and identically distributed from N(0, 1) and then applying a singular value decomposition to ensure orthonormality.
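The construction of X, R, Σ, and Q described above can be sketched in a few lines. The following Python (NumPy) code is a minimal illustration under stated assumptions and is not smvmeta's Mata implementation: the toy data, the seed, and the choice of taking R = UV^⊤ from the singular value decomposition are all assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)      # arbitrary seed, for reproducibility only

# Toy problem (made-up numbers): n = 6 estimates of p = 4 effect sizes, q = 2.
levels = np.array([0, 0, 1, 2, 3, 3])   # factor level of each estimate
se = np.array([0.10, 0.12, 0.20, 0.15, 0.11, 0.09])
n, p, q = len(levels), 4, 2

X = np.zeros((n, p))
X[np.arange(n), levels] = 1.0           # X[i, j] = 1 if estimate i is for level j
Lam = np.diag(se ** 2)                  # diagonal matrix of sampling variances

# Random projection: i.i.d. N(0, 1) entries, orthonormalized via the SVD.
G = rng.standard_normal((p, q))
U, _, Vt = np.linalg.svd(G, full_matrices=False)
R = U @ Vt                              # p x q with orthonormal columns

# Sigma = L L' from a lower-triangular factor, so it is symmetric and
# positive semidefinite by construction.
L = np.tril(rng.standard_normal((q, q)))
Sigma = L @ L.T

XR = X @ R                              # independent of theta; can be precomputed
Q = Lam + XR @ Sigma @ XR.T             # n x n covariance of y in model (1)
```

In an optimizer, only the elements of the lower-triangular factor (and β) would vary between iterations, which is why precomputing XR is worthwhile.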

4.3.2 Optimization strategy

The optimization problem described above can be nontrivial. I found that Stata’s default optimization algorithm (modified Newton–Raphson) sometimes failed to converge and other algorithms often failed to start in a fruitful direction. Optimization is performed by initializing the elements of θ that correspond to the effect sizes with univariate meta-analysis estimates, running 10 modified Newton–Raphson iterations, and then switching to Broyden–Fletcher–Goldfarb–Shanno. If q is not specified, smvmeta will search for the largest value of q satisfying 2 ≤ p + q(q + 1)/2 < n. However, admissible values of q do not exist for all possible values of p and n. smvmeta will issue an error if an admissible value of q does not exist.

Optimization can fail if q is too high. smvmeta detects lack of convergence and will restart optimization with the next smallest admissible value of q (and a new random projection) until no such values remain. The actual value of q used is stored in e(q) and may differ from that specified by dimension().
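The search order over q can be sketched as follows. This Python snippet is illustrative only; the function name is hypothetical, and it assumes that the bound p + q(q + 1)/2 < n is combined with the requirement 2 ≤ q < p stated for the dimension() option.

```python
def admissible_q(p, n):
    """Values of q with 2 <= q < p and p + q(q+1)/2 < n, largest first.

    Assumption of this sketch: both conditions apply together.
    """
    return [q for q in range(p - 1, 1, -1) if p + q * (q + 1) // 2 < n]

# Example of section 3: p = 23 risk factors and n = 51 estimates.
candidates = admissible_q(23, 51)

# A case in which no admissible value exists, so an error would be issued.
none_available = admissible_q(5, 6)
```

Under this reading, the fallback described above amounts to trying each candidate in turn, with a fresh random projection each time, until optimization converges or the list is exhausted.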

4.3.3 Constructing CIs and p-values

We seek a test statistic and its sampling distribution to construct 100(1 − α)% CIs and p-values on the elements of effect-size vector β. I found that Wald-type CIs constructed using a normal approximation of the sampling distribution of $\hat{θ}$ do not always provide nominal coverage (and similarly for p-values). Thus, be careful if you use e(V) for postestimation, as is done in commands such as Stata’s test and margins (see [R] test and [R] margins). Instead, smvmeta uses profile penalized likelihood.

Cole, Chu, and Greenland (2014) provide a useful introduction to likelihood ratio-based CIs. Briefly, in the nonpenalized case, CIs can be constructed for β_j (the jth effect size) on the basis that

2 \{f (\hat{θ}) - f_{j} (β)\} \sim χ_{1}^{2} \qquad (2)

where f is the log-likelihood function and $f_{j} (β) = \max_{θ \mid β_{j} = β} f (θ)$ is the maximum value of the log-likelihood function with the element of θ corresponding to the jth element of β constrained to be the scalar value β. This allows a critical value corresponding to a specified confidence level to be found and used to compute the lower and upper limits, β_L and β_U, of a CI on β_j. This section explains how the nonpenalized profile likelihood method can be modified to account for penalization.
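The nonpenalized idea can be illustrated on a toy problem. The following Python sketch is illustrative only and is not how smvmeta computes its limits: it assumes a single normal mean with known standard error (so there are no nuisance parameters and the profile is trivial), uses made-up values, and finds the limits by bisection. All names are hypothetical.

```python
import math

def loglik(mu, ybar, se):
    # Log likelihood (up to a constant) for a normal mean with known standard error.
    return -0.5 * ((ybar - mu) / se) ** 2

def lr_bound(ybar, se, crit, direction, tol=1e-10):
    """Find mu on one side of ybar where 2*{f(mu_hat) - f(mu)} equals crit."""
    def stat(offset):
        return 2 * (loglik(ybar, ybar, se) - loglik(ybar + direction * offset, ybar, se))
    lo, hi = 0.0, 1.0
    while stat(hi) < crit:          # expand until the statistic exceeds crit
        hi *= 2
    while hi - lo > tol:            # bisection on the offset from ybar
        mid = 0.5 * (lo + hi)
        if stat(mid) < crit:
            lo = mid
        else:
            hi = mid
    return ybar + direction * 0.5 * (lo + hi)

crit = 3.841458820694124            # 95th percentile of chi-squared with 1 df
lower = lr_bound(0.30, 0.10, crit, -1)
upper = lr_bound(0.30, 0.10, crit, +1)
```

In this toy case, the limits coincide with the familiar estimate ± 1.96 standard errors, because the log likelihood is exactly quadratic.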

Let g and g_j be log-penalty equivalents of f and f_j. To account for penalization, we are interested in the distribution of $\{f (\hat{θ}) - f_{j} (β)\} + \{g (\hat{θ}) - g_{j} (β)\}$. Recall that if $X \sim χ_{ν}^{2}$, then kX ∼ Γ(ν/2, 2k), a gamma distribution in the shape-scale parameterization. We can therefore restate (2) as $F_{j} = f (\hat{θ}) - f_{j} (β) \sim Γ (1 / 2, 1)$ and, by the same reasoning, define $G_{j} = g (\hat{θ}) - g_{j} (β) \sim Γ (1 / 2, 1)$. We now seek the distribution of U_j = F_j + G_j, the sum of two gamma variates.

The sum of an arbitrary number of gamma variates has moment-generating function $M(s) = s^{-1} \prod_{i} (1 - λ_{i} s)^{-a}$, where a is a shape parameter common to the gamma variates and λ_i is the ith eigenvalue of a matrix constructed from the scale parameters and the correlation coefficients between the variates (Alouini, Abdi, and Kaveh 2001). In the case of the sum of two gamma variates with correlation ρ, this simplifies to

M(s) = \frac{1}{s \sqrt{1 + s - s \sqrt{ρ}} \sqrt{1 + s + s \sqrt{ρ}}} \qquad (3)

The inverse Laplace transform of (3) can be used to obtain the cumulative distribution function of U_j for a particular value of ρ and hence a sampling distribution that can be used to compute critical values and p-values.
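As a check on the expression above, the special case ρ = 0 can be inverted by hand. The short derivation below is an illustration added for clarity and uses only a partial-fraction expansion and the standard transform pairs for 1/s and 1/(1 + s).

```latex
% Special case \rho = 0: both square roots reduce to \sqrt{1 + s}
M(s)\big|_{\rho = 0}
  = \frac{1}{s\,(1 + s)}
  = \frac{1}{s} - \frac{1}{1 + s}
\quad\Longrightarrow\quad
P(U_j \le u) = \mathcal{L}^{-1}\{M\}(u) = 1 - e^{-u}, \qquad u \ge 0 .
```

This is the cumulative distribution function of Exp(1), which agrees with the fact that the sum of two independent Γ(1/2, 1) variates is Γ(1, 1).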

smvmeta reports CIs constructed using ρ = 0, which corresponds to U_j ∼ Exp(1). This gives the shortest intervals if 100(1 − α)% ≳ 78.5%¹ and the longest ones otherwise. Validation experiments (section 5) show that this provides CIs with empirical coverage close to the nominal level but that are shorter than those provided by meta-regressions that do not model correlation and heterogeneity between effect sizes. In summary, approximate CIs are constructed on the basis that

\{f(\hat{θ}) - f_{j}(β)\} + \{g(\hat{θ}) - g_{j}(β)\} \sim \mathrm{Exp}(1) \qquad (4)

Equation (4) can be used to test the hypothesis H_j : β_j = 0. The test statistic is $u_{j} = \{f(\hat{θ}) - f_{j}(0)\} + \{g(\hat{θ}) - g_{j}(0)\}$, and the p-value is $P(U_{j} \geq u_{j} \mid H_{j}) = e^{-u_{j}}$. Recall from section 3.4 that Stata’s test reported a p-value that is different from that reported by smvmeta. The discrepancy is explained by the different test statistics and their distributions (test uses a $χ_{1}^{2}$ statistic).
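The Exp(1) reference distribution makes both the p-value and the critical value available in closed form. The sketch below is illustrative only and its function names are hypothetical. It also compares the Exp(1) critical value with that of χ²₁, which is the distribution of U_j in the perfectly correlated limit ρ = 1 (then U_j = 2F_j); using that limit as the comparator is a choice made here for illustration.

```python
import math
from statistics import NormalDist


def exp1_critical(level):
    """Quantile of Exp(1) at `level`: solves 1 - exp(-c) = level."""
    return -math.log(1.0 - level)


def chi2_1_critical(level):
    """Quantile of chi-squared(1) at `level`, via the standard normal quantile."""
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    return z * z


def profile_p_value(u):
    """P(U >= u) when U ~ Exp(1)."""
    return math.exp(-u)


def crossover_level(lo=0.5, hi=0.95, tol=1e-10):
    """Confidence level at which the two critical values coincide (bisection)."""
    def gap(level):
        return exp1_critical(level) - chi2_1_critical(level)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:  # Exp(1) quantile still larger: move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(exp1_critical(0.95), 3), round(chi2_1_critical(0.95), 3))
print(round(crossover_level(), 3))
```

At the 95% level the Exp(1) critical value is smaller than the χ²₁ one, so the ρ = 0 intervals are the shorter of the two, and the computed crossover lies close to the 78.5% figure quoted above.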

Having identified the distribution from which a critical value can be computed, we can find bounds β_L and β_U for β_j efficiently via a quadratic approximation of ℓ around $\hat{θ}$. The presentation below is based on that for the LOGISTIC procedure of SAS/STAT (SAS Institute, Inc. 2016). smvmeta searches for β_L and β_U such that

\ell(\hat{θ}) - \ell_{j}(β_{L}) = \ell(\hat{θ}) - \ell_{j}(β_{U}) = \log \frac{1}{α}

where ℓ(θ) = f(θ) + g(θ), ℓ_j(β) = f_j(β) + g_j(β), and the 100(1 − α)th percentile of Exp(1) is given by log(1/α), because the Exp(1) cumulative distribution function 1 − e^{−u} equals 1 − α at u = log(1/α). Figure 4 illustrates the construction of u_j, β_L, and β_U. The quadratic approximation

\tilde{\ell}(θ + δ) = \ell(θ) + δ^{⊤} \nabla \ell(θ) + \frac{1}{2} δ^{⊤} V(θ) δ

is used to find a value of δ that can be used in the update rule θ → θ + δ to minimize |ℓ(θ) − ℓ₀|, where $\ell_{0} = \ell(\hat{θ}) - \log(1/α)$ and ∇ℓ(θ) and V(θ) are the gradient vector and Hessian matrix of ℓ at θ. The next value of δ is found by solving

\frac{d}{dδ} \tilde{\ell}(θ + δ) + \frac{d}{dδ} η (e_{j}^{⊤} δ - β) = 0

where η is a Lagrange multiplier and e_j is a unit vector that indicates the dimension of θ corresponding to β_j, giving

δ = -V(θ)^{-1} \{\nabla \ell(θ) + η e_{j}\} \quad \text{with} \quad η = \pm \left[ \frac{2 \{ \ell_{0} - \ell(θ) + \frac{1}{2} \nabla \ell(θ)^{⊤} V(θ)^{-1} \nabla \ell(θ) \}}{e_{j}^{⊤} V(θ)^{-1} e_{j}} \right]^{1/2}

Negative and positive values of η yield lower and upper bounds β_L and β_U, respectively. The quantities in (5) are iteratively updated, starting at $θ = \hat{θ}$, until |ℓ(θ) − ℓ₀| ≤ ϵ and {∇ℓ(θ) + η e_j}^⊤ V(θ)⁻¹ {∇ℓ(θ) + η e_j} ≤ ϵ with ϵ = 1 × 10⁻⁴.
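The iterative search can be sketched on a toy problem. This is a minimal illustration, not smvmeta's implementation: an unpenalized normal log-likelihood with θ = (μ, log σ) stands in for ℓ, the critical value is taken as −log α (the 100(1 − α)th percentile of Exp(1)), and all function names are hypothetical. Because V is negative definite near the maximum, the sketch compares the magnitude of the quadratic form with ϵ, which is an assumption about how the stopping rule is applied.

```python
import math


def loglik(theta, xs):
    """Normal log-likelihood (up to a constant), theta = (mu, log sigma)."""
    mu, tau = theta
    ss = sum((x - mu) ** 2 for x in xs)
    return -len(xs) * tau - 0.5 * ss * math.exp(-2.0 * tau)


def gradient(theta, xs):
    mu, tau = theta
    w = math.exp(-2.0 * tau)
    d = sum(x - mu for x in xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return [w * d, -len(xs) + w * ss]


def hessian(theta, xs):
    mu, tau = theta
    w = math.exp(-2.0 * tau)
    d = sum(x - mu for x in xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return [[-len(xs) * w, -2.0 * w * d], [-2.0 * w * d, -2.0 * w * ss]]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def profile_bound(xs, alpha=0.05, upper=True, j=0, eps=1e-4, max_iter=50):
    """Profile-likelihood bound for component j of theta, starting at the MLE."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    theta = [mean, 0.5 * math.log(var)]                  # maximum likelihood estimate
    target = loglik(theta, xs) - math.log(1.0 / alpha)   # l_0 in the text
    for _ in range(max_iter):
        g = gradient(theta, xs)
        v_inv = inverse_2x2(hessian(theta, xs))
        gap = target - loglik(theta, xs)
        numerator = 2.0 * (gap + 0.5 * dot(g, matvec(v_inv, g)))
        eta = math.sqrt(max(numerator / v_inv[j][j], 0.0))
        if not upper:
            eta = -eta                                   # negative eta: lower bound
        r = list(g)
        r[j] += eta                                      # gradient plus eta * e_j
        step = matvec(v_inv, r)
        if abs(gap) <= eps and abs(dot(r, step)) <= eps:
            return theta[j]
        theta = [theta[k] - step[k] for k in range(2)]   # theta -> theta + delta
    raise RuntimeError("bound search did not converge")


xs = [math.sin(k) + 0.05 * k for k in range(40)]
print(profile_bound(xs, upper=False), profile_bound(xs, upper=True))
```

For this toy model the profile log-likelihood of μ is available in closed form, so the bounds can be checked against mean ± s·sqrt(exp(2c/n) − 1), where s is the maximum likelihood standard deviation and c = −log α.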

Figure 4. Constructing the test statistic u_j for H_j : β_j = 0 and approximate CI bounds β_L and β_U for β_j via profile penalized likelihood

Recall that Wald-type CIs that are constructed from standard errors obtained from the variance–covariance matrix of the estimators do not necessarily have nominal coverage. smvmeta therefore reports effective standard errors imputed under the assumption that p-values computed as above arise from a test statistic that is normally distributed. These are reported in the results table as Eff. SE, stored as e(eff_se), and used to compute P-scores to assess superiority.
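One plausible reading of this imputation is sketched below. It assumes, as a labeled assumption rather than a documented detail of smvmeta, that the p-value is treated as two sided, so that the implied z statistic is Φ⁻¹(1 − p/2) and the effective standard error is |β̂| divided by that statistic. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def effective_se(beta_hat, p_value):
    """Standard error that reproduces p_value under a two-sided normal test.

    Assumption: two-sided test of beta = 0; requires 0 < p_value < 1.
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return abs(beta_hat) / z


# Example: profile statistic u = 3 gives p = exp(-3) under Exp(1).
p = math.exp(-3.0)
se = effective_se(0.5, p)
p_back = 2.0 * (1.0 - NormalDist().cdf(0.5 / se))  # recovers p
print(se, p_back)
```

By construction, a Wald test that uses this standard error returns the profile-based p-value.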

4.3.4 Alternative approaches to penalization

Penalized maximum likelihood is used by smvmeta to prevent trivial solutions that collapse Σ (section 4.3). The penalty used arises from an inverse Wishart distribution over the approximation of Ψ. In the Bayesian framework, this distribution is the conjugate prior for the covariance matrix of the multivariate normal. This prior has been criticized within Bayesian statistics (for example, see Schuurman, Grasman, and Hamaker [2016]). Some criticism concerns the distribution’s inflexibility: it is not possible to construct an inverse Wishart prior that expresses that some covariances are known a priori with more precision than others. However, this is not relevant to smvmeta. Other criticism concerns the low density of covariances close to zero, but this is exactly why it yields a useful penalty in smvmeta. One criticism that may have merit is that larger variances are associated with correlations with absolute values close to unity (Tokuda et al. 2011). Alternative distributions are used within Bayesian statistics to construct priors for covariance matrices, such as the Lewandowski–Kurowicka–Joe distribution over correlation matrices.

The model used by smvmeta does not attempt to estimate or disambiguate the full correlational and heterogeneity structure, which is approximated in a low-dimensional space. While most research questions are likely focused on mean effect sizes, correlation and heterogeneity may also be of interest. However, there is very little information about these p × p matrices in the setting addressed herein. While Σ is estimated, it is defined in an arbitrary low-dimensional space that precludes interpretation. It is therefore not stored. While the approximation of Ψ is interpretable in principle, it is subject to run-to-run variance arising from the random projection and spans at most a q-dimensional subspace of ℝ^p. This matrix is not stored to prevent overinterpretation.

4.4 Assessing superiority

Multivariate meta-analyses are often motivated by a need to know which effect size is “superior” or to rank them. For example, it is natural to ask which risk factor is most important or which of several treatments is most effective. smvmeta facilitates this by reporting P-scores (Rücker and Schwarzer 2015). Originating in frequentist network meta-analysis (which can be posed as a multivariate meta-analysis), a P-score is a statistic that measures the mean extent of certainty that a given effect size is superior to all the others.² A P-score reflects not only the estimated effect size relative to those of the others but also the precision of the estimates. This section explains how P-scores are computed and how various definitions of superiority are implemented in smvmeta. It ends with a discussion of the advantages and disadvantages of P-scores over alternatives.

4.4.1 Computing P-scores

The following presentation differs slightly from that of Rücker and Schwarzer (2015) but describes the method concisely in a way that is hopefully relatively intuitive. Assume momentarily that, of two effect sizes, the one closest to +∞ is superior. Let π_i,j be a one-sided p-value for the null hypothesis β_i ≤ β_j. Then define the p × p matrix P with elements

P_{i,j} = \begin{cases} 1 - π_{i,j} & \text{if } \hat{β}_{i} \leq \hat{β}_{j} \\ π_{i,j} & \text{otherwise} \end{cases} \qquad \forall i, j \in \{1, \dots, p\}

The elements of this matrix are simply 1 minus the one-sided p-values, such that P_i,j will be close to unity if there is strong evidence that β_i > β_j. smvmeta calculates p-values using a normal approximation and effective standard errors. A vector of P-scores is then computed as

p = \frac{1}{p - 1} (P 1_{p} - \operatorname{diag} P) \times 100\%

where 1_p is a p × 1 vector of ones. The ith element of p is the mean of the 1 minus p-values over all but the ith effect size, expressed as a percentage. The one-minus in the definition of matrix P is used so that P-scores closer to 100%, rather than 0%, can be interpreted as superior. P-scores are expressed as percentages but do not generally sum to 100%.
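The computation can be sketched in a few lines. The sketch follows the normal-approximation form used by Rücker and Schwarzer (2015), P_{i,j} = Φ((β̂_i − β̂_j)/s_{i,j}), and assumes for illustration that the standard error of a difference is s_{i,j} = sqrt(se_i² + se_j²), that is, that estimates are treated as independent. smvmeta may compute s_{i,j} differently, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def p_scores(beta_hat, se):
    """P-scores (in percent) when the effect size closest to +inf is superior.

    Assumption: differences have standard error sqrt(se_i^2 + se_j^2).
    """
    p = len(beta_hat)
    phi = NormalDist().cdf
    big_p = [
        [phi((beta_hat[i] - beta_hat[j]) / math.hypot(se[i], se[j])) for j in range(p)]
        for i in range(p)
    ]
    # Row sum minus the diagonal element, averaged over the other p - 1 effect sizes.
    return [100.0 * (sum(big_p[i]) - big_p[i][i]) / (p - 1) for i in range(p)]


scores = p_scores([0.1, 0.5, 0.9], [0.2, 0.2, 0.2])
print(scores)
```

Because P_{i,j} + P_{j,i} = 1 under this construction, the P-scores average 50% across effect sizes, so they sum to 50p% rather than 100%.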

4.4.2 Defining superiority

So far we have assumed that, of two effect sizes, the one closest to +∞ is superior. This corresponds to the superior(+inf) option and is the default. This could be applicable in a meta-analysis of risk factors in which relative risks greater than unity are associated with the outcome. Risk factors with P-scores closer to 100% would be interpreted as superior to those closer to 0%.

If relative risks less than unity were associated with the outcome, it may be useful to reverse the definition of superiority so that a P-score of 100% would have the interpretation of superiority rather than inferiority. Similarly, in meta-analyses of correlation coefficients, correlations with large magnitudes may be considered superior to small correlations. Section 2.2.2 describes four options for superior(). These are implemented by temporarily transforming $\hat{β}$ in the computation of P-scores:

\hat{β} \to \begin{cases} +\hat{β} & \text{if superior(+inf)} \\ -\hat{β} & \text{if superior(-inf)} \\ +|\hat{β}| & \text{if superior(big)} \\ -|\hat{β}| & \text{if superior(small)} \end{cases}

smvmeta computes and stores P-scores using all four definitions, allowing results to be extracted or redisplayed using alternatives as desired. P-scores are computed in the metric in which the meta-analysis was posed. This is relevant to transform(efficacy), which reverses estimates’ signs.
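The four transformations are simple enough to state directly in code. The sketch below is illustrative: the function name is hypothetical, and the string keys merely echo the arguments of superior().

```python
def transform_for_superiority(beta_hat, superior="+inf"):
    """Temporarily transform estimates so that larger always means superior."""
    rules = {
        "+inf": lambda b: b,         # closest to +infinity is superior
        "-inf": lambda b: -b,        # closest to -infinity is superior
        "big": abs,                  # largest magnitude is superior
        "small": lambda b: -abs(b),  # smallest magnitude is superior
    }
    return [rules[superior](b) for b in beta_hat]


print(transform_for_superiority([-2.0, 1.0], "big"))  # prints [2.0, 1.0]
```

After the transformation, P-scores can be computed exactly as in the +∞ case, which is why all four sets can be computed and stored from one fit.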

4.4.3 Alternative methods for assessing superiority

P-scores and their Bayesian equivalent, surface under the cumulative ranking curve (SUCRA) values (Salanti, Ades, and Ioannidis 2011), have been argued to be preferable to the simpler approach of estimating a Bayesian posterior probability that a given effect size is superior to all others, for example, as implemented by the pbest() option of mvmeta (White 2011).

A weakness of estimating posterior probabilities of superiority is that an imprecisely estimated effect size may have substantial probability mass for effect sizes that are implausibly large in magnitude. This phenomenon is most apparent in meta-analyses of correlations in which the magnitude of a coefficient, rather than its magnitude and direction, determines superiority. The probability mass in the tails of a posterior distribution that extends far in the negative and positive directions “doubles up” after taking the magnitude of the effect size. Consequently, the posterior probability that the effect size is indeed superior (or inferior) may also be large.
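A small numeric illustration of the "doubling up" effect follows. The numbers are invented for illustration only: one correlation is treated as known exactly at 0.30, and the other has a normal posterior with mean 0.10 and standard deviation 0.40.

```python
from statistics import NormalDist

precise = 0.30                                  # hypothetical, treated as known
imprecise = NormalDist(mu=0.10, sigma=0.40)     # hypothetical posterior

# Probability the imprecise effect is larger in the signed sense.
p_signed = 1.0 - imprecise.cdf(precise)

# Probability it is larger in magnitude: the lower tail now also counts.
p_magnitude = p_signed + imprecise.cdf(-precise)

print(round(p_signed, 3), round(p_magnitude, 3))  # prints 0.309 0.467
```

The imprecisely estimated effect has a point estimate one third the size of the precise one, yet its probability of having the larger magnitude is close to one half.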

There is nothing wrong with such a probability: there really is a high posterior probability of superiority.³ However, such a result can lack face validity and cause nonstatisticians to doubt the entire analysis. This is not unreasonable. Few people should accept that an imprecisely estimated quantity is highly likely to be superior to other quantities that are supported by much stronger evidence. P-scores and SUCRA values are quite robust to this problem but are admittedly more challenging to interpret.

An additional benefit of P-scores is that they are trivial to compute. In particular, it is not necessary to use sampling-based estimation methods, as is the case with SUCRA values and posterior probabilities of superiority. This means that results can be obtained quickly.

5 Simulation-based validation

I ran three simulation experiments to validate a prerelease version of smvmeta, using simulated effect sizes similar to the pain data (section 3). The numbers of effect sizes, p, ranged from 10 to 30. Standard errors on the simulated effect-size estimates and effect-size heterogeneity were generated to approximately match the distributions of standard errors and I² values in the pain data.

In the first two experiments, effect-size estimates were missing within each simulated study either completely at random (MCAR) or not at random (MNAR). The MNAR scenario used almost the same data as the MCAR scenario, but I discarded effect-size estimates—the values “published” by simulated studies—if their CIs included ±0.05. This models a publication bias scenario in which studies do not publish effect-size estimates if they are small or imprecisely estimated. I assumed no “domain knowledge” was available to choose q, the dimension in which correlation and heterogeneity are approximated. In these scenarios, smvmeta chooses q automatically.

The third experiment was an MCAR scenario in which a priori estimates of q were assumed to be available. Values of q were obtained by applying principal components analysis to the known correlation matrices used to generate the simulated data. In each of these meta-analyses, I set smvmeta’s dimension() option to the minimum number of components that accounted for at least 90% of total variation. However, recall that smvmeta will try to use a different value of q if the specified model does not converge.
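The rule for choosing q from a principal components analysis can be sketched as follows. The function name is hypothetical, and the eigenvalues of the correlation matrix are assumed to have been computed already; the example values are invented.

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of leading components explaining at least `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Cumulative shares are 0.60, 0.80, 0.95, ... so three components are needed.
print(components_for_variance([6.0, 2.0, 1.5, 0.3, 0.2]))  # prints 3
```

The returned value would then be passed to dimension(), subject to the convergence fallback noted above.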

I generated synthetic data for 1,000 meta-analyses for each experiment. I used the known mean effect sizes to quantify and compare bias and empirical coverage of 95% CIs with results of random-effects meta-regressions in which mean effect size is estimated via a categorical covariate. This is a simple alternative model choice that can be applied to high-dimensional sparse data. It does not account for correlations between effect sizes and assumes heterogeneity is the same for all effect sizes.

Results for the three scenarios were very similar, suggesting that smvmeta has some robustness to the MNAR scenario and performs well even if nothing can be assumed about q. Because the results were so similar, I will report the MCAR results unless stated otherwise. Overall bias is about 1.03 (95% CI [1.02 to 1.04]) times larger for smvmeta compared with meta-regression, but this was negligible in absolute terms for most meta-analyses. Overall empirical coverage of 95% CIs is 94.4% (95% CI [94.1% to 94.7%]). If p is small (close to 10, say), there is some evidence that bias may be very high and CIs may be excessively wide (empirical coverage is unaffected). There is also some evidence that empirical coverage may drop slightly below the nominal level if p is close to 30, but there is considerable uncertainty on this finding.

These findings are perhaps unsurprising given that random projection works less well in lower dimensions and that it will not always be possible to model correlation and heterogeneity between many effect sizes in a low-dimensional space. The experiments suggest that smvmeta can probably be used safely if there are at least 15–20 effect sizes. For p > 20 effect sizes, smvmeta’s CIs are on average about 90.3% (95% CI [88.0% to 92.7%]) of the length of those provided by meta-regression, and the results suggest they become shorter as p increases.

In the third experiment, the prespecified value of q was used in about 87% of meta-analyses. Around 10% of meta-analyses used a value of q 1 less than the prespecified value. The median number of dimensions used was six. Fewer than 1% of analyses used a value of q less than 4, and fewer than 1% used a value greater than 7. These findings suggest that 4 ≤ q ≤ 7 is probably reasonable if you want to specify q and the number of effect sizes is not more than 30, but smvmeta does a good job of finding a sensible value if you do not specify q.

6 Conclusions

This article has presented smvmeta, a new command for sparse multivariate random-effects meta-analysis. The underlying model makes estimation tractable in the sparse setting by modeling correlation and heterogeneity in a low-dimensional space via random projection. smvmeta’s syntax is modeled on Stata’s univariate meta-analysis commands (see [META] meta) with the aim of making the command feel familiar and easy to use.

The command was demonstrated in an example multivariate meta-analysis of pain after total knee arthroplasty. Mathematical and computational details of the model and P-scores were then presented. Simulation-based validation experiments show that, in the higher-dimensional sparse setting for which smvmeta was developed, the method is expected to provide substantially more precise estimates (that is, narrower CIs) at little cost in bias compared with random-effects meta-regression.

The fundamental limitations of smvmeta are due to the simplifying assumptions that facilitate random-effects meta-analysis in the challenging sparse multivariate setting. In particular, modeling correlation and heterogeneity in a low-dimensional space using one covariance matrix combines two distinct concepts, thus precluding interpretation of variance components and focusing estimation on mean effect sizes. While estimating means is almost always the main purpose of random-effects meta-analysis, it is nevertheless also important to estimate and report variance in effect sizes (IntHout et al. 2016). Future work could explore alternative approximations that could be used in the sparse setting to facilitate such estimation. However, despite the approximations used, the simulation-based validation provides reassurance that trustworthy and precise estimates can be expected if the number of effect sizes p is large enough. This holds regardless of whether sufficient domain knowledge is available to specify q, the number of dimensions in which to model correlation and heterogeneity, and also if missing effect-size estimates are not MCAR but instead are missing because of a typical MNAR mechanism by which small or imprecise (“nonsignificant”) estimates go unreported.

8 Programs and supplemental materials

Supplemental Material, sj-zip-1-stj-10.1177_1536867X241258008 for Multivariate random-effects meta-analysis for sparse data using smvmeta by Christopher James Rose in The Stata Journal

7 Acknowledgments

The author thanks Unni Olsen, Maren Falch Lindberg, and Anners Lerdal (University of Oslo and Lovisenberg Diaconal Hospital, Norway), Eva Marie-Louise Denison (Norwegian Institute of Public Health), and Arild Aamodt (Lovisenberg Diaconal Hospital) for their work on the systematic reviews that motivated development of smvmeta. He also thanks Stephen Jenkins (London School of Economics and Political Science, United Kingdom) and an anonymous reviewer, whose detailed comments improved this article and the code.

8 Programs and supplemental materials

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

References

Alouini, M.-S., A. Abdi, and M. Kaveh. 2001. Sum of gamma variates and performance of wireless communication systems over Nakagami-fading channels. IEEE Transactions on Vehicular Technology 50: 1471–1480. https://doi.org/10.1109/25.966578.

Cole, S. R., H. Chu, and S. Greenland. 2014. Maximum likelihood, profile likelihood, and penalized likelihood: A primer. American Journal of Epidemiology 179: 252–260. https://doi.org/10.1093/aje/kwt245.

IntHout, J., J. P. A. Ioannidis, M. M. Rovers, and J. J. Goeman. 2016. Plea for routinely presenting prediction intervals in meta-analysis. BMJ Open 6: e010247. https://doi.org/10.1136/bmjopen-2015-010247.

Jolliffe, I. 2005. Principal component analysis. In Vol. 3 of Encyclopedia of Statistics in Behavioral Science, ed. B. S. Everitt and D. C. Howell. Chichester, U.K.: Wiley. https://doi.org/10.1002/0470013192.bsa501.

Lin, L., and H. Chu. 2018. Bayesian multivariate meta-analysis of multiple factors. Research Synthesis Methods 9: 261–272. https://doi.org/10.1002/jrsm.1293.

Olsen, U., M. F. Lindberg, E. M.-L. Denison, C. J. Rose, C. L. Gay, A. Aamodt, J. I. Brox, et al. 2020. Predictors of chronic pain and level of physical function in total knee arthroplasty: A protocol for a systematic review and meta-analysis. BMJ Open 10: e037674. https://doi.org/10.1136/bmjopen-2020-037674.

Olsen, U., M. F. Lindberg, C. Rose, E. Denison, C. Gay, A. Aamodt, J. I. Brox, et al. 2022. Factors correlated with physical function 1 year after total knee arthroplasty in patients with knee osteoarthritis: A systematic review and meta-analysis. JAMA Network Open 5: e2219636. https://doi.org/10.1001/jamanetworkopen.2022.19636.

Olsen, U., et al. 2023. Factors correlated with pain after total knee arthroplasty: A systematic review and meta-analysis. PLOS ONE 18: e0283446. https://doi.org/10.1371/journal.pone.0283446.

Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 6th ser., 2: 559–572. https://doi.org/10.1080/14786440109462720.

Riley, R. D. 2009. Multivariate meta-analysis: The effect of ignoring within-study correlation. Journal of the Royal Statistical Society, A ser., 172: 789–811. https://doi.org/10.1111/j.1467-985X.2008.00593.x.

Riley, R. D., J. R. Thompson, and K. R. Abrams. 2008. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics 9: 172–186. https://doi.org/10.1093/biostatistics/kxm023.

Rose, C. J., U. Olsen, M. F. Lindberg, E. M.-L. Denison, A. Aamodt, and A. Lerdal. 2021. A new multivariate meta-analysis model for many variates and few studies. arXiv:2009.11808 [stat.ME]. https://doi.org/10.48550/arXiv.2009.11808.

Rücker, G., and G. Schwarzer. 2015. Ranking treatments in frequentist network meta-analysis works without resampling methods. BMC Medical Research Methodology 15: Article 58. https://doi.org/10.1186/s12874-015-0060-8.

Salanti, G., A. E. Ades, and J. P. A. Ioannidis. 2011. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: An overview and tutorial. Journal of Clinical Epidemiology 64: 163–171. https://doi.org/10.1016/j.jclinepi.2010.03.016.

Schuurman, N. K., R. P. P. P. Grasman, and E. L. Hamaker. 2016. A comparison of inverse-Wishart prior specifications for covariance matrices in multilevel autoregressive models. Multivariate Behavioral Research 51: 185–206. https://doi.org/10.1080/00273171.2015.1065398.

SAS Institute, Inc. 2016. SAS/STAT 14.2 User’s Guide: The LOGISTIC Procedure. Cary, NC: SAS Institute.

Tokuda, T., B. Goodrich, I. Van Mechelen, A. Gelman, and F. Tuerlinckx. 2011. Visualizing distributions of covariance matrices. Technical report, Columbia University, New York.

White, I. R. 2009. Multivariate random-effects meta-analysis. Stata Journal 9: 40–56. https://doi.org/10.1177/1536867X0900900103.

White, I. R. 2011. Multivariate random-effects meta-regression: Updates to mvmeta. Stata Journal 11: 255–270. https://doi.org/10.1177/1536867X1101100206.
