Abstract
As efforts to improve the credibility of psychological and other social sciences continue, researchers aim to conduct multi-study or multi-sample research and synthesize findings using different parameterizations of individual participant meta-analysis. No overarching organizational framework exists, and only a few simulation-based or empirical examples compare these parameterizations. This article has two goals. First, we provide an overview of six common parameterizations of individual participant meta-analysis, organized into a taxonomy based on different features (e.g., sample-specific parameters, meta-analytic parameters, and number of models). Second, using empirical data from 26,205 participants across 11 longitudinal studies, we compare the parameterizations by testing prospective associations between the Big Five personality traits and crystallized abilities. We found that openness was a robust predictor of crystallized abilities across samples. Across methods, we observed consistency in model estimates, with some exceptions. We conclude with recommendations for choosing an approach given a team’s goals, questions, data availability, and model features.
Introduction
Over the past decade, increasing concerns over the replicability, reproducibility, and generalizability of psychological and other social sciences have driven advances in psychological research methods (Nosek et al., 2022; Open Science Collaboration, 2012, 2015; Wiggins & Christopherson, 2019). Discourse has focused on replicating psychological experiments (Chartier et al., 2020; Ebersole et al., 2016; Klein et al., 2018, 2022), underemphasizing the importance of non-experimental, observational, and quasi-experimental psychological science (Brooks-Gunn et al., 1991; Mroczek et al., 2011; Weston et al., 2019). Such non-experimental approaches often rely on complex and sometimes intensive longitudinal studies for which replication is much more difficult (Hofer & Piccinin, 2009), given that data often take decades to accumulate.
Integrative data analysis (IDA) is a core tool for synthesizing multiple sources of data (Hofer & Piccinin, 2009). It arose in response to the need for specialized methods within longitudinal data analysis to better understand the replicability and generalizability of findings by synthesizing existing data from a variety of sources and samples (Graham et al., 2022; Hill & Stine-Morrow, 2022; Mroczek et al., 2022; Willroth et al., 2022). This methodological framework and its specialized tools have been generative for the study of personality and aging in the past two decades (Beck & Jackson, 2022; Graham et al., 2017; Jokela et al., 2013; Willroth et al., 2025; Yoneda et al., 2022). These methods also have the potential to be applied to other areas of psychology.
Yet despite their promise, the sheer diversity of available methods for the synthesis of data and results has hampered wide-scale adoption. Here, we use the term data synthesis as an umbrella term that encapsulates multiple methods for integrating different data sources, including IDA (e.g., see special issue on IDA, Bauer & Hussong, 2009; Cooper & Patall, 2009; Curran, 2009; Curran & Hussong, 2009; Hofer & Piccinin, 2009; McArdle et al., 2009), individual participant data (IPD) meta-analysis (Debray et al., 2015; L. A. Stewart et al., 2015), coordinated data analysis (CDA; Graham et al., 2022; Willroth et al., 2022), pooled analysis (Bidard et al., 2014), meta-analytic structural equation modeling (MASEM; Jak & Cheung, 2020), and traditional meta-analysis (Viechtbauer, 2007), among others. Thus, researchers must choose the best form of data synthesis for different sets of circumstances, despite a limited number of systematic conceptual discussions or empirical investigations of their similarities and differences. For example, within the psychological literature, there is little discussion of (1) what different data synthesis approaches are available or (2) how their estimates may differ from one another (see Burke et al., 2017; Debray et al., 2015; G. B. Stewart et al., 2012). As a result, different techniques are applied depending on researcher preferences, and little is known about how well these methods replicate one another, particularly outside the context of clinical trials.
The current article has three objectives. First, we provide an overview of some of the most commonly used parameterizations of data synthesis in psychology. We organize these into a taxonomy based on different features of each parameterization (e.g., sample-specific parameters, meta-analytic parameters, number of models required). We describe each method in detail and provide R code to carry them out. Second, we test associations between Big Five personality traits and crystallized abilities using four of the five levels of our taxonomy, along with four moderators of these associations, using empirical data from 26,205 participants across 11 longitudinal studies. Third, we compare convergence and divergence of findings across methods, outline pros and cons of each approach, and make recommendations for best practices. While we focus specifically on application to regression coefficients and IPD, the issues described herein and the taxonomy presented apply to other types of effect sizes and statistical models, including mean differences, odds ratios, risk ratios, and more.
Issues in Data Synthesis
As the need for data synthesis is increasingly recognized and the availability of data to synthesize grows, more researchers may be interested in conducting data synthesis. One of the first questions is which method researchers should use to synthesize data, but this is not a straightforward question because there is no single standard method. Instead, subfields and individual researchers have adopted numerous different approaches to and names for data synthesis (Beck & Jackson, 2022; Curran & Hussong, 2009; Debray et al., 2015; Graham et al., 2022; Hofer & Piccinin, 2009). Adding to the complexity of this question, inconsistent and imprecise terminology has been applied to these different approaches (e.g., mega-analysis, CDA, and IPD meta-analysis have been used largely interchangeably; Beck et al., 2024; Graham et al., 2022; Legha et al., 2018). We believe that data synthesis is a useful umbrella term to capture all of these methods; however, there are currently few resources available that summarize and compare the different approaches to data synthesis.
A clear taxonomy of the data synthesis methods will benefit the field in at least three ways. First, a taxonomy of data synthesis methods will serve as a useful resource for researchers who are new to data synthesis. Second, a clear taxonomy will help to address the problem of inconsistent terminology to refer to the same methodological approaches. Third, a taxonomy of data synthesis methods will facilitate conceptual and empirical comparisons of different methods. The present study aims to provide such a taxonomy, compare different methods, and make recommendations about when and how to use each.
Methods of Data Synthesis
We build our taxonomy based on five broad categories of data synthesis, two of which have multiple parameterizations that we will describe in more detail: (1) pooled analysis of IPD (Batty et al., 2018), without (Method 1A) and with (Method 1B) cluster-corrected standard errors; (2) pooled analysis of IPD using contrasts and interactions (Method 2A; Debray et al., 2013) or random effects (Method 2B); (3) sample-specific analyses followed by random-effects meta-analysis (Graham et al., 2017; Yoneda et al., 2022); (4) sample-specific analyses reported together (Graham et al., 2020; Graham, James, Jackson, Willroth, Boyle, et al., 2021); and (5) traditional meta-analysis (Barrick & Mount, 1991; Poropat, 2009). Although there are a variety of analytical choices researchers face within each broad category and each can be conceptualized in terms of traditional, univariate fixed or random-effect meta-analyses, network meta-analysis, Bayesian meta-analysis, or MASEM, we emphasize consistent broad differences across categories and center our empirical example around the most commonly applied univariate parameterizations in psychology. Before describing each of them in detail, we first delineate the key features that differentiate them, which are summarized in Table 1.
Key Features of Five Levels of Data Synthesis.
Note. Meta-analysis is used in the broad sense to include heterogeneous meta-analysis models, including fixed and random-effect meta-analysis and MASEM. IPD = individual participant data; MASEM = meta-analytic structural equation models.
We argue there are at least four key factors that differentiate the taxonomy:
Individual participant data
Number of models needed
The inclusion of sample-specific estimates
The degree of harmonization required to apply the method
Each of these considerations will be discussed in more detail in the section “Method” when outlining core characteristics of each approach. However, one aspect of differentiating these methods deserves special consideration: the methods differ in the degree of harmonization necessary to conduct the analyses. Harmonization refers to how variables of interest are pulled, recoded, and included in order to allow data from different sources to have exact or conceptual mapping across samples (for a more thorough treatment, see Cheng et al., 2024; Dubrow & Tomescu-Dubrow, 2016). One-stage (single model) pooled analyses require the highest level of harmonization, with the operationalization of key variables in each sample essential to model estimation and interpretation. Of course, there are methods, like standardization or other transformations of variables, that can place variables on the same scale, but such harmonization choices should be made carefully because transformations may not always be sufficient for harmonization (e.g., when syntactic, structural, or semantic considerations are highly heterogeneous; Cheng et al., 2024). In contrast, coordinated analyses and meta-analyses of existing studies can have somewhat more flexible harmonization because effect sizes are estimated separately and can be converted to standardized metrics for inferential equivalence before carrying out a meta-analytic procedure. However, particularly for coordinated analyses using raw data, harmonization is an important consideration for later interpretation and pooling (Bauer & Hussong, 2009; Cole et al., 2023; Graham et al., 2022). Finally, in traditional meta-analyses, harmonization of key variables in raw data is sometimes considered unnecessary because these analyses rely on effect size estimates alone (and measurement differences can be tested via meta-regression).
A more nuanced discussion of data harmonization is beyond the scope of this necessarily brief introduction to methods for data synthesis focusing on specific modeling parameters, assuming prior careful harmonization. For a more nuanced discussion, we suggest a growing body of work specifically on harmonization (Cheng et al., 2024; Cole et al., 2023; Dubrow & Tomescu-Dubrow, 2016; Fortier et al., 2017; Griffith et al., 2013) and more recent work on MASEM that combines measurement models, measurement invariance, and path models into a combined framework (Jak & Cheung, 2020).
In choosing the specific parameterizations to illustrate, we omitted two methods that are of note. The first is a set of additional parameterizations that adjust for study membership but assume homogeneous predictor–outcome associations across studies. These models adjust for study membership only through fixed study indicators or random intercepts without allowing slopes to vary across studies. Because our goal was to illustrate the key conceptual distinctions among synthesis approaches rather than enumerate every possible parameterization, we did not include these additional variants, as doing so would substantially increase the number of models presented without introducing new conceptual categories.
The second omitted parameterization is MASEM, which provides an important alternative framework for synthesizing effect estimates across studies. One-stage MASEM, in particular, allows researchers to estimate a meta-analytic model directly from study-level correlation or covariance matrices, including the integration of measurement and structural components when appropriate (Jak & Cheung, 2020). Conceptually, MASEM occupies a position between the individual participant data meta-analysis (IPD-MA) approaches emphasized here and traditional meta-analysis: it addresses the univariate limitations of conventional meta-analytic models by leveraging the full correlation or covariance structure of each study, while still operating at the study level rather than the individual level. Given the already broad set of regression-based synthesis methods illustrated in this article, and our focus on approaches that are readily accessible to applied researchers, we do not provide a full MASEM demonstration. Nonetheless, we acknowledge MASEM as a valuable option for more complex modeling goals and encourage interested readers to consult existing methodological resources.
Substantive Empirical Question: Personality and Crystallized Abilities
To demonstrate the power of data synthesis for answering common psychological research questions, we conducted a study investigating associations between Big Five personality traits and crystallized cognitive abilities. We chose these constructs because they are consequential (Beck & Jackson, 2022) and readily harmonized at the construct level based on prior construct validation studies (Griffith et al., 2013; Langa et al., 2020; Soto & John, 2017). Moreover, recent findings suggest that personality traits likely are associated with crystallized abilities, but these effects may be strongest for openness to experience and, to a lesser extent, neuroticism and extraversion (Hultsch et al., 1999; Jorm et al., 1993; Rammstedt, 2018; Soubelet & Salthouse, 2010, 2011; Wettstein et al., 2017; Zimprich et al., 2009). There is a growing need for replication of these findings, as the reported associations above are limited by the number of datasets used. We extend previous work by examining prospective Big Five personality trait–crystallized ability associations across (1) 11 samples and (2) several sets of covariates and moderators (e.g., education) that are thought to explain associations between them. Because traditional meta-analysis is beyond the scope of this article, we instead compare our results to a recent traditional meta-analysis (Anglim et al., 2022), which found significant meta-analytic estimates among crystallized intelligence and the Big Five, specifically for neuroticism, extraversion, and openness.
The Present Study
The present study directly tests and compares four broad categories of our data synthesis taxonomy. To demonstrate these various data synthesis approaches, we applied each approach to investigate whether the Big Five prospectively predict crystallized abilities in 11 longitudinal panel samples.
Method
Transparency and Openness
This study was preregistered on the Open Science Framework (https://osf.io/rzym7). In addition, all code, model objects, figures, and tables are available in the Supplemental Material on the OSF (https://osf.io/zut7b/) and GitHub (https://github.com/emoriebeck/data-synthesis-tutorial). Data are publicly available or available by application (see Supplemental Material for details on accessing specific samples). There were a few minor deviations from the preregistration, which are described in the preregistration deviation table in Supplemental Table S1 (Willroth & Atherton, 2024). Finally, rendered results are available as a standalone web page on GitHub (https://emoriebeck.github.io/data-synthesis-tutorial) and in an online R Shiny web app (https://emoriebeck.shinyapps.io/data-synth-tutorial/). We recommend using the web page for code and the web app to explore tables and figures.
Data cleaning, analyses, and results communication were done using the following R (version 4.2.0; R Core Team, 2022) packages: psych (version 2.2.5; Revelle, 2022), knitr (version 1.40; Xie, 2014), kableExtra (version 1.3.4.9000; Zhu, 2022), brms (version 2.18.0; Bürkner, 2021), readxl (version 1.4.1; Wickham & Bryan, 2022), haven (version 2.5.1; Wickham et al., 2022), estimatr (version 1.0.0; Blair et al., 2022), lme4 (version 1.1.30; Bates et al., 2015), broom.mixed (version 0.2.9.4; Bolker & Robinson, 2022), bootpredictlme4 (version 0.1; Duursma, 2022), effectsize (version 0.7.0.5; Ben-Shachar et al., 2020), metafor (version 3.8.1; Viechtbauer, 2010), rstan (version 2.21.7; Stan Development Team, 2022), tidybayes (version 3.0.2; Kay, 2022), cowplot (version 1.1.1; Wilke, 2020), plyr (version 1.8.7; Wickham, 2011), tidyverse (version 1.3.2; Wickham et al., 2019), and furrr (version 0.3.1; Vaughan & Dancho, 2022).
Participants
The current study included 26,205 participants from 11 longitudinal samples. These samples spanned three continents and five countries. We chose samples based on prior work using different data synthesis methods with IPD to examine associations between personality or cognitive ability and a number of life outcomes (Beck & Jackson, 2022; Graham et al., 2020; Graham, James, Jackson, Willroth, Luo, et al., 2021; Hakulinen et al., 2015; Jokela et al., 2013, 2020; Sutin et al., 2019; Yoneda et al., 2022). Study selection and inclusion criteria are outlined in Figure 1. These final k = 11 studies represent a mix of publicly available data (The German Socioeconomic Panel Study [GSOEP], The Household, Income and Labour Dynamics in Australia Study [HILDA], the Health and Retirement Study [HRS], and the Swedish Adoption/Twin Study of Aging [SATSA]) and data we obtained through data use agreements with the sample maintainers (The Berlin Aging Study [BASE], The Einstein Aging Study [EAS], The Longitudinal Aging Study Amsterdam [LASA], The Religious Orders Study [ROS], The Memory and Aging Project [MAP], The Minority Aging Research Study [MARS], and Origin of Variances in the Oldest-Old: Octogenarian Twins [OCTO-Twin]). For each sample, we used the latest data release, and participants were included in all models for which they had all necessary data (i.e., the participants included from each study vary across combinations of personality traits, covariates, and moderators). Detailed descriptions for each sample, including the steps necessary to access the data, are included in the Supplemental Material and R Shiny web app (https://emoriebeck.shinyapps.io/data-synth-tutorial/).

Flowchart of sample identification and inclusion at time of analysis.
Measures
In this study, we tested how the Big Five personality traits are associated with crystallized abilities in models that were unadjusted, adjusted for age, gender, and education individually, and adjusted for combinations of these covariates. We also examined the three covariates as participant-level moderators of personality–cognition relationships. For a full overview of which personality and outcome measures are available across datasets, see Supplemental Table S1 (and the web app).
Personality Traits
We examined the Big Five (extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience). Full information on the scales used for each of these measures for each sample is presented in Table 2, and full descriptions of each scale, including lists of items, are available in the Supplemental Material. Many of the measures are on different scales, so all personality indicators were operationalized as percent of maximum possible (POMP) scores in the data synthesis procedure (Cohen et al., 1999), which allows scores to be interpreted relative to each scale’s possible range. To aid in convergence, we deviated from traditional POMP scoring and multiplied the ratio by 10 rather than 100:

$$\text{POMP} = \frac{\text{observed} - \text{minimum}}{\text{maximum} - \text{minimum}} \times 10 \tag{1}$$
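As a minimal sketch of this rescaling (the function name `pomp` and the example scale ranges are hypothetical, not taken from the study's code), the 0 to 10 POMP variant can be computed from a raw score and the scale's possible minimum and maximum:

```python
def pomp(score, scale_min, scale_max, multiplier=10.0):
    """Rescale a raw score to the 0-to-`multiplier` POMP metric.

    Traditional POMP uses multiplier=100; the text describes using 10.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return (score - scale_min) / (scale_max - scale_min) * multiplier


# Hypothetical examples: a 1-5 Likert composite and a 0-40 vocabulary test.
likert_pomp = pomp(4.0, scale_min=1, scale_max=5)   # 3 of 4 possible points
vocab_pomp = pomp(30, scale_min=0, scale_max=40)    # 30 of 40 possible points
print(likert_pomp, vocab_pomp)
```

Both hypothetical scores sit three quarters of the way up their scales, so they map to the same POMP value even though the raw metrics differ.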
Sample Characteristics and Sample-Level Characteristics.
Note. Prediction interval was calculated by subtracting each participant’s first personality measurement year from their last cognitive ability measurement year. Baseline age is the average participant age at their first personality assessment. E = Extraversion; A = Agreeableness; C = Conscientiousness; N = Neuroticism; O = Openness. NEO-FFI = 60-item NEO Five Factor Inventory (Costa & McCrae, 1992); IPIP NEO = International Personality Item Pool NEO (Johnson, 2014); BFI-S = Big Five Inventory, Short Form (German; Hahn et al., 2012); TDA-40 = Trait Descriptive Adjectives-40 (Saucier, 1994); MIDI = The Midlife Development Inventory (Lachman & Weaver, 1997); DPQ = Dutch Personality Questionnaire (Barelds & Luteijn, 2002); Eysenck = Eysenck Personality Questionnaire (Eysenck & Eysenck, 1965).
Crystallized Abilities
We examined crystallized abilities as our primary outcome. Full information on the tests used for each of these measures for each sample is presented in Table 2, and full details on each test are included in the Supplemental Material. As with personality traits, many of the measures are on different scales, so all crystallized ability indicators were operationalized as POMP scores, with higher scores indicating better performance.
Participant-Level Covariates and Moderators
In addition, we adjusted for three participant-level covariates – age, gender, and education. These covariates were included because each has a long-documented association with both crystallized ability and some personality domains. Age was defined as the participants’ age at their baseline personality assessment, centered at 60 years. Gender was dummy coded as 0 (men) and 1 (women). Finally, education was measured in years of education at baseline personality assessment, centered at 12 years of education. We examined how adjusting for each covariate separately, as well as for all covariates simultaneously, impacted the estimated personality trait–crystallized ability associations.
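The covariate coding described above can be sketched as follows. This is an illustration only: the function name, the dictionary keys, and the `"man"`/`"woman"` labels are assumptions, since each sample stores these variables differently.

```python
def code_covariates(age, gender, education_years):
    """Center age at 60 and education at 12 years; dummy code gender.

    `gender` is assumed to arrive as the string "man" or "woman"
    (a hypothetical input format used only for this sketch).
    """
    if gender not in ("man", "woman"):
        raise ValueError("gender must be 'man' or 'woman' in this sketch")
    return {
        "age_c": age - 60,                      # 0 = 60 years old
        "gender": 1 if gender == "woman" else 0,  # 0 = men, 1 = women
        "education_c": education_years - 12,    # 0 = 12 years of education
    }


print(code_covariates(age=72, gender="woman", education_years=16))
```

With this coding, the intercept and the trait coefficient in a moderated model refer to a 60-year-old man with 12 years of education.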
Data Preparation
Extensive details about the data preparation procedure can be found in the Supplemental Material and Open Practices. Descriptive statistics of all harmonized variables for each sample are presented in Table 3. Zero-order correlations among measures within samples are presented in the Supplemental Material and web app.
Descriptive Statistics of All Harmonized Measures Across Samples.
Note. Age, education, and gender were assessed at the first baseline personality assessment. Valid N (range) indicates the range of valid observations with complete personality trait and outcome data across different trait measures. E = Extraversion; A = Agreeableness; C = Conscientiousness; N = Neuroticism; O = Openness.
Analysis Plan
To test whether personality predicts cognitive domain scores, we covered four levels of our taxonomy of data synthesis, some of which are broken down into sub-methods: (1) One-Stage Pooled Analysis without Sample-Specific Effects, (2) One-Stage Pooled Analysis with Sample-Specific Effects, (3) Two-Stage, Separate Analysis with Meta-Analytic Pooling, and (4) Separate Analysis of Individual Participant Data Reported Together. As noted previously, depending on the research question, different approaches may be chosen. We opted for a basic regression framework, including random-effects meta-analysis, but a growing body of research demonstrates that our taxonomy is compatible with other meta-analytic frameworks, including MASEM with cluster corrected standard errors (Groot et al., 2025) and IPD (Groot et al., 2024).
For each of these, we estimated all combinations of Big Five personality trait (5) × crystallized abilities (1) × participant-level moderator (3) × covariate set (5; unadjusted, fully adjusted, partially adjusted for each covariate). Below, we provide a brief overview of each method. A more thorough description and sample analytic plan for each method can be found in the Supplemental Material. Notably, each of the techniques described is a linear regression-based statistical model. As a result, they are bound by both the flexibility and the assumptions of regression. We therefore see them as broad classes of approaches that can be fine-tuned to the specific nuances of a set of research questions and/or available data, much as regression can be flexibly applied across a huge range of circumstances, including both observational and experimental research. Thus, when detailing the regression equations below, we use predictor and outcome as the core observed variables across studies, which could include a broad array of continuous and nominal variables. The specific parameterization of the model, however, will depend on the specific research questions under investigation.
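To make the size of this model grid concrete, the sketch below enumerates the combinations with `itertools.product`. The labels are hypothetical placeholders; the counts (5 traits, 1 outcome, 3 moderators, 5 covariate sets) come from the text.

```python
from itertools import product

traits = ["E", "A", "C", "N", "O"]
outcomes = ["crystallized"]
moderators = ["age", "gender", "education"]
covariate_sets = ["unadjusted", "age", "gender", "education", "fully adjusted"]

# One entry per model specification to be fit within a given method.
grid = list(product(traits, outcomes, moderators, covariate_sets))
print(len(grid))  # 5 * 1 * 3 * 5 = 75

# Assumption: if unmoderated models are treated as a fourth moderator level,
# the grid grows accordingly.
grid_with_unmoderated = list(
    product(traits, outcomes, ["none"] + moderators, covariate_sets)
)
print(len(grid_with_unmoderated))  # 5 * 1 * 4 * 5 = 100
```

Writing the grid out once and mapping a fitting function over it keeps the specifications identical across methods, which is what makes the later cross-method comparison meaningful.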
Method 1: Pooled Analysis of IPD
Method 1 is a fully pooled procedure where a single estimate of a prospective Big Five personality characteristic–crystallized ability association is estimated across samples, and no sample-specific estimates are estimated. In other words, in this method, data from all samples are combined and associations are estimated in a single model, ignoring sample membership.
Method 1A: Pooled Simple Linear Regression
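The basic, unadjusted form of the model is as follows:

$$Y_{ij} = b_0 + b_1 X_{ij} + \varepsilon_{ij} \tag{2}$$

where $Y_{ij}$ is the outcome (crystallized ability) and $X_{ij}$ is the predictor (personality trait) for participant $i$ in sample $j$, $b_0$ is the pooled intercept, $b_1$ is the pooled prospective association, and $\varepsilon_{ij}$ is the residual.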
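A minimal sketch of the fully pooled model, assuming toy data and hypothetical names (the study's own analyses were run in R): participants from all samples are stacked and a single ordinary least squares line is fit, ignoring sample membership.

```python
def ols_simple(x, y):
    """Ordinary least squares for y = b0 + b1 * x.

    Returns (b0, b1, se_b1) using the classical (non-robust) standard error.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = (rss / (n - 2) / sxx) ** 0.5
    return b0, b1, se_b1


# Hypothetical toy samples: (predictor, outcome) pairs in POMP-like units.
samples = {
    "A": [(1, 2.0), (2, 3.0), (3, 5.0), (4, 6.0)],
    "B": [(2, 1.0), (3, 3.0), (4, 4.0), (5, 6.0)],
}

# Pool: stack every participant and drop the sample label.
pooled = [pair for rows in samples.values() for pair in rows]
x = [p[0] for p in pooled]
y = [p[1] for p in pooled]
b0, b1, se_b1 = ols_simple(x, y)
print(b0, b1, se_b1)
```

Because sample membership never enters the model, between-sample differences in means and in slopes are absorbed into one line, and the classical standard error treats all participants as independent.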
Method 1B: Pooled Linear Regression With Cluster Robust Standard Errors
Method 1B estimates only an overall effect of personality on cognition by including all data in a single regression model, with cluster robust standard errors (Gaure, 2013; J. E. Pustejovsky & Tipton, 2018).
Correcting for dependencies without explicitly modeling cross-sample heterogeneity is sometimes called nuisance clustering (Fitzmaurice & Laird, 1995). The basic form of the model is the same as Equation 2, with sample as the clustering unit for the standard errors.
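The sketch below illustrates the general idea of a cluster-robust (sandwich) standard error for the slope of a simple regression. It is a simplification under stated assumptions: it implements the CR0 estimator with an optional CR1-style small-sample factor, not the CR2 estimator with Satterthwaite degrees of freedom referenced in the Results, and the function name and toy data are hypothetical.

```python
def cluster_robust_slope(x, y, cluster, small_sample=True):
    """OLS slope of y on x with a cluster-robust standard error.

    Residual-weighted scores are summed within each cluster before being
    squared, so within-cluster dependence inflates the variance estimate.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx

    scores = {}
    for xi, yi, g in zip(x, y, cluster):
        resid = yi - (b0 + b1 * xi)
        scores[g] = scores.get(g, 0.0) + (xi - mx) * resid

    var_b1 = sum(s * s for s in scores.values()) / sxx ** 2  # CR0
    if small_sample:
        n_clusters = len(scores)
        if n_clusters < 2:
            raise ValueError("need at least two clusters")
        # CR1-style correction: G / (G - 1) * (N - 1) / (N - P), with P = 2.
        var_b1 *= n_clusters / (n_clusters - 1) * (n - 1) / (n - 2)
    return b1, var_b1 ** 0.5


# Hypothetical toy data: three samples of four participants each.
x = [1, 2, 3, 4, 2, 3, 4, 5, 1, 3, 5, 6]
y = [2.0, 3.0, 5.0, 6.0, 1.0, 3.0, 4.0, 6.0, 4.0, 4.5, 6.0, 6.0]
sample = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
slope, robust_se = cluster_robust_slope(x, y, sample)
print(slope, robust_se)
```

With only a handful of clusters, as in this toy example or with k = 11 samples, sandwich estimators of this kind are known to be unstable, which is one reason small-sample corrections such as CR2 with Satterthwaite degrees of freedom are often preferred.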
Method 2: Pooled Analysis of IPD Using Contrasts or Random Effects
Method 2 is also a fully pooled procedure that provides a single estimate of associations across samples. Unlike Method 1, Method 2 provides sample-specific estimates of the association.
Method 2A: Interactions
Method 2A estimates both an overall effect as well as sample-specific estimates using effects coded contrasts and interaction terms. The basic form of the model is as follows:
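$$Y_{ij} = b_0 + b_1 X_{ij} + \sum_{s=1}^{k-1} b_{2s}\,\text{Study}_{sj} + \sum_{s=1}^{k-1} b_{3s}\,(X_{ij} \times \text{Study}_{sj}) + \varepsilon_{ij}$$

where k indicates the number of samples, so that there are k − 1 effects-coded study contrasts. Study is effects coded, which results in a term that captures the overall estimated effect and k − 1 terms capturing sample-specific deviations from the overall estimate (all sample estimates can then be recovered via linear combinations). This is the same approach utilized in analysis of variance. Of interest are two key sets of terms: $b_1$, which captures the overall association, and the interaction terms $b_{3s}$, which capture the sample-specific deviations from it.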
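The following sketch shows how effects coding yields an overall slope plus sample-specific deviations, and how each sample's slope is recovered by a linear combination. Everything here is illustrative: the toy data, the helper names, and the small normal-equations solver are assumptions made so that the example is self-contained.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][j] * sol[j] for j in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def ols(design, y):
    """Least-squares coefficients via the normal equations."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def effects_code(study, studies):
    """k - 1 contrasts: 1 for own study, -1 for the last study, else 0."""
    *first, last = studies
    if study == last:
        return [-1.0] * len(first)
    return [1.0 if study == s else 0.0 for s in first]


data = {  # hypothetical toy samples: (predictor, outcome) pairs
    "A": [(1, 2.0), (2, 3.0), (3, 5.0), (4, 6.0)],
    "B": [(2, 1.0), (3, 3.0), (4, 4.0), (5, 6.0)],
    "C": [(1, 4.0), (3, 4.5), (5, 6.0), (6, 6.0)],
}
studies = list(data)
k = len(studies)

design, y = [], []
for s, rows in data.items():
    for xi, yi in rows:
        c = effects_code(s, studies)
        # Columns: intercept, predictor, study contrasts, predictor x contrasts.
        design.append([1.0, float(xi)] + c + [xi * ci for ci in c])
        y.append(yi)

coefs = ols(design, y)
overall = coefs[1]                 # overall (grand-mean) slope
deviations = coefs[2 + (k - 1):]   # interaction terms
slopes = {s: overall + d for s, d in zip(studies[:-1], deviations)}
slopes[studies[-1]] = overall - sum(deviations)  # last study: minus the sum
print(overall, slopes)
```

In this unadjusted setup the model is a reparameterization of separate regression lines per sample, so the recovered slopes match per-sample fits and the overall slope is their unweighted mean. That equivalence need not hold once covariates with common coefficients are added.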
Method 2B: Random Effects
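Method 2B uses multilevel models (MLM; Raudenbush, 2002) in which participants are nested within studies to decompose the variance into different sources. Unlike the regression techniques above that rely on ordinary least squares estimation, MLM uses restricted maximum likelihood estimation or maximum likelihood estimation. The simple model of a basic predictor–outcome association is as follows:
Level 1

$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + \varepsilon_{ij}$$

Level 2

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

$$\beta_{1j} = \gamma_{10} + u_{1j}$$

where the sample-specific prospective associations are captured by $\beta_{1j} = \gamma_{10} + u_{1j}$, the overall association is captured by $\gamma_{10}$, and $u_{0j}$ and $u_{1j}$ are the sample-specific deviations of the intercept and slope from their overall values.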
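Fitting a multilevel model is beyond what a short standalone example can do, but the partial pooling it produces can be illustrated. The sketch below applies the textbook empirical Bayes shrinkage formula to sample-specific slopes, treating the overall slope and the between-sample variance as known. This is an approximation for intuition only, not how lme4 or brms estimate the model, and all numbers and names are hypothetical.

```python
def shrink(estimates, variances, overall, tau2):
    """Shrink each sample's slope toward the overall slope.

    weight = tau2 / (tau2 + v_j): noisy estimates (large sampling
    variance v_j) are pulled more strongly toward the overall value.
    """
    shrunk = []
    for b, v in zip(estimates, variances):
        weight = tau2 / (tau2 + v)
        shrunk.append(weight * b + (1 - weight) * overall)
    return shrunk


# Hypothetical sample-specific slopes and their sampling variances.
slopes = [0.30, 0.05, -0.10]
variances = [0.001, 0.010, 0.040]
pooled = shrink(slopes, variances, overall=0.10, tau2=0.01)
for raw, post in zip(slopes, pooled):
    print(f"raw = {raw:+.2f}  shrunken = {post:+.3f}")
```

The most precisely estimated slope barely moves, while the noisiest one is pulled most of the way toward the overall value. This mechanism tends to make sample-specific estimates from random-effects models look more homogeneous than those from separate or effects-coded fits.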
Method 3: Separate Analyses Followed by Random-Effects Meta-Analysis
Method 3 estimated both an overall prospective personality–cognitive ability association and sample-specific estimates. However, unlike in Method 2, sample-specific estimates were not estimated simultaneously with overall estimates. The procedure for this method is as follows:
Step 1: Sample-Specific Statistical Modeling
Models were run separately for each sample, personality trait, outcome, covariate, and moderator combination. The basic form of the model is as follows, where models are separately estimated for each sample, j:

$$Y_{ij} = b_{0j} + b_{1j} X_{ij} + \varepsilon_{ij}$$
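Step 1 can be sketched as a loop that fits the same regression within each sample and stores the slope and its sampling variance for later pooling. The toy data and names are hypothetical, and the unadjusted simple regression stands in for whichever covariate and moderator specification is being fit.

```python
def fit_slope(x, y):
    """Return (slope, sampling variance of the slope) for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b1, rss / (n - 2) / sxx


samples = {  # hypothetical toy samples: (predictor, outcome) pairs
    "A": [(1, 2.0), (2, 3.0), (3, 5.0), (4, 6.0)],
    "B": [(2, 1.0), (3, 3.0), (4, 4.0), (5, 6.0)],
    "C": [(1, 4.0), (3, 4.5), (5, 6.0), (6, 6.0)],
}

# One model per sample; no information is shared across samples here.
results = {name: fit_slope(*zip(*rows)) for name, rows in samples.items()}
for name, (slope, var) in results.items():
    print(f"{name}: slope = {slope:.3f}, SE = {var ** 0.5:.3f}")
```

Because each sample is fit on its own, measures only need to be harmonized well enough for the resulting slopes to be comparable, not identical item by item.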
Step 2: Results Pooling Using Meta-Analysis
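Once all the models were run, we next combined the effects across samples via meta-analysis. We opted to use random-effects meta-analysis because its assumptions surrounding cross-study variance are often more aligned with research goals in psychology, but other forms of meta-analysis can be readily applied when appropriate. To do so, we constructed three helper functions to (1) set up and run the meta-analytic models, (2) extract the meta-analytic estimates, and (3) extract heterogeneity estimates. The meta-analytic model is as follows:

$$\hat{b}_{1j} = \mu + u_j + e_j, \qquad u_j \sim N(0, \tau^2), \qquad e_j \sim N(0, v_j)$$

where $\hat{b}_{1j}$ is the estimated association in sample $j$ from Step 1, $\mu$ is the overall meta-analytic association, $u_j$ is the deviation of sample $j$ from the overall association with between-sample variance $\tau^2$, and $e_j$ is the sampling error with variance $v_j$ (the squared standard error of $\hat{b}_{1j}$).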
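As a self-contained sketch of Step 2, the function below pools sample-specific estimates with a random-effects model using the DerSimonian-Laird moment estimator of the between-sample variance. This estimator is swapped in for simplicity because it has a closed form; metafor's default is REML, so results from the study's own code may differ slightly. The input numbers and names are hypothetical.

```python
def random_effects_dl(estimates, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird estimator."""
    k = len(estimates)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Cochran's Q and the moment estimator of between-sample variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Re-weight with tau2 added to each sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "mu": mu,
        "se": se,
        "ci": (mu - 1.96 * se, mu + 1.96 * se),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }


# Hypothetical slopes and sampling variances from three samples.
pooled = random_effects_dl([0.12, 0.05, 0.20], [0.0009, 0.0016, 0.0025])
print(pooled)
```

The returned dictionary mirrors the three helper roles described above: the pooled estimate with its interval, and the heterogeneity statistics (τ², Q, I²).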
Method 4: Separate Analyses Reported Together
For Method 4, we report only the separately estimated sample-specific effects, with no aggregation across samples. Thus, the models for Method 4 were identical to those estimated in the first part of Method 3 with no subsequent meta-analysis.
Results
As detailed above, each method differs in what and how overall, sample-specific, and heterogeneity estimates are estimated. Below, we summarize overall and sample-specific findings across methods. Detailed sample results sections for each method can be found in the Supplemental Material.
Overall Estimates
Table 4 presents the fully adjusted (age, gender, and education) estimates of the key terms from all unmoderated and participant-level moderators of overall estimates across all samples and methods. Across methods, the most consistent associations were those of neuroticism (−) and openness (+) with later crystallized abilities, which appeared in all methods except Method 1B. Less consistent were the associations for extraversion (−), agreeableness (−), and conscientiousness (+), which appeared only in Method 1A. Notably, despite applying Satterthwaite t-tests, which are most appropriate for relatively small numbers of samples (k < 20), Method 1B provided much more conservative estimates than all other methods by a large margin. In addition, Methods 1A and 1B both showed markedly different associations from Methods 2A, 2B, and 3, which likely reflects the ecological bias in meta-analysis.
Cross-Method Comparison of Prospective Overall Effects and Participant-Level Moderators of Personality-Crystallized Domain Associations.
Note. All terms except Personality indicate interaction with personality trait levels (e.g., Age = Age x Personality). Bold indicates 95% confidence intervals that did not overlap with 0. All parameter estimates were extracted from separate models. E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness to Experience.
Sample-Specific Estimates
Some methods (Methods 2A–B, 3, and 4) also produce sample-specific estimates, which are shown in Figure 2. As can be seen in the forest plots, openness and neuroticism showed the most consistent associations with crystallized abilities. Openness was positively associated with crystallized abilities in five of the seven samples in Methods 2A and 2B (all except ROS and GSOEP) and in six of the seven samples (all except ROS) in Method 3. Neuroticism was negatively associated with crystallized abilities in 5 of 11 samples in Methods 2A and 3 and 7 of 11 samples in Method 2B. In a few cases, a relatively large number of sample-specific estimates were significant, despite non-significant meta-analytic associations. For example, although there was no overall association between conscientiousness and crystallized abilities, SATSA showed a significant negative association, and ROS and HRS showed positive associations across all three methods.

Forest plot of fully adjusted prospective associations between Big Five personality traits and crystallized abilities across samples using one-stage pooled integrative data analysis via effects coded contrasts (Method 2A). Overall point estimates (squares) represent the grand-mean estimates of the association across samples, while sample point estimates represent regression terms or linear combinations of regression terms. Error bars capture the 95% CI around the point estimate. Arrows indicate the confidence band was truncated to better visualize the estimates.
Person-Level Moderators
Next, we examined whether there were person-level moderators of the association between the Big Five and crystallized abilities (see Table 4). First, we examined the overall associations across all studies. Most consistently, we observed that gender moderated the association between extraversion and crystallized abilities across samples for all methods except Method 2B (see Figure 2). Even so, all point and interval estimates were within two-hundredths of estimates from both Methods 2A and 3. In Method 1A, the overall prospective extraversion–crystallized ability association was negative for men (b = −0.06, 95% CI [−0.08, −0.04]) and null for women (b = −0.01, 95% CI [−0.03, 0.006]). In Method 1B, the overall prospective extraversion–crystallized ability association was null for both men (b = −0.09, 95% CI [−0.24, −0.05]) and women (b = −0.05, 95% CI [−0.17, 0.08]). In Method 2A, the overall association between extraversion and later crystallized ability was null for men (b = −0.01, 95% CI [−0.06, 0.03]) but positive for women (b = 0.05, 95% CI [0.004, 0.09]), such that women who were higher in extraversion tended to score higher on crystallized domain tasks. In Method 2B, the interaction indicates that the association was null for both men (b = −0.02, 95% CI [−0.04, 0.01]) and women (b = 0.03, 95% CI [−0.01, 0.08]), and the associations did differ from one another.
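Gender-specific slopes of the kind reported above follow directly from the dummy coding (0 = men, 1 = women): the trait coefficient is the slope for men, and the slope for women adds the trait × gender interaction. A minimal sketch, with hypothetical coefficient values and an assumed coefficient covariance:

```python
def simple_slopes(b_trait, b_interaction, var_trait, var_interaction, cov):
    """Simple slopes of the trait for gender = 0 (men) and gender = 1 (women).

    The standard error for women uses the variance of a sum of two
    coefficients: var(a + b) = var(a) + var(b) + 2 * cov(a, b).
    """
    slope_men = b_trait
    slope_women = b_trait + b_interaction
    se_men = var_trait ** 0.5
    se_women = (var_trait + var_interaction + 2 * cov) ** 0.5
    return {
        "men": (slope_men, se_men),
        "women": (slope_women, se_women),
    }


# Hypothetical estimates, not values from the study.
slopes = simple_slopes(
    b_trait=-0.02, b_interaction=0.06,
    var_trait=0.0004, var_interaction=0.0009, cov=-0.0003,
)
for group, (b, se) in slopes.items():
    print(f"{group}: b = {b:+.3f}, 95% CI [{b - 1.96 * se:+.3f}, {b + 1.96 * se:+.3f}]")
```

Note that a non-significant slope in each group does not by itself imply a non-significant interaction, or the reverse; the interaction term tests the difference between the two slopes directly.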
We also examined sample-specific estimates of moderators of personality–cognition associations. Given that the most consistent effect was gender moderating the association between extraversion and crystallized abilities, Figure 3 shows a forest plot of the sample-specific gender moderation associations across studies for each of the Big Five, and Figure 4 illustrates simple effects plots of the overall and sample-specific estimates across methods. As is clear in both figures, estimates are much more similar for Methods 2A and 3 than for Method 2B, with near exact patterns of sample-specific associations. In Method 2B, however, shrinkage due to partial pooling is apparent in the greater consistency of estimates across samples, evidenced by fewer crossing lines.

Forest plot of fully adjusted gender moderation of the prospective associations between Big Five personality traits and crystallized abilities across samples using one-stage pooled integrative data analysis via effects-coded contrasts (Method 2A). Gender was coded as 0 = men and 1 = women, so point estimates represent the difference in association between women and men (positive = women had higher absolute magnitude; negative = men had higher absolute magnitude). Overall point estimates (squares) represent the grand-mean estimates of the association across samples, while sample point estimates represent regression terms or linear combinations of regression terms. Error bars capture the 95% CI around the point estimate. Arrows indicate the confidence band was truncated to better visualize the estimates.

Prospective sample-specific and overall associations between extraversion (in POMP units, 0–10) and crystallized abilities (in POMP units, 0–10) across genders (men, women) for Methods 2A, 2B, and 3. Different colors and line types indicate different samples. Thicker, black lines indicate the average association across samples, while thinner lines indicate sample-specific associations.
Learning More
For brevity, we summarized above only the main findings across methods for the main effect of Big Five personality traits predicting later crystallized abilities. In the Supplementary Material, we provide sample results sections for each method separately and for sample-level moderators (i.e., meta-regressions).
Discussion
The present article differentiated and demonstrated techniques for synthesizing IPD across samples. First, we outlined methods of data synthesis, which we organized into five broad categories (see Table 1), ranging from fully pooled regression models that do not include (Method 1A) or correct for cross-sample heterogeneity (Method 1B) to fully separate regression models that do not include pooling of effect sizes across samples (Method 4) or traditional meta-analysis (Method 5). Second, we demonstrated how to carry out each form of IPD meta-analysis (Methods 1–4) using the R programming language. Third, we examined prospective associations between Big Five personality traits and crystallized abilities. In alignment with previous investigations that compared different combinations of methods used in our taxonomy (Burke et al., 2017; Debray et al., 2013; Legha et al., 2018), we found that most results replicated across methods, particularly for associations between personality and cognitive ability, with some exceptions that we discuss in more detail below. Notably, four of the six modeling techniques yielded convergent results, specifically pooled analysis using contrasts (2A) and random effects (2B), as well as both CDA techniques (with [3] and without [4] random-effects meta-analysis). The two techniques that did not yield results consistent with the others were the standard pooled analysis (1A), which yielded all significant effects across traits, and pooled analysis with cluster-corrected errors (1B), which yielded all null effects across traits (see below for further discussion of this pattern). Fourth, consistent with prior investigations, we found robust prospective associations between both openness and neuroticism and crystallized abilities.
For openness, this association was present in 5 (71.4%) to 6 (85.7%) of the 7 samples across methods, while for neuroticism this association was present for 5 (45.5%) to 7 (63.6%) of the 11 samples across methods. These results are also mostly consistent with the recent traditional meta-analysis which found associations for neuroticism and openness (Anglim et al., 2022). Below, we discuss each of these methods in turn and provide recommendations for best practices in data synthesis.
Choosing a Method of Data Synthesis
In the present study, we applied data synthesis parameterizations that spanned four of the five levels of our taxonomy of methods for synthesizing data. These methods differ with respect to the types of data used (i.e., aggregated vs. individual participant), the number of models estimated, the type of models estimated, whether sample-specific and/or overall estimates are included, and the degree of harmonization of sample variables necessary. Based on the present study and our review of the literature on individual participant meta-analysis, we recommend that investigators answer the following questions to guide their decisions about which method is most appropriate for the research questions they will be addressing (Table 5).
Decision Points and Recommendations for Conducting Data Synthesis.
Note. Method 1A = Pooled analysis of IPD; Method 1B = Pooled analysis of IPD with cluster-corrected errors; Method 2A = Pooled analysis of IPD using contrasts; Method 2B = Pooled analysis of IPD using random effects; Method 3 = Separate analyses of IPD followed by meta-analysis; Method 4 = Separate analyses of IPD Reported Together (but not pooled); Method 5 = Traditional Meta-Analyses.
In the presence of complex models, we suggest applying meta-analytic structural equation modeling.
Below, we discuss each question above and its implications in detail. Before doing so, however, we make a small number of prescriptive recommendations that apply more broadly across cases. First, we do not recommend the use of Method 1A or Method 1B or any model that treats variance across samples as “nuisance” variance. These models appear to show biased parameter estimates, which may be particularly pronounced given that even samples collected from the same population at different times may not be fully exchangeable due to timing effects (e.g., seasonal sleep-wake differences, global events). Importantly, this is true not only in the basic regression context (Cameron et al., 2024) but also in the context of MASEM (Groot et al., 2025). Second, in most circumstances, we suggest using Method 2B (one-stage approach) or Method 3A or 3B (two-stage approach with random-effects meta-analysis or MASEM), each of which shows comparable results, allows for sample-level and overall estimates, and allows for sample-level moderators.
A first and basic question (#1 in Table 5) is whether IPD are available either directly to the researchers or through an external analyst. If the answer to both is no, then Methods 1 to 4, which were the focus of the present study, would not be possible to estimate fully. The only option available would be to conduct a traditional meta-analysis in which effect sizes of interest are extracted from the published and unpublished literature (where possible). MASEM with correlation or covariance matrices may also be appropriate if the model allows and the matrices are available via publications or data maintainers. If IPD are directly available to the researcher, then pooled analysis is possible (Method 1 or 2), although data harmonization considerations need to be met prior to selecting one of these methods (see below). If IPD are not directly available to the researcher, and an external analyst is necessary, then pooled analysis is not possible, and one of the CDA approaches (Methods 3 and 4) is most appropriate.
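This first decision point can be written down as a small lookup. The sketch below only restates the logic of the preceding paragraph; the function name and method labels are hypothetical, and listing Methods 3 and 4 as still available when IPD are held directly is our assumption rather than something stated above.

```python
def candidate_methods(ipd_direct: bool, ipd_via_analyst: bool) -> list[str]:
    """Return the method labels that remain feasible given access to IPD."""
    if ipd_direct:
        # Pooled analysis (Methods 1 and 2) is possible, pending harmonization.
        # Assumption: separate-model approaches (3 and 4) also remain available.
        return ["1A", "1B", "2A", "2B", "3", "4"]
    if ipd_via_analyst:
        # An external analyst fits models per sample, so pooling raw data
        # is not possible; only the CDA approaches remain.
        return ["3", "4"]
    # No IPD at all: traditional meta-analysis of extracted effect sizes.
    return ["5"]


print(candidate_methods(ipd_direct=False, ipd_via_analyst=True))
```

The lookup deliberately ignores harmonization and the MASEM option, which are separate considerations discussed in the surrounding text.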
Second, researchers next need to ask the extent to which the core variables are either logically or analytically harmonizable (Cole et al., 2023; #2 in Table 5). Harmonization can occur either before (ex ante) or after (ex post) data are collected (Dubrow & Tomescu-Dubrow, 2016). Ex post harmonization is particularly challenging given the necessity of harmonizing across diverse datasets (Cole et al., 2023; Fortier et al., 2017). If ex post harmonization is not possible, then the data synthesis methods outlined in the present study are likely not appropriate and Method 4 or 5 should typically be used. In the present study, we utilized logical ex post harmonization, wherein we identified studies that used validated Big Five inventories and crystallized intelligence measures whose convergence was established in prior work (Langa et al., 2020; Soto & John, 2017) and linearly transformed them onto the same scale using POMP scoring. But harmonization is a non-trivial issue that warrants particular consideration. While those considerations are beyond the scope of the current study, excellent overviews and tutorials exist (Cheng et al., 2024; Cole et al., 2023; Dubrow & Tomescu-Dubrow, 2016).
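As a concrete illustration of the linear transformation, the sketch below rescales a raw score to POMP (percent of maximum possible) units on a 0–10 range, the range used in the figures. It assumes the theoretical minimum and maximum of each instrument are known; the function name and the example scales are hypothetical.

```python
def pomp(score: float, scale_min: float, scale_max: float,
         upper: float = 10.0) -> float:
    """Rescale a raw score to POMP units running from 0 to `upper`.

    scale_min and scale_max are the theoretical bounds of the instrument,
    not the observed minimum and maximum in a given sample.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return (score - scale_min) / (scale_max - scale_min) * upper


# A 1-5 Likert composite and a 0-100 test score land on the same metric.
print(pomp(3.0, scale_min=1, scale_max=5))     # midpoint of a 1-5 scale
print(pomp(50.0, scale_min=0, scale_max=100))  # midpoint of a 0-100 scale
```

Because the transformation is linear, it changes the unit of the regression coefficients but not the shape of the score distributions, so it does not by itself establish that the measures are conceptually equivalent.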
Third, if harmonizable IPD are available, researchers should decide whether they want to obtain meta-analytic estimates (#4 in Table 5). This decision depends both on the complexity of the models researchers desire to use (#3 in Table 5) and the specific effect sizes that are desirable to synthesize. While our taxonomy is broadly applicable to many types of models and effect sizes, single-stage models, like Methods 1 and 2, require that synthesized terms are parameterized as model-based coefficients (e.g., regression coefficients, mean differences). Thus, effect sizes that cannot be directly estimated in multivariable or multilevel models, such as zero-order correlations, would have to be calculated in a secondary step via common effect size conversions. If that is not desirable, then a typical two-stage meta-analytic approach (i.e., Method 3) is likely most appropriate. Moreover, the authors of the present study frequently find that research questions that rely on more sophisticated models may not be appropriate to meta-analytically combine, both because of compromises needed in parameterizing models to get them to converge and because the smaller number of studies meeting inclusion criteria for more complex models leads to lower study-level power for meta-analysis (Graham et al., 2017; Neupert et al., 2024). Rather, in these cases, we advocate for clearly reporting differences in how models were estimated across samples and making descriptive comparisons in effect sizes, where appropriate. Alternatively, if correlation or covariance matrices are available, MASEM is likely an optimal approach. However, particularly for complex models with within-person effects and complex random-effects structures typically handled via multilevel models, estimating a MASEM is a non-trivial issue, even when using multilevel structural equation modeling (SEM).
Fourth, researchers should ask whether they are interested in separately parameterizing estimates for each sample (#5 in Table 5). If not, fully pooled regression (Method 1A) or pooled regression with cluster-corrected errors (Method 1B) may be sufficient to answer questions of interest. However, we generally do not recommend this for the following reasons. First, Method 1B had much larger standard errors than all other methods. Likely, this is due to a commonly observed issue within cluster-corrected regression (Leyrat et al., 2018) in which small numbers of or imbalanced clusters can result in over-rejection of the null. Given that many pooled analyses within psychology (e.g., internal meta-analysis, Goh et al., 2016) will have similar or even fewer numbers of clusters than those used in the present study, such over-rejection of the null is likely to be a common issue. Method 1A had narrower standard errors than other methods, which likely overestimates certainty by failing to account for nested study information. Much like ecological bias in longitudinal data can lead to biased estimates by confounding within- and between-person information, failing to account for sample-level information can lead to ecological bias in meta-analysis by confounding person- and sample-level information (Berlin et al., 2002; Hua et al., 2017). Second, results from the 11 individual samples varied in the magnitude, direction, and significance of associations (see Supplemental Material). Therefore, we recommend methods that parameterize sample-specific estimates, even if these are not focal.
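The confounding of person- and sample-level information can be seen in a toy example. The sketch below uses made-up data for two hypothetical samples, not data from the present study: within each sample the slope is 0.5, but because the samples differ in their means, a fully pooled slope that ignores sample membership comes out negative. Centering each variable within sample, one simple way to remove sample-level differences, recovers the within-sample slope.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (single predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def center(values):
    """Subtract the sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical data: (predictor, outcome) per sample.
samples = {
    "A": ([1, 2, 3], [6.0, 6.5, 7.0]),
    "B": ([6, 7, 8], [1.0, 1.5, 2.0]),
}

# Sample-specific slopes (each sample analyzed on its own).
within = {name: ols_slope(x, y) for name, (x, y) in samples.items()}

# Fully pooled slope that ignores sample membership.
pooled_x = [xi for x, _ in samples.values() for xi in x]
pooled_y = [yi for _, y in samples.values() for yi in y]
naive_pooled = ols_slope(pooled_x, pooled_y)

# Pooled slope after removing sample means from both variables.
centered_x = [xi for x, _ in samples.values() for xi in center(x)]
centered_y = [yi for _, y in samples.values() for yi in center(y)]
adjusted_pooled = ols_slope(centered_x, centered_y)

print(within, round(naive_pooled, 3), round(adjusted_pooled, 3))
```

Real data will rarely be this extreme, but the same mechanism operates whenever sample means on the predictor and outcome covary differently than scores do within samples.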
When estimating sample-specific associations, researchers are faced with choosing between one- and two-stage approaches. Consistent with previous investigations of one- and two-stage IPD-MA, estimates from these methods largely converged (Burke et al., 2017; Debray et al., 2013; Legha et al., 2018). Current recommendations suggest that one-stage approaches are preferred when binary indicators are unbalanced, continuous indicators are non-normally distributed, or sample sizes are small (Burke et al., 2017; Cooper & Patall, 2009; Debray et al., 2013). One-stage approaches allow both focal and adjustment terms to be jointly estimated and for their residuals to be correlated, which is often desirable when studying predictor–outcome relationships (Burke et al., 2017; Cooper & Patall, 2009). Method 2A, which uses contrasts to estimate sample-specific effects, can be useful when (1) correlations among random effects are not of interest; (2) samples are not thought to be drawn from a larger population of samples; or (3) the number of samples is small (these models have fewer parameters than the multilevel models used in Method 2B; Curran & Hussong, 2009; Legha et al., 2018; Riley et al., 2020). But in some cases, interest in variances of and correlations among random effects (or predicting that variance via sample-level moderators/meta-regression) as well as the desire to include sample-level moderators/meta-regression may necessitate the use of Method 2B (see #8 in Table 5). Method 2B uses shrinkage via partial pooling in multilevel models, which can reduce overall error by shrinking outliers toward the average. However, when outliers are not errors, this could lead to biased parameter estimates for those samples (see #9 in Table 5).
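To make the shrinkage idea concrete, the sketch below applies the textbook partial-pooling formula for a simple normal random-effects model: each sample estimate is pulled toward the overall mean by a weight that depends on its sampling variance relative to the between-sample variance. This is a simplification for illustration only. It treats the overall mean and between-sample variance as known, whereas a multilevel model estimates them jointly with everything else, and all numbers are hypothetical.

```python
def shrink(estimate: float, std_error: float, overall: float,
           tau2: float) -> float:
    """Partially pooled estimate for one sample.

    weight = tau2 / (tau2 + std_error**2) is the share of the sample's own
    estimate that is retained; the remainder comes from the overall mean.
    """
    weight = tau2 / (tau2 + std_error ** 2)
    return overall + weight * (estimate - overall)


overall_mean = 0.05   # hypothetical average association across samples
tau2 = 0.01           # hypothetical between-sample variance

# A precise outlying sample keeps most of its distance from the mean ...
precise = shrink(0.30, std_error=0.05, overall=overall_mean, tau2=tau2)
# ... while an imprecise outlying sample is pulled most of the way in.
noisy = shrink(-0.20, std_error=0.20, overall=overall_mean, tau2=tau2)

print(round(precise, 3), round(noisy, 3))
```

The example also shows the risk noted above: if the imprecise sample's association were genuinely different, most of that difference would be removed by shrinkage.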
Finally, we encourage researchers to ask whether they are interested in sample-level moderators or in predicting variability (e.g., tau) or error (e.g., sigma; see #6 in Table 5). If these are not of interest, then Method 2A, which uses effect codes, should provide convergent evidence with other methods with a much simpler parameterization (i.e., variances and covariances among sample-specific estimates are not estimated). If sample-level moderators or the prediction of variability/error are of interest, then researchers can implement Method 2B or 3. In the present study, we highlight sample-level moderators as an opportunity, but did not emphasize them because the number of studies we synthesized left us underpowered to conduct robust tests. Yet, we believe they merit discussion and cannot overemphasize the care that should be taken when conducting such tests. When the number of samples is relatively small (e.g., <10) and there is either little variability in continuous sample-level moderators or a large number of categories in categorical sample-level moderators, power will tend to be particularly low and the likelihood of Type I errors high. Moreover, sample-level moderators may be particularly sensitive to differences in how sample-level estimates are parameterized. Because sample-level moderators are essentially meta-regressions in which heterogeneity in effect sizes is predicted by sample-level characteristics, moderator estimates depend on how such heterogeneity is estimated. For example, when conducting these moderator tests in Method 2B, there are likely to be differences in the estimates due to shrinkage (from partial pooling). As an alternative, we recommend researchers focus on descriptive comparisons of effect sizes across samples rather than relying on statistical inferences when the number of samples is small.
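The simpler parameterization of effects coding can be illustrated with a few lines of arithmetic. Under effects coding, the focal term is the unweighted grand mean of the sample-specific slopes, each coded sample gets a deviation from that mean, and the slope for the sample coded −1 on every contrast is recovered as a linear combination of terms. The sketch below assumes the last sample is the one coded −1; that choice, the function names, and the slopes are all hypothetical.

```python
from statistics import mean


def effects_coded_terms(sample_slopes):
    """Split sample slopes into a grand mean and k-1 deviation terms."""
    grand_mean = mean(sample_slopes)
    deviations = [b - grand_mean for b in sample_slopes[:-1]]
    return grand_mean, deviations


def sample_slope(grand_mean, deviations, index):
    """Recover a sample-specific slope from the effects-coded terms."""
    if index < len(deviations):
        return grand_mean + deviations[index]
    # The sample coded -1 on every contrast: grand mean minus all deviations.
    return grand_mean - sum(deviations)


slopes = [0.10, 0.02, -0.03]   # hypothetical sample-specific slopes
grand_mean, deviations = effects_coded_terms(slopes)
recovered = [sample_slope(grand_mean, deviations, i)
             for i in range(len(slopes))]

print(round(grand_mean, 3), [round(b, 3) for b in recovered])
```

Note that this grand mean weights every sample equally regardless of its size, which is one way it differs from precision-weighted meta-analytic averages.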
Lastly, given the choice to use a meta-analytic model, deciding which meta-analytic model to use is non-trivial, with decisions that span from basic choices between fixed- and random-effects meta-analysis (#9 in Table 5) to more complex considerations, including applying corrections for correlations attenuated by validity and reliability (Schmidt & Hunter, 1998). An ever-growing number of such models exists, and recommendations for appropriate models will change accordingly. But broadly, we recommend basic random-effects meta-analysis for common univariate questions, including heterogeneous variances, and one-stage MASEM for more complex multivariate questions, including measurement models and measurement invariance testing. Many of the same considerations that govern the choice between multilevel models and SEM (e.g., random-effects structures, measurement models) also dictate the choice between these meta-analytic options.
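For readers unfamiliar with the mechanics, the sketch below shows one basic random-effects meta-analysis of sample-specific estimates. It uses the DerSimonian-Laird moment estimator of the between-sample variance, chosen here only because it has a closed form; the text above does not say which estimator was used in the present study, and other estimators (e.g., REML) are common. The function name and input values are hypothetical.

```python
from math import sqrt


def random_effects_dl(estimates, std_errors):
    """Random-effects pooled estimate via the DerSimonian-Laird estimator.

    Returns (pooled_estimate, pooled_standard_error, tau2).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching SEs")
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]              # fixed-effect weights
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2


# Two equally precise but discrepant samples (hypothetical values).
pooled, pooled_se, tau2 = random_effects_dl([0.0, 0.5], [0.1, 0.1])
print(round(pooled, 3), round(pooled_se, 3), round(tau2, 3))
```

When the estimated between-sample variance is zero, the random-effects weights reduce to the fixed-effect weights, which is why the two models agree in the absence of heterogeneity.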
In sum, there is no single answer to the question of which method of data synthesis is best. While our general recommendations are that Methods 2B and 3 should typically be used when addressing questions of overall estimates, sample-level moderators, and predictors of variability/error, researchers should carefully weigh the promises and pitfalls of different methods with reference to the data they aim to synthesize and choose accordingly.
Limitations and Future Directions
The present study aimed to delineate and demonstrate methods for synthesizing IPD. To do so, we used data from 11 samples to investigate prospective associations between the Big Five personality traits and crystallized abilities using six parameterizations across four categories of our data synthesis taxonomy.
Despite its strengths (multi-sample, multi-national, sample-size, multi-method, etc.), the present study has a number of limitations and lingering questions. First, we opted for an empirical demonstration of these methods in order to highlight real challenges in documenting, harmonizing, and analyzing data across samples that differ in many ways. Such a demonstration cannot answer lingering methodological questions that could be answered by simulations (e.g., sample size, effect size; cf. Riley et al., 2020). Second, all 11 samples were drawn from Western, democratic, educated, and predominantly white populations, which limits generalizability. Third, we focused on a specific demonstration of considerations using a regression-based framework. While we believe this framework covers most analytic approaches and effect sizes within psychology, there are a number of more nuanced considerations for specific analytic methods that are beyond the scope of this study. Fourth, as noted previously, we used a relatively small number of samples (11) in our IPD approaches to data synthesis, which left us underpowered for most sample-level moderator tests. Fifth, in the present study, we relied on logical harmonization of both personality traits and crystallized abilities. That, coupled with too few studies for meta-regression, left us unable to answer whether sample-level variation may have been methodological.
Conclusion
The current study provides a demonstration of how to conduct data synthesis across four levels of our newly presented taxonomy of methods to synthesize data via IDA. We delineated differences across these methods and made concrete recommendations about both when each is appropriate and which are more reliable under different circumstances. We also estimated prospective associations between Big Five personality traits and crystallized ability in one of the largest investigations of these associations to date. We found reliable prospective associations between both Openness to Experience and Neuroticism and crystallized abilities across samples. These associations held even when adjusting for covariates, and there were few participant-level moderators of those associations. Openness and Neuroticism are uniquely associated with crystallized abilities, and most methods of data synthesis provided convergent evidence of this.
Supplemental Material
Supplemental material, sj-docx-1-psp-10.1177_01461672261451901, for A Taxonomy of Data Synthesis by Emorie D. Beck, Emily C. Willroth, Julia A. M. Delius, David A. Bennett, Lisa L. Barnes, Bryan D. James, Richard B. Lipton, Mindy Katz, Linda B. Hassing, Martijn Huisman, Daniel K. Mroczek and Eileen K. Graham in Personality and Social Psychology Bulletin
Footnotes
Author Contributions
E.D.B., E.C.W., D.K.M., and E.K.G. conceptualized the idea, co-wrote the Introduction, and edited the manuscript. E.D.B. wrote the preregistration, ran the analyses, wrote the Method, Results, and Discussion, and prepared all Supplemental Material. D.A.B., L.L.B., R.B.L., M.K., M.H., J.A.M.D., and L.B.H. collected the data and prepared the data. All authors participated in editing and finalizing the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Emorie Beck, Emily Willroth, Daniel Mroczek, and Eileen Graham were supported by National Institute on Aging Grants R01-AG067622, R01-AG082954, and R01-AG018436. Emily Willroth was additionally supported by National Institute on Aging Grant K99/R00AG071838. Richard B. Lipton and Mindy Katz were supported by NIH/NIA P01 AG03949, the Czap Foundation, and the Leonard and Sylvia Marx Foundation. Julia A. M. Delius was supported by the Max Planck Society. The Berlin Aging Study (BASE) was initiated by the late Paul B. Baltes in collaboration with Hanfried Helmchen, psychiatry; Elisabeth Steinhagen-Thiessen, internal medicine and geriatrics; and Karl Ulrich Mayer, sociology. Financial support for BASE was provided by the Max Planck Society; Freie Universität Berlin; the German Federal Ministry for Research and Technology (1989–1991, 13 TA 011 & 13 TA 011/A); the German Federal Ministry for Family, Senior Citizens, Women, and Youth (1992–1998, 314-1722-102/9 & 314-1722-102/9a); and the Berlin-Brandenburg Academy of Sciences Research Group on Aging and Societal Development (1994–1999). The OCTO-Twin Study was supported by a grant from the National Institute of Aging (NIA: AG 08861) of the National Institutes of Health.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All code, results, figures, and tables are available online on GitHub (https://github.com/emoriebeck/data-synthesis-tutorial) and the OSF (https://osf.io/zut7b/). Code can be explored via a rendered bookdown webpage (https://emoriebeck.github.io/data-synthesis-tutorial/index.html) and results can be explored via an R Shiny web app.
Supplemental Material
Supplemental material is available online with this article.
