A Taxonomy of Data Synthesis

Abstract

As efforts to improve the credibility of psychological and other social sciences continue, researchers increasingly aim to conduct multi-study or multi-sample research and to synthesize findings using different parameterizations of individual participant data meta-analysis. However, no overarching organizational framework exists, and only a few simulation-based or empirical examples compare these parameterizations. This article has two goals. First, we provide an overview of six common parameterizations of individual participant data meta-analysis, organized into a taxonomy based on different features (e.g., sample-specific parameters, meta-analytic parameters, and number of models). Second, using empirical data from 26,205 participants across 11 longitudinal studies, we compare these parameterizations by testing prospective associations between the Big Five personality traits and crystallized abilities. We found that openness was a robust predictor of crystallized abilities across samples. Across methods, model estimates were largely consistent, with some exceptions. We conclude with recommendations for choosing an approach given a team’s goals, questions, data availability, and model features.

Keywords

meta-analysis, data synthesis, coordinated data analysis, longitudinal methodology, replication, aging, personality, integrative data analysis

Introduction

Over the past decade, increasing concerns about the replicability, reproducibility, and generalizability of psychological and other social sciences have driven advances in psychological research methods (Nosek et al., 2022; Open Science Collaboration, 2012, 2015; Wiggins & Christopherson, 2019). Discourse has focused on replicating psychological experiments (Chartier et al., 2020; Ebersole et al., 2016; Klein et al., 2018, 2022), underemphasizing the importance of non-experimental, observational, and quasi-experimental psychological science (Brooks-Gunn et al., 1991; Mroczek et al., 2011; Weston et al., 2019). Such non-experimental approaches often rely on complex and sometimes intensive longitudinal studies, which are much more difficult to replicate (Hofer & Piccinin, 2009) because their data often take decades to accumulate.

Integrative data analysis (IDA) is a core tool for synthesizing multiple sources of data (Hofer & Piccinin, 2009). It arose in response to the need for specialized methods in longitudinal data analysis that could clarify the replicability and generalizability of findings by synthesizing existing data from a variety of sources and samples (Graham et al., 2022; Hill & Stine-Morrow, 2022; Mroczek et al., 2022; Willroth et al., 2022). This methodological framework and its specialized tools have been generative for the study of personality and aging over the past two decades (Beck & Jackson, 2022; Graham et al., 2017; Jokela et al., 2013; Willroth et al., 2025; Yoneda et al., 2022), and they could be applied to other areas of psychology as well.

Yet despite their promise, the diversity of available methods for synthesizing data and results has hampered wide-scale adoption. Here, we use data synthesis as an umbrella term for multiple methods of integrating different data sources, including IDA (e.g., see special issue on IDA, Bauer & Hussong, 2009; Cooper & Patall, 2009; Curran, 2009; Curran & Hussong, 2009; Hofer & Piccinin, 2009; McArdle et al., 2009), individual participant data (IPD) meta-analysis (Debray et al., 2015; L. A. Stewart et al., 2015), coordinated data analysis (CDA; Graham et al., 2022; Willroth et al., 2022), pooled analysis (Bidard et al., 2014), meta-analytic structural equation modeling (MASEM; Jak & Cheung, 2020), and traditional meta-analysis (Viechtbauer, 2007), among others. Researchers must therefore choose the best form of data synthesis for their circumstances, despite few systematic conceptual discussions or empirical investigations of how these methods are similar and how they differ. For example, the psychological literature contains little discussion of (1) which data synthesis approaches are available or (2) how their estimates may differ from one another (see Burke et al., 2017; Debray et al., 2015; G. B. Stewart et al., 2012). As a result, techniques are applied according to researcher preference, and little is known about how well these methods replicate one another, particularly outside the context of clinical trials.

The current article has three objectives. First, we provide an overview of some of the most commonly used parameterizations of data synthesis in psychology. We organize these into a taxonomy based on different features of each parameterization (e.g., sample-specific parameters, meta-analytic parameters, number of models required). We describe each method in detail and provide R code to carry them out. Second, we test associations between Big Five personality traits and crystallized abilities using four of the five levels of our taxonomy, along with three moderators of these associations, using empirical data from 26,205 participants across 11 longitudinal studies. Third, we compare the convergence and divergence of findings across methods, outline the pros and cons of each approach, and make recommendations for best practices. Although we focus specifically on regression coefficients and IPD, the issues described herein and the taxonomy presented also apply to other types of effect sizes and statistical models, including mean differences, odds ratios, risk ratios, and more.

Issues in Data Synthesis

As the need for data synthesis is increasingly recognized and the availability of data to synthesize grows, more researchers may want to conduct data synthesis. One of the first questions is which method to use, but the answer is not straightforward because there is no single standard method. Instead, subfields and individual researchers have adopted numerous different approaches to, and names for, data synthesis (Beck & Jackson, 2022; Curran & Hussong, 2009; Debray et al., 2015; Graham et al., 2022; Hofer & Piccinin, 2009). Adding to the complexity, inconsistent and imprecise terminology has been applied to these approaches (e.g., mega-analysis, CDA, and IPD meta-analysis have been used largely interchangeably; Beck et al., 2024; Graham et al., 2022; Legha et al., 2018). We believe that data synthesis is a useful umbrella term for all of these methods, yet few resources currently summarize and compare the different approaches.

A clear taxonomy of the data synthesis methods will benefit the field in at least three ways. First, a taxonomy of data synthesis methods will serve as a useful resource for researchers who are new to data synthesis. Second, a clear taxonomy will help to address the problem of inconsistent terminology to refer to the same methodological approaches. Third, a taxonomy of data synthesis methods will facilitate conceptual and empirical comparisons of different methods. The present study aims to provide such a taxonomy, compare different methods, and make recommendations about when and how to use each.

Methods of Data Synthesis

We build our taxonomy on five broad categories of data synthesis, two of which have multiple parameterizations that we describe in more detail: (1) pooled analysis of IPD (Batty et al., 2018), without (Method 1A) and with (Method 1B) cluster-corrected standard errors; (2) pooled analysis of IPD using contrasts and interactions (Method 2A; Debray et al., 2013) or random effects (Method 2B); (3) sample-specific analyses followed by random-effects meta-analysis (Graham et al., 2017; Yoneda et al., 2022); (4) sample-specific analyses reported together (Graham et al., 2020; Graham, James, Jackson, Willroth, Boyle, et al., 2021); and (5) traditional meta-analysis (Barrick & Mount, 1991; Poropat, 2009). Although researchers face a variety of analytical choices within each broad category, and each category can be conceptualized in terms of traditional univariate fixed- or random-effects meta-analysis, network meta-analysis, Bayesian meta-analysis, or MASEM, we emphasize consistent broad differences across categories and center our empirical example on the univariate parameterizations most commonly applied in psychology. Before describing each in detail, we first delineate the key features that differentiate them, which are summarized in Table 1.

Table 1.

Key Features of Five Levels of Data Synthesis.

Method	Names	IPD	Number of models	Sample-specific estimates	Degree of harmonization required to apply method	Examples
Single model
1	Pooled analysis of IPD	Yes	One	No	High	Batty et al. (2018)
2	Pooled analysis of IPD using contrasts or random effects	Yes	One	Yes, random effects or contrasts + interactions	High	Beck and Jackson (2022); Paige et al. (2017)
Multiple models
3	Separate analyses of IPD followed by meta-analysis	Yes	Number of studies + meta-analysis model	Yes, original study effect sizes	Moderate	Graham et al. (2017); Jokela et al. (2020)
4	Separate IPD analyses reported together	Yes	Number of studies	Yes, original study effect sizes	Moderate	Graham et al. (2020)
5	Traditional meta-analyses	No	One (Meta-analysis model)	Yes, original study effect sizes	None	Barrick and Mount (1991); Poropat (2009)

Note. Meta-analysis is used in the broad sense to include a range of meta-analysis models, including fixed- and random-effects meta-analysis and MASEM. IPD = individual participant data; MASEM = meta-analytic structural equation models.

We argue that at least four key factors differentiate the levels of the taxonomy:

Individual participant data

Number of models needed

The inclusion of sample-specific estimates

The degree of harmonization required to apply the method

Each of these considerations will be discussed in more detail in the Method section when outlining the core characteristics of each approach. However, one differentiating feature deserves special consideration: the degree of harmonization needed to conduct the analyses. Harmonization refers to how variables of interest are extracted, recoded, and included so that data from different sources map onto one another exactly or conceptually across samples (for a more thorough treatment, see Cheng et al., 2024; Dubrow & Tomescu-Dubrow, 2016). One-stage (single model) pooled analyses require the highest level of harmonization because the operationalization of key variables in each sample is essential to model estimation and interpretation. Standardization or other transformations can place variables on the same scale, but such harmonization choices should be made carefully because transformations are not always sufficient (e.g., when syntactic, structural, or semantic considerations are highly heterogeneous; Cheng et al., 2024). In contrast, coordinated analyses and meta-analyses of existing studies allow somewhat more flexible harmonization because effect sizes are estimated separately and can be converted to standardized metrics for inferential equivalence before a meta-analytic procedure is carried out. However, particularly for coordinated analyses using raw data, harmonization remains an important consideration for later interpretation and pooling (Bauer & Hussong, 2009; Cole et al., 2023; Graham et al., 2022). Finally, in traditional meta-analyses, harmonization of key variables in raw data is sometimes considered unnecessary because the method relies on effect size estimates alone (and measurement differences can be tested via meta-regression).
A fuller discussion of data harmonization is beyond the scope of this necessarily brief introduction, which focuses on specific modeling parameterizations and assumes careful prior harmonization. Interested readers can consult a growing body of work specifically on harmonization (Cheng et al., 2024; Cole et al., 2023; Dubrow & Tomescu-Dubrow, 2016; Fortier et al., 2017; Griffith et al., 2013) and more recent work on MASEM that combines measurement models, measurement invariance, and path models into a single framework (Jak & Cheung, 2020).

In choosing which parameterizations to illustrate, we omitted two methods that are worth noting. The first comprises additional parameterizations that adjust for study membership but assume homogeneous predictor–outcome associations across studies. These models adjust for study membership only through fixed study indicators or random intercepts, without allowing slopes to vary across studies. Because our goal was to illustrate the key conceptual distinctions among synthesis approaches rather than to enumerate every possible parameterization, we did not include these variants; doing so would substantially increase the number of models presented without introducing new conceptual categories.

The second omitted method is MASEM, which provides an important alternative framework for synthesizing effect estimates across studies. One-stage MASEM, in particular, allows researchers to estimate a meta-analytic model directly from study-level correlation or covariance matrices, including the integration of measurement and structural components when appropriate (Jak & Cheung, 2020). Conceptually, MASEM occupies a position between the individual participant data meta-analysis (IPD-MA) approaches emphasized here and traditional meta-analysis: it addresses the univariate limitations of conventional meta-analytic models by leveraging the full correlation or covariance structure of each study, while still operating at the study level rather than the individual level. Given the already broad set of regression-based synthesis methods illustrated in this article, and our focus on approaches that are readily accessible to applied researchers, we do not provide a full MASEM demonstration. Nonetheless, we acknowledge MASEM as a valuable option for more complex modeling goals and encourage interested readers to consult existing methodological resources.

Substantive Empirical Question: Personality and Crystallized Abilities

To demonstrate the power of data synthesis for answering common psychological research questions, we conducted a study investigating associations between Big Five personality traits and crystallized cognitive abilities. We chose these constructs because they are consequential (Beck & Jackson, 2022) and readily harmonized at the construct level based on prior construct validation studies (Griffith et al., 2013; Langa et al., 2020; Soto & John, 2017). Moreover, prior findings suggest that personality traits are likely associated with crystallized abilities, although these effects may be strongest for openness to experience and, to a lesser extent, neuroticism and extraversion (Hultsch et al., 1999; Jorm et al., 1993; Rammstedt, 2018; Soubelet & Salthouse, 2010, 2011; Wettstein et al., 2017; Zimprich et al., 2009). These findings need replication, because the reported associations are based on a limited number of datasets. We extend previous work by examining prospective associations between Big Five personality traits and crystallized abilities across (1) 11 samples and (2) several sets of covariates and moderators (e.g., education) thought to explain these associations. Because a traditional meta-analysis is beyond the scope of this article, we instead compare our results to a recent traditional meta-analysis (Anglim et al., 2022), which found significant meta-analytic associations between crystallized intelligence and three of the Big Five traits: neuroticism, extraversion, and openness.

The Present Study

The present study directly tests and compares four broad categories of our data synthesis taxonomy. To demonstrate these various data synthesis approaches, we applied each approach to investigate whether the Big Five prospectively predict crystallized abilities in 11 longitudinal panel samples.

Method

Transparency and Openness

This study was preregistered on the Open Science Framework (https://osf.io/rzym7). In addition, all code, model objects, figures, and tables are available in the Supplemental Material on the OSF (https://osf.io/zut7b/) and GitHub (https://github.com/emoriebeck/data-synthesis-tutorial). Data are publicly available or available by application (see Supplemental Material for details on accessing specific samples). There were a few minor deviations from the preregistration, and these are described in Supplemental Table S1 in a preregistration deviation table (Willroth & Atherton, 2024). Finally, rendered results are available as a standalone web page on GitHub (https://emoriebeck.github.io/data-synthesis-tutorial) and in an online R Shiny web app (https://emoriebeck.shinyapps.io/data-synth-tutorial/). We recommend using the web page for code and the web app to explore tables and figures.

Data cleaning, analyses, and results communication were done using the following R (version 4.2.0; R Core Team, 2022) packages: psych (version 2.2.5; Revelle, 2022), knitr (version 1.40; Xie, 2014), kableExtra (version 1.3.4.9000; Zhu, 2022), brms (version 2.18.0; Bürkner, 2021), readxl (version 1.4.1; Wickham & Bryan, 2022), haven (version 2.5.1; Wickham et al., 2022), estimatr (version 1.0.0; Blair et al., 2022), lme4 (version 1.1.30; Bates et al., 2015), broom.mixed (version 0.2.9.4; Bolker & Robinson, 2022), bootpredictlme4 (version 0.1; Duursma, 2022), effectsize (version 0.7.0.5; Ben-Shachar et al., 2020), metafor (version 3.8.1; Viechtbauer, 2010), rstan (version 2.21.7; Stan Development Team, 2022), tidybayes (version 3.0.2; Kay, 2022), cowplot (version 1.1.1; Wilke, 2020), plyr (version 1.8.7; Wickham, 2011), tidyverse (version 1.3.2; Wickham et al., 2019), and furrr (version 0.3.1; Vaughan & Dancho, 2022).

Participants

The current study included 26,205 participants from 11 longitudinal samples. These samples spanned three continents and five countries. We chose samples based on prior work using different data synthesis methods with IPD to examine associations between personality or cognitive ability and a number of life outcomes (Beck & Jackson, 2022; Graham et al., 2020; Graham, James, Jackson, Willroth, Luo, et al., 2021; Hakulinen et al., 2015; Jokela et al., 2013, 2020; Sutin et al., 2019; Yoneda et al., 2022). Study selection and inclusion criteria are outlined in Figure 1. The final k = 11 studies represent a mix of publicly available data (The German Socioeconomic Panel Study [GSOEP], The Household, Income and Labour Dynamics in Australia Study [HILDA], the Health and Retirement Study [HRS], and the Swedish Adoption/Twin Study of Aging [SATSA]) and data we obtained through data use agreements with the sample maintainers (The Berlin Aging Study [BASE], The Einstein Aging Study [EAS], The Longitudinal Aging Study Amsterdam [LASA], The Religious Orders Study [ROS], The Memory and Aging Project [MAP], The Minority Aging Research Study [MARS], and Origin of Variances in the Oldest-Old: Octogenarian Twins [OCTO-Twin]). For each sample, we used the latest data release, and participants were included in every model for which they had all necessary data (i.e., the participants included within a study vary across combinations of personality traits, covariates, and moderators). Detailed descriptions of each sample, including the steps necessary to access the data, are included in the Supplemental Material and the R Shiny web app (https://emoriebeck.shinyapps.io/data-synth-tutorial/).

Figure 1.

Flowchart of sample identification and inclusion at time of analysis.

Measures

In this study, we tested how the Big Five personality traits are associated with crystallized abilities, both without covariates and adjusting for age, gender, and education individually and in combination. We also examined these covariates as three participant-level moderators of personality–cognition associations.¹ For a full overview of which personality and outcome measures are available in each dataset, see Supplemental Table S1 (and the web app).

Personality Traits

We examined the Big Five traits (extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience). Full information on the scales used to measure each trait in each sample is presented in Table 2, and full descriptions of each scale, including item lists, are available in the Supplemental Material. Because many of the measures are on different scales, all personality indicators were operationalized as percentage of maximum possible (POMP) scores (Cohen et al., 1999), which express each score as a percentage of the possible scale range.² To aid model convergence, we deviated from traditional POMP scoring and multiplied the ratio by 10 rather than 100, yielding scores from 0 to 10:

$$\text{POMP} = \frac{\text{observed} - \min}{\max - \min} \times 10$$

(1)
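As a minimal sketch of Equation 1 (written in Python purely for illustration; the article's own analyses were run in R), the function below rescales a raw score to the 0–10 POMP metric. It assumes that min and max are the lowest and highest possible scores on the response scale, as in Cohen et al. (1999), rather than the observed sample extremes; the function name `pomp` and the example values are hypothetical.

```python
def pomp(observed: float, scale_min: float, scale_max: float, multiplier: float = 10.0) -> float:
    """Rescale a raw score to a POMP metric running from 0 to `multiplier`.

    multiplier=10.0 follows the modified scoring in Equation 1;
    classic POMP scoring would use 100.0 instead.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return (observed - scale_min) / (scale_max - scale_min) * multiplier


# Hypothetical scores on three response formats listed in Table 2
print(pomp(3.0, 1, 5))  # midpoint of a 1-5 scale -> 5.0
print(pomp(7.0, 1, 7))  # top of a 1-7 scale -> 10.0
print(pomp(0.5, 0, 1))  # proportion on a 0-1 scale -> 5.0
```

Because every scale is mapped onto the same 0–10 range under this assumption, a one-unit difference in any harmonized variable corresponds to 10% of that measure's possible range, which helps make pooled coefficients comparable across samples.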

Table 2.

Sample Characteristics and Sample-Level Characteristics.

Sample	Country (continent)	Prediction interval (years)	Personality measure	Personality scale	Personality domains	Personality median year (SD)	Baseline age (years)	Crystallized measure(s)	Crystallized median year (SD)
BASE (Baltes et al., 1999; Baltes & Mayer, 1999)	Germany (Europe)	8.56 (3.51)	NEO-FFI	1–5	E, N, O	1990 (0)	78.23 (6.66)	Vocabulary spot a word	1997 (3.51)
EAS (Katz et al., 2016)	United States (North America)	2.15 (2.41)	IPIP NEO	1–5	E, A, C, N, O	2011 (3.18)	79.47 (5.36)	Boston Naming Test, Information	2014 (3.09)
GSOEP (Goebel et al., 2019)	Germany (Europe)	7.00 (0)	BFI-S	1–7	E, A, C, N, O	2005 (0)	49.83 (15.83)	Vocabulary	2012 (0)
HILDA (Wilkins et al., 2018, 2019)	Australia (Australia)	7.00 (0)	TDA-40	1–7	E, A, C, N, O	2005 (0)	44.51 (16.92)	Vocabulary	2012 (0)
HRS (Juster & Suzman, 1995)	United States (North America)	4.00 (0)	MIDI	1–4	E, A, C, N, O	2006/8 (0)	71.71 (6.97)	Vocabulary	2010 (0)
LASA (Hoogendijk et al., 2016; Huisman et al., 2011)	The Netherlands (Europe)	8.39 (9.45)	DPQ	1–3	N	1992 (1.19)	61.46 (15.71)	Vocabulary	1995 (9.09)
MAP (Bennett, Schneider, Buchman, et al., 2012)	United States (North America)	6.75 (4.53)	NEO-FFI	1–5	E, A, C, N		79.45 (7.32)	Boston Naming Test
MARS (Barnes et al., 2012)	United States (North America)	6.46 (3.86)	NEO-FFI	1–5	N, O		73.60 (2.66)	Boston Naming Test
OCTO-Twin (McClearn et al., 1997)	Sweden (Europe)	6.21 (2.82)	Eysenck	0–1	E, N	1991 (0)	82.99 (2.66)	Information	1997
ROS (Bennett, Schneider, Arvanitakis, & Wilson, 2012)	United States (North America)	9.53 (6.42)	NEO-FFI	1–5	E, A, C, N, O		75.87 (7.38)	Boston Naming Test
SATSA (Pedersen et al., 1991)	Sweden (Europe)	15.00 (0)	Eysenck	1–5	E, A, C, N, O	1984 (0)	54.77 (9.84)	Information	1999 (0)

Note. Prediction interval was calculated by subtracting each participant’s first personality measurement year from their last cognitive ability measurement year. Baseline age is the mean participant age at the first personality assessment. E = Extraversion; A = Agreeableness; C = Conscientiousness; N = Neuroticism; O = Openness. NEO-FFI = 60-item NEO Five Factor Inventory (Costa & McCrae, 1992); IPIP NEO = International Personality Item Pool NEO (Johnson, 2014); BFI-S = Big Five Inventory, Short Form (German; Hahn et al., 2012); TDA-40 = Trait Descriptive Adjectives-40 (Saucier, 1994); MIDI = The Midlife Development Inventory (Lachman & Weaver, 1997); DPQ = Dutch Personality Questionnaire (Barelds & Luteijn, 2002); Eysenck = Eysenck Personality Questionnaire (Eysenck & Eysenck, 1965).
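The prediction interval described in the note is simple per-participant arithmetic. The Python sketch below (illustrative only) computes it and summarizes it as a mean and standard deviation, assuming that is how the table's interval values are summarized; the function name and participant years are hypothetical.

```python
from statistics import mean, stdev


def prediction_interval(first_personality_year: int, last_cognition_year: int) -> int:
    """Years from a participant's first personality assessment to their last cognitive assessment."""
    if last_cognition_year < first_personality_year:
        raise ValueError("cognitive assessment precedes personality assessment")
    return last_cognition_year - first_personality_year


# Hypothetical (first personality year, last cognition year) pairs for one sample
records = [(2005, 2012), (2005, 2012), (2006, 2012)]
intervals = [prediction_interval(p, c) for p, c in records]
print(intervals)  # [7, 7, 6]
print(f"{mean(intervals):.2f} ({stdev(intervals):.2f})")
```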

Crystallized Abilities

We examined crystallized abilities as our primary outcome. Full information on the tests used for each sample is presented in Table 2, and full details on each test are included in the Supplemental Material. As with personality traits, many of the measures are on different scales, so all crystallized ability indicators were operationalized as POMP scores, with higher scores indicating better performance.

Participant-Level Covariates and Moderators

In addition, we adjusted for three participant-level covariates: age, gender, and education.³ These covariates were included because each has a long-documented association with both crystallized ability and some personality domains. Age was defined as the participant’s age at their baseline personality assessment, centered at 60 years. Gender was dummy coded as 0 (men) and 1 (women). Finally, education was measured as years of education at the baseline personality assessment, centered at 12 years. We examined how adjusting for each covariate separately, as well as for all covariates simultaneously, affected the estimated personality trait–crystallized ability associations.
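A minimal Python sketch of the covariate coding just described (for illustration; the article used R): age is centered at 60, gender is dummy coded with men as the reference group, and education is centered at 12 years. The function name and the string labels for gender are hypothetical; real samples store these variables under their own names and codes.

```python
GENDER_CODES = {"man": 0, "woman": 1}  # hypothetical labels; 0 = men, 1 = women


def code_covariates(age: float, gender: str, years_education: float) -> tuple:
    """Return (age_c60, female, edu_c12) as used in the adjusted and moderated models."""
    age_c60 = age - 60              # 0 = a 60-year-old participant
    female = GENDER_CODES[gender]   # KeyError flags unexpected labels during harmonization
    edu_c12 = years_education - 12  # 0 = 12 years of education
    return age_c60, female, edu_c12


print(code_covariates(78, "woman", 11))  # (18, 1, -1)
print(code_covariates(45, "man", 16))    # (-15, 0, 4)
```

With this coding, the personality main effect in a moderated model is evaluated at age 60, at 12 years of education, or for men (depending on the moderator), rather than at an age or education of zero.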

Data Preparation

Extensive details about the data preparation procedure can be found in the Supplemental Material and Open Practices. Descriptive statistics of all harmonized variables for each sample are presented in Table 3. Zero-order correlations among measures within samples are presented in the Supplemental Material and web app.

Table 3.

Descriptive Statistics of All Harmonized Measures Across Samples.

Sample	E	A	C	N	O	Crystallized/Knowledge	Age (years)	Education (years)	% Women	Valid n (range)
BASE	5.77 (1.64)	–	–	3.77 (1.99)	5.80 (1.88)	5.47 (2.50)	78.23 (6.66)	11.04 (2.79)	50.25	197–197
EAS	5.81 (1.56)	6.90 (1.67)	7.01 (1.60)	2.97 (1.65)	6.50 (1.64)	6.57 (2.47)	79.49 (5.35)	14.40 (3.36)	61.23	1,864–1,967
GSOEP	6.47 (1.87)	7.40 (1.74)	8.03 (1.76)	4.86 (2.10)	6.03 (2.05)	8.31 (1.12)	49.84 (15.84)	11.45 (2.46)	52.89	1,864–1,866
HILDA	5.73 (1.80)	7.32 (1.52)	6.83 (1.73)	3.01 (1.81)	5.41 (1.75)	5.77 (2.11)	44.53 (16.94)	13.49 (2.51)	54.24	7,744–7,759
HRS	7.43 (1.80)	8.46 (1.54)	7.88 (1.57)	3.32 (1.98)	6.44 (1.85)	5.57 (2.05)	71.72 (6.97)	12.52 (3.03)	59.41	8,392–8,432
LASA	–	–	–	2.03 (1.98)	–	6.43 (1.96)	61.46 (15.71)	9.84 (3.62)	51.61	2,335–2,335
MAP	5.75 (1.60)	–	6.61 (1.40)	3.36 (1.57)	–	8.67 (1.89)	79.64 (7.44)	14.97 (3.30)	25.60	1,219–1,826
MARS	–	–	–	3.23 (1.50)	–	7.98 (2.47)	73.60 (6.37)	14.98 (3.46)	23.20	681–681
OCTO-Twin	5.42 (2.36)	–	–	2.70 (2.36)	–	5.32 (3.05)	82.98 (2.66)	7.30 (2.41)	64.25	399–400
ROS	5.26 (1.63)	5.35 (1.37)	6.28 (1.38)	4.60 (1.62)	5.55 (1.31)	8.05 (2.46)	75.87 (7.38)	18.41 (3.34)	28.25	1,373–1,378
SATSA	5.03 (1.86)	5.67 (1.63)	5.34 (1.85)	4.80 (1.95)	5.04 (1.28)	7.26 (1.98)	54.58 (9.94)	10.63 (1.89)	59.34	470–518

Note. Values are means (standard deviations) except for % Women and Valid n. Age, education, and gender were assessed at the first baseline personality assessment. Valid n (range) indicates the range of valid observations with complete personality trait and outcome data across different trait measures. E = Extraversion; A = Agreeableness; C = Conscientiousness; N = Neuroticism; O = Openness.

Analysis Plan

To test whether personality predicts crystallized ability scores, we applied four levels of our data synthesis taxonomy, some of which are broken into sub-methods: (1) One-Stage Pooled Analysis without Sample-Specific Effects, (2) One-Stage Pooled Analysis with Sample-Specific Effects, (3) Two-Stage, Separate Analysis with Meta-Analytic Pooling, and (4) Separate Analysis of Individual Participant Data Reported Together. As noted previously, different approaches may be chosen depending on the research question. We adopted a basic regression framework, including random-effects meta-analysis, but a growing body of research demonstrates that our taxonomy is also compatible with other meta-analytic frameworks, including MASEM with cluster-corrected standard errors (Groot et al., 2025) and IPD (Groot et al., 2024).

For each method, we estimated every combination of Big Five personality trait (5) × crystallized ability outcome (1) × participant-level moderator (3) × covariate set (5: unadjusted, fully adjusted, and partially adjusted for each covariate separately). Below, we provide a brief overview of each method; a more thorough description and sample analytic plan for each can be found in the Supplemental Material. Notably, each technique described here is a linear regression-based statistical model, and each is therefore bound by both the flexibility and the assumptions of regression. We thus see them as broad classes of approaches that can be fine-tuned to the nuances of a particular set of research questions and/or available data, much as regression can be flexibly applied across a huge range of circumstances, including both observational and experimental research. Accordingly, when detailing the regression equations below, we use predictor and outcome as the core observed variables across studies; these could be any of a broad array of continuous and nominal variables, and the specific parameterization of the model will depend on the research questions under investigation.
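The model grid described above can be enumerated mechanically. The Python sketch below takes the factor levels as listed (5 traits, 1 outcome, 3 moderators, 5 covariate sets) and forms their full Cartesian product; the level labels are hypothetical shorthand, and the article's actual grid may be organized differently. Using trait availability from Table 2 for two samples, it also shows that a sample contributes fewer models when some traits were not measured.

```python
from itertools import product

traits = ["E", "A", "C", "N", "O"]
outcomes = ["crystallized"]
moderators = ["age", "gender", "education"]
covariate_sets = ["none", "age", "gender", "education", "all"]  # unadjusted, partial, full

grid = list(product(traits, outcomes, moderators, covariate_sets))
print(len(grid))  # 75 trait x outcome x moderator x covariate-set combinations

# Trait domains available in two samples (from Table 2)
available = {"BASE": {"E", "N", "O"}, "LASA": {"N"}}
for sample, domains in available.items():
    n_models = sum(1 for spec in grid if spec[0] in domains)
    print(sample, n_models)  # BASE 45, then LASA 15
```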

Method 1: Pooled Analysis of IPD

Method 1 is a fully pooled procedure in which a single prospective association between a Big Five personality characteristic and crystallized ability is estimated across samples, with no sample-specific estimates. In other words, data from all samples are combined and associations are estimated in a single model that ignores sample membership.

Method 1A: Pooled Simple Linear Regression

The basic, unadjusted form of the model is as follows:

$$\text{outcome}_{ij} = b_{0} + b_{1} \cdot \text{predictor}_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^{2})$$

(2)

where $\text{outcome}_{ij}$ represents the observed crystallized cognitive test score for individual i in study j, $b_{0}$ is the intercept (the expected crystallized score, pooled across all samples, when the predictor equals zero), $b_{1}$ represents the overall effect of personality predicting the outcome, $\epsilon_{ij}$ is the residual for individual i in study j, and $\sigma^{2}$ is the residual variance. Models were estimated using the lm() function.
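To make Method 1A concrete, the Python sketch below stacks data from two hypothetical samples, ignores sample membership, and fits Equation 2 with the closed-form OLS solution (the article itself uses R's lm()). The toy data are constructed so that both samples share a within-sample slope of 0.5 but differ in average outcome levels; the function name `ols_simple` is hypothetical.

```python
from statistics import fmean


def ols_simple(x, y):
    """Closed-form OLS estimates (b0, b1) for outcome = b0 + b1 * predictor + error."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length sequences with at least two points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical (predictor, outcome) pairs on the POMP metric
sample_a = [(4.0, 5.0), (6.0, 6.0), (8.0, 7.0)]  # outcome = 3.0 + 0.5 * predictor
sample_b = [(2.0, 6.0), (5.0, 7.5), (8.0, 9.0)]  # outcome = 5.0 + 0.5 * predictor

pooled = sample_a + sample_b  # Method 1A: stack rows, drop sample labels
b0, b1 = ols_simple([p for p, _ in pooled], [o for _, o in pooled])
print(round(b0, 3), round(b1, 3))  # 4.6 0.391
```

In this toy example the pooled slope (about 0.39) understates the common within-sample slope of 0.5, because between-sample differences in outcome levels are absorbed into the single slope. This is one reason the taxonomy distinguishes methods that do and do not model sample membership.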

Method 1B: Pooled Linear Regression With Cluster Robust Standard Errors

Method 1B estimates only an overall effect of personality on cognition by including all data in a single regression model, but with cluster-robust standard errors (Gaure, 2013; J. E. Pustejovsky & Tipton, 2018).⁴ Correcting for dependencies without explicitly modeling cross-sample heterogeneity is sometimes called nuisance clustering (Fitzmaurice & Laird, 1995). The basic form of the model is the same as Equation 2, with sample as the clustering variable, and $b_{1}$ again represents the overall effect of personality predicting the outcome. Models were estimated using the lm() function. Cluster-corrected standard errors were estimated using the vcovCR() function in the clubSandwich package in R (version 0.5.11; J. Pustejovsky, 2024), with cluster = study and type = “CR2.” Because we had a small number of clusters (11), we additionally conducted Satterthwaite t-tests, using the conf_int() function for coefficients and linear_contrast() for model predictions. This approach has been shown to perform well with a small number of clusters relative to other available techniques (Leyrat et al., 2018).
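The sandwich logic behind cluster-robust standard errors can be sketched in a few lines. The Python function below fits the same pooled regression and computes a CR0 cluster-robust covariance, optionally scaled by G/(G − 1) for G clusters. This is a simplified stand-in: it does not reproduce the CR2 bias-reduced estimator or the Satterthwaite degrees of freedom that clubSandwich provides, which matter when there are as few clusters as the 11 samples here. The function names and toy data are hypothetical.

```python
from math import sqrt


def _matmul2(a, b):
    """Multiply two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def cluster_robust_ols(x, y, cluster, small_sample=True):
    """OLS of y on [1, x] with CR0 cluster-robust SEs (optionally scaled by G/(G-1)).

    Returns ((b0, b1), (se_b0, se_b1)).
    """
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("predictor has no variance")
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    b0 = bread[0][0] * sy + bread[0][1] * sxy
    b1 = bread[1][0] * sy + bread[1][1] * sxy

    # Residual-weighted design rows summed within each cluster: X_g' e_g
    scores = {}
    for xi, yi, g in zip(x, y, cluster):
        e = yi - (b0 + b1 * xi)
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += e
        s[1] += xi * e

    # Meat: sum over clusters of the outer product (X_g' e_g)(X_g' e_g)'
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    vcov = _matmul2(_matmul2(bread, meat), bread)
    n_clusters = len(scores)
    if small_sample:
        if n_clusters < 2:
            raise ValueError("need at least two clusters")
        vcov = [[v * n_clusters / (n_clusters - 1) for v in row] for row in vcov]
    se = (sqrt(max(vcov[0][0], 0.0)), sqrt(max(vcov[1][1], 0.0)))
    return (b0, b1), se


# Hypothetical data: two samples (clusters) stacked as in Method 1B
x = [4.0, 6.0, 8.0, 2.0, 5.0, 8.0]
y = [5.0, 6.0, 7.0, 6.0, 7.5, 9.0]
study = ["a", "a", "a", "b", "b", "b"]
coef, se = cluster_robust_ols(x, y, study)
print(coef, se)
```

The point estimates are identical to an ordinary pooled fit; only the standard errors change, which is why this approach treats clustering as a nuisance rather than as something to model.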

Method 2: Pooled Analysis of IPD Using Contrasts or Random Effects

Method 2 is also a fully pooled procedure that provides a single estimate of associations across samples. Unlike Method 1, Method 2 provides sample-specific estimates of the association.

Method 2A: Interactions

Method 2A estimates both an overall effect and sample-specific estimates using effect-coded contrasts and interaction terms. The basic form of the model is as follows:

\begin{matrix} o u t c o m e_{i j} & = b_{0} + b_{1} * p r e d i c t o r_{i j} \\ + b 2 * s t u d y 1_{i j} + \dots + b_{k} * s t u d y k_{i j} \\ + b_{k + 1} * p r e d i c t o r_{i j} * s t u d y 1_{i j} + \dots \\ + b_{2 * k} * p r e d i c t o r_{i j} * s t u d y k_{i j} + ϵ_{i j} \\ ϵ_{i j} ~ N (0, σ^{2}), \end{matrix}

(3)

where k indicates the number of samples − 1. Study is effects coded, which results in a term that captures the overall estimated effect and k − 1 terms capturing sample-specific deviations from the overall estimate (all sample estimates can then be recovered via linear combinations). This is the same approach used in analysis of variance. Of interest are two key sets of terms: $b_{1}$ indicates the average personality–cognition relationship, and $b_{k+1}$ to $b_{2k}$ represent effects coded sample-specific differences in outcome associations (e.g., the estimate for sample k is $b_{1} + b_{2k}$). All other terms capture sample-specific differences in overall cognitive ability levels, if any. Models were tested using the base R lm() function. For sample-specific effects, effects coded linear combinations for each sample were provided to the glht() function from the multcomp package (version 1.4.20; Hothorn et al., 2008).
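A minimal Python sketch of the effects coding in Equation 3, assuming a model with only the predictor, the study codes, and their interactions (no covariates); all helper names are hypothetical. Under that assumption, the fully interacted model reproduces each sample's own slope, so $b_{1}$ equals the unweighted mean of the sample-specific slopes, and each sample's slope is a linear combination of $b_{1}$ and the interaction terms, which is the kind of contrast passed to glht(). The reference sample, coded −1 on every contrast, is recovered as $b_{1}$ minus the sum of the interaction coefficients.

```python
def effects_codes(labels):
    """Map each sample label to its k = J - 1 effects codes.

    Samples are ordered by first appearance; the last sample is the
    reference level and is coded -1 on every contrast.
    """
    levels = list(dict.fromkeys(labels))
    k = len(levels) - 1
    codes = {}
    for j, level in enumerate(levels):
        if j < k:
            codes[level] = [1 if m == j else 0 for m in range(k)]
        else:
            codes[level] = [-1] * k
    return codes


def slope_contrasts(k):
    """Weight vectors over (b1, interaction_1, ..., interaction_k) that
    return each sample's slope; one row per sample, reference sample last."""
    rows = [[1] + [1 if m == j else 0 for m in range(k)] for j in range(k)]
    rows.append([1] + [-1] * k)
    return rows


def slopes_from_coefficients(b1, interactions):
    """Apply the contrast weights to the fitted coefficients."""
    coefs = [b1] + list(interactions)
    return [sum(w * c for w, c in zip(row, coefs))
            for row in slope_contrasts(len(interactions))]


def coefficients_from_slopes(slopes):
    """Inverse mapping: b1 is the unweighted mean of the J sample slopes,
    and the interactions are the deviations of the first J - 1 samples."""
    b1 = sum(slopes) / len(slopes)
    return b1, [s - b1 for s in slopes[:-1]]
```

Note that the point estimates match separate per-sample fits only in this saturated, covariate-free case; standard errors still differ because Method 2A uses one pooled residual variance.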

Method 2B: Random Effects

Method 2B uses multilevel models (Raudenbush, 2002) in which participants are nested within studies to decompose the variance into different sources. Unlike the regression techniques above that rely on ordinary least squares estimation, MLM uses restricted maximum likelihood estimation or maximum likelihood estimation. The simple model of a basic predictor–outcome association is as follows:

Level 1

\begin{aligned} outcome_{ij} = \beta_{0j} + \beta_{1j} \cdot predictor_{ij} + \epsilon_{ij} \end{aligned}

(4)

\begin{aligned} \epsilon_{ij} \sim N(0, \sigma^{2}) \end{aligned}

Level 2

\begin{aligned} \beta_{0j} &= \gamma_{00} + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \end{aligned}

\begin{aligned} \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_{00}^{2} & \tau_{01} \\ \tau_{10} & \tau_{11}^{2} \end{pmatrix} \right) \end{aligned}

where the sample-specific prospective associations are captured by $\beta_{1j}$, j indexes samples 1 to J, and $u_{1j}$ captures the sample-specific deviation from the overall estimate of the personality–cognition relationship, $\gamma_{10}$. Unlike Method 2A, which has a single residual, the multilevel models in 2B decompose the residual variance into different sources, in this case, participant-level residual variance ($\sigma^{2}$) and sample-level residual variance ($\tau_{00}^{2}$ and $\tau_{11}^{2}$). The models also use partial pooling, which pools information across samples and shrinks sample-level estimates toward the fixed effect.⁵ Heterogeneity estimates are captured in the τ matrix, where $\tau_{00}^{2}$ captures the variance in the random intercepts (i.e., variance in average levels of cognitive ability across samples), $\tau_{11}^{2}$ captures the variance in the random slopes (i.e., variance in the personality–cognitive ability association across samples), and $\tau_{01}$ captures the covariance (and, once standardized, the correlation) between average levels of cognitive ability and personality–cognitive ability associations across samples (e.g., is the association stronger in samples with higher average crystallized ability?). Models were tested using the lme4 package in R.
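The shrinkage described above can be illustrated with a simplified, two-stage approximation in Python. Assuming the between-sample slope variance ($\tau_{11}^{2}$, here `tau2`) and each sample's sampling variance are known, the partially pooled slope for sample j is the precision-weighted grand mean plus a fraction $\lambda_j = \tau^{2}/(\tau^{2} + v_j)$ of that sample's deviation. lme4 estimates these quantities jointly from the individual-level data, so this sketch conveys the logic rather than reproducing lme4's output; the function name is hypothetical.

```python
def shrink_slopes(slopes, sampling_vars, tau2):
    """Partially pooled (empirical Bayes) sample slopes.

    Assumes tau2 (between-sample slope variance) and the per-sample
    sampling variances are known, and tau2 + v > 0 for every sample.
    Returns (grand_mean, shrunken_slopes).
    """
    # Precision weights combine within-sample and between-sample variance.
    weights = [1.0 / (tau2 + v) for v in sampling_vars]
    grand_mean = sum(w * b for w, b in zip(weights, slopes)) / sum(weights)

    shrunk = []
    for b, v in zip(slopes, sampling_vars):
        # lam is the share of a sample's observed deviation treated as real.
        lam = tau2 / (tau2 + v)
        shrunk.append(grand_mean + lam * (b - grand_mean))
    return grand_mean, shrunk
```

Noisier samples (larger sampling variance) receive a smaller λ and are pulled further toward the grand mean; when `tau2` is 0, every sample collapses onto the grand mean, which is why extreme but genuine samples can be understated under partial pooling.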

Method 3: Separate Analyses Followed by Random-Effects Meta-Analysis

Method 3 estimates both a prospective overall personality–cognitive ability association and sample-specific estimates. However, unlike in Method 2, sample-specific estimates are not estimated simultaneously with the overall estimate. The procedure for this method is as follows:

Step 1: Sample-Specific Statistical Modeling

Models were run separately for each sample, personality trait, outcome, covariate, and moderator combination. The basic form of the model is as follows, where models are separately estimated for each sample, j:

\begin{aligned} outcome_{ij} &= b_{0j} + b_{1j} \cdot predictor_{ij} + \epsilon_{ij} \\ \epsilon_{ij} &\sim N(0, \sigma_{j}^{2}) \end{aligned}

(5)
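Step 1 amounts to a loop over samples. The Python sketch below fits Equation 5 by ordinary least squares within each sample and keeps the slope, its standard error, and the sample size. It assumes a single predictor with no covariates or moderators, and the function name and input layout (tuples of sample, predictor, outcome) are hypothetical.

```python
import math
from collections import defaultdict


def fit_by_sample(rows):
    """Separate OLS fit of outcome = b0j + b1j * predictor in each sample.

    rows: iterable of (sample, predictor, outcome) tuples.
    Returns {sample: (b1j, se_b1j, n_j)}.
    """
    groups = defaultdict(lambda: ([], []))
    for sample, x, y in rows:
        groups[sample][0].append(x)
        groups[sample][1].append(y)

    results = {}
    for sample, (xs, ys) in groups.items():
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        b0 = my - b1 * mx
        sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
        sigma2_j = sse / (n - 2)  # sample-specific residual variance
        results[sample] = (b1, math.sqrt(sigma2_j / sxx), n)
    return results
```

Each sample gets its own residual variance here, mirroring the $\sigma_{j}^{2}$ in Equation 5; the slopes and squared standard errors are the inputs to Step 2.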

Step 2: Results Pooling Using Meta-Analysis

Once all the models were run, we next combined the effects across samples via meta-analysis. We opted to use random-effects meta-analysis because its assumptions surrounding cross-study variance are often more aligned with research goals in psychology, but other forms of meta-analysis can be readily applied, when appropriate. To do so, we constructed three helper functions to (1) set up and run the meta-analytic models, (2) extract the meta-analytic estimates, and (3) extract heterogeneity estimates. The meta-analytic model is as follows:

\begin{aligned} T_{j} &= \mu + \zeta_{j} + \epsilon_{j} \\ \epsilon_{j} &\sim N(0, \sigma^{2}) \\ \zeta_{j} &\sim N(0, \tau^{2}) \\ \operatorname{Cov}(\zeta_{j}, \epsilon_{j}) &= 0 \end{aligned}

(6)

where $T_{j}$ is the sample-specific effect of sample j (i.e., $b_{1j}$ in Equation 5), μ is the overall meta-analytic estimate, $\zeta_{j}$ is the true deviation of sample j from the overall estimate, and $\epsilon_{j}$ is sampling error. Random-effects meta-analyses were estimated using the metafor package in R (version 4.8-0; Viechtbauer, 2010).
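Step 2 can be sketched in Python with the DerSimonian–Laird estimator of $\tau^{2}$, a closed-form method chosen here for brevity. It is a swapped-in technique: metafor's rma() defaults to restricted maximum likelihood (REML) and offers DerSimonian–Laird via method = "DL", so results will generally differ slightly from the REML estimates. The inputs are the sample-specific estimates $T_{j}$ (e.g., the Step 1 slopes) and their squared standard errors; the function name is hypothetical.

```python
import math


def random_effects_dl(estimates, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    estimates: sample-specific effects T_j; variances: their sampling variances.
    Returns (mu, se_mu, tau2, Q, I2).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two samples")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # Fixed-effect mean and Cochran's Q heterogeneity statistic.
    mu_fixed = sum(wi * t for wi, t in zip(w, estimates)) / sw
    q = sum(wi * (t - mu_fixed) ** 2 for wi, t in zip(w, estimates))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    # Random-effects weights add tau2 to each sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * t for wi, t in zip(w_star, estimates)) / sum(w_star)
    se_mu = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return mu, se_mu, tau2, q, i2
```

The three helper steps described above (run the model, extract the pooled estimate, extract heterogeneity) all map onto this single return value: μ and its standard error, plus τ², Q, and I².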

Method 4: Separate Analyses Reported Together

For Method 4, we report only the separately estimated sample-specific effects, with no aggregation across samples. Thus, the models for Method 4 were identical to those estimated in the first part of Method 3 with no subsequent meta-analysis.

Results

As detailed above, the methods differ in which overall, sample-specific, and heterogeneity estimates they produce and in how those estimates are obtained. Below, we summarize overall and sample-specific findings across methods. Detailed sample results sections for each method can be found in the Supplemental Material.

Overall Estimates

Table 4 presents the fully adjusted (age, gender, and education) estimates of the key terms from the unmoderated models and from the models testing participant-level moderators of the overall estimates, across all methods. Across methods, the most consistent associations were those of neuroticism (−) and openness (+) with later crystallized abilities, which appeared in all methods except Method 1B. Less consistent were associations of extraversion (−), agreeableness (−), and conscientiousness (+), which appeared only in Method 1A. Notably, despite the use of Satterthwaite t-tests, which are most appropriate for relatively small numbers of samples (k < 20), Method 1B provided much more conservative estimates than all other methods, by a large margin. In addition, Methods 1A and 1B both showed markedly different associations from Methods 2A, 2B, and 3, which likely reflects ecological bias in meta-analysis.

Table 4.

Cross-Method Comparison of Prospective Overall Effects and Participant-Level Moderators of Personality-Crystallized Domain Associations.

Term	E: b [95% CI]	A: b [95% CI]	C: b [95% CI]	N: b [95% CI]	O: b [95% CI]
1A: Pooled analysis of individual participant data
Personality	−0.03 [−0.05, −0.02]	−0.05 [−0.06, −0.03]	0.03 [0.02, 0.05]	−0.03 [−0.04, −0.01]	−0.03 [−0.05, −0.02]
Age	−0.000 [−0.002, 0.001]	−0.001 [−0.003, 0.000]	0.001 [−0.001, 0.002]	0.002 [0.001, 0.003]	−0.000 [−0.002, 0.001]
Gender (men vs. women)	0.05 [0.02, 0.08]	0.03 [−0.01, 0.06]	0.01 [−0.02, 0.05]	−0.01 [−0.04, 0.01]	0.05 [0.02, 0.08]
Education (years)	0.007 [0.002, 0.01]	−0.001 [−0.007, 0.005]	−0.007 [−0.01, −0.001]	−0.002 [−0.006, 0.001]	0.007 [0.002, 0.01]
1B: Pooled analysis of individual participant data with cluster corrected errors
Personality	−0.07 [−0.22, 0.08]	−0.07 [−0.23, 0.08]	−0.01 [−0.14, 0.12]	−0.03 [−0.13, 0.08]	−0.07 [−0.22, 0.08]
Age	0.000 [−0.004, 0.005]	−0.000 [−0.004, 0.004]	−0.001 [−0.004, 0.003]	0.002 [−0.001, 0.005]	0.000 [−0.004, 0.005]
Gender (men vs. women)	0.05 [0.001, 0.10]	0.04 [−0.002, 0.08]	0.03 [−0.07, 0.13]	−0.01 [−0.08, 0.05]	0.05 [0.001, 0.10]
Education (years)	0.005 [−0.008, 0.02]	−0.003 [−0.03, 0.02]	−0.006 [−0.03, 0.02]	−0.002 [−0.02, 0.01]	0.005 [−0.008, 0.02]
2A: Pooled analysis of individual participant data using contrasts
Personality	0.02 [−0.007, 0.05]	−0.007 [−0.04, 0.02]	0.010 [−0.02, 0.04]	−0.07 [−0.09, −0.05]	0.17 [0.14, 0.21]
Age	−0.003 [−0.008, 0.002]	−0.000 [−0.004, 0.003]	−0.003 [−0.007, 0.001]	0.004 [−0.000, 0.008]	0.000 [−0.004, 0.005]
Gender (men vs. women)	0.06 [0.001, 0.12]	0.02 [−0.04, 0.08]	−0.004 [−0.06, 0.05]	−0.04 [−0.09, 0.01]	−0.008 [−0.08, 0.06]
Education (years)	−0.002 [−0.01, 0.009]	−0.003 [−0.02, 0.010]	0.01 [−0.001, 0.02]	−0.003 [−0.01, 0.006]	−0.004 [−0.02, 0.01]
2B: Pooled analysis of individual participant data using random effects
Personality	0.01 [−0.02, 0.05]	0.003 [−0.03, 0.03]	0.01 [−0.04, 0.07]	−0.06 [−0.09, −0.03]	0.16 [0.07, 0.25]
Age	0.001 [−0.03, 0.03]	0.000 [−0.09, 0.09]	−0.001 [−0.008, 0.006]	0.003 [−0.004, 0.009]	0.001 [−0.09, 0.09]
Gender (men vs. women)	0.05 [0.009, 0.09]	0.02 [−0.02, 0.06]	0.01 [−0.03, 0.06]	−0.02 [−0.05, 0.01]	−0.000 [−0.06, 0.06]
Education (years)	−0.006 [−0.07, 0.06]	−0.006 [−0.02, 0.006]	0.007 [−0.02, 0.04]	0.002 [−0.006, 0.010]	−0.01 [−0.12, 0.09]
3: Separate analyses followed by random-effects meta-analysis
Personality	0.01 [−0.02, 0.05]	0.02 [−0.001, 0.03]	0.02 [−0.03, 0.07]	−0.07 [−0.10, −0.04]	0.17 [0.10, 0.25]
Age	−0.000 [−0.003, 0.002]	−0.001 [−0.003, 0.002]	−0.000 [−0.005, 0.004]	0.000 [−0.004, 0.005]	0.001 [−0.006, 0.007]
Gender (men vs. women)	0.04 [−0.004, 0.09]	0.02 [−0.02, 0.05]	0.005 [−0.03, 0.04]	−0.000 [−0.03, 0.03]	−0.04 [−0.07, −0.01]
Education (years)	−0.004 [−0.02, 0.008]	−0.008 [−0.02, 0.008]	0.003 [−0.01, 0.02]	−0.001 [−0.01, 0.008]	−0.007 [−0.02, 0.005]

Note. All terms except Personality indicate interactions with personality trait levels (e.g., Age = Age × Personality). Bold indicates 95% confidence intervals that did not include 0. All parameter estimates were extracted from separate models. E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness to Experience.

Sample-Specific Estimates

Some methods (Methods 2A–B, 3, and 4) also produce sample-specific estimates, which are shown in Figure 2. As can be seen in the forest plots, openness and neuroticism showed the most consistent associations with crystallized abilities. Openness was positively associated with crystallized abilities in five of the seven samples in Methods 2A and 2B (all except ROS and GSOEP) and in six of the seven samples (all except ROS) in Method 3. Neuroticism was negatively associated with crystallized abilities in 5 of 11 samples in Methods 2A and 3 and in 7 of 11 samples in Method 2B. In a few cases, a relatively large number of sample-specific estimates were significant despite non-significant meta-analytic associations. For example, although there was no overall association between conscientiousness and crystallized abilities, SATSA showed a significant negative association, and ROS and HRS showed positive associations across all three methods.

Figure 2.

Forest plot of fully adjusted prospective associations between Big Five personality traits and crystallized abilities across samples using one-stage pooled integrative data analysis via effects coded contrasts (Method 2A). Overall point estimates (squares) represent the grand-mean estimates of the association across samples, while sample point estimates represent regression terms or linear combinations of regression terms. Error bars capture the 95% CI around the point estimate. Arrows indicate the confidence band was truncated to better visualize the estimates.

Person-Level Moderators

Next, we examined whether there were person-level moderators of the association between the Big Five and crystallized abilities (see Table 4). First, we examined the overall associations across all studies. Most consistently, we observed that gender moderated the association between extraversion and crystallized abilities across samples for all methods except Method 2B (see Figure 2). Even so, all point and interval estimates were within two-hundredths of estimates from both Methods 2A and 3. In Method 1A, the overall prospective personality–crystallized ability association was negative for men (b = −0.06, 95% CI [−0.08, −0.04]) and null for women (b = −0.01, 95% CI [−0.03, 0.006]). In Method 1B, the overall prospective personality–crystallized ability association was null for both men (b = −0.09, 95% CI [−0.24, −0.05]) and women (b = −0.05, 95% CI [−0.17, 0.08]). In Method 2A, the overall association between extraversion and later crystallized ability was null for men (b = −0.01, 95% CI [−0.06, 0.03]) but positive for women (b = 0.05, 95% CI [0.004, 0.09]), such that women who were higher in extraversion tended to score higher on crystallized domain tasks. In Method 2B, the association was null for both men (b = −0.02, 95% CI [−0.04, 0.01]) and women (b = 0.03, 95% CI [−0.01, 0.08]), and the interaction indicated that the associations did not differ from one another.
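The gender-specific associations above are simple slopes, that is, linear combinations of model coefficients. Assuming gender is coded 0 = men and 1 = women (as in Figure 3), the slope for men is the personality coefficient, and the slope for women adds the personality × gender interaction; its variance also requires the covariance between the two coefficients. The Python sketch below uses a normal-approximation interval, whereas the analyses above used Satterthwaite t-tests via linear_contrast() or glht(); the function name and inputs are hypothetical.

```python
import math


def simple_slopes(b_pred, b_inter, var_pred, var_inter, cov_pred_inter, z=1.959964):
    """Simple slopes of a predictor for gender coded 0 = men, 1 = women.

    men:   b_pred
    women: b_pred + b_inter, with
           Var = Var(b_pred) + Var(b_inter) + 2 * Cov(b_pred, b_inter).
    Returns {group: (estimate, lower, upper)} with normal-approximation 95% CIs.
    """
    out = {}
    for group, weight in (("men", 0), ("women", 1)):
        estimate = b_pred + weight * b_inter
        variance = var_pred + weight ** 2 * var_inter + 2 * weight * cov_pred_inter
        se = math.sqrt(variance)
        out[group] = (estimate, estimate - z * se, estimate + z * se)
    return out
```

Omitting the covariance term is a common mistake; with a negative covariance, as is typical for a main effect and its interaction, ignoring it overstates the uncertainty of the women's slope.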

We also examined sample-specific estimates of moderators of the personality–cognition association. Given that the most consistent effect was gender moderating the association between extraversion and crystallized abilities, Figure 3 shows a forest plot of the sample-specific gender moderation associations across studies for each of the Big Five, and Figure 4 illustrates simple effects plots of the overall and sample-specific estimates across methods. As is clear in both figures, estimates are much more similar for Methods 2A and 3 than for Method 2B, with nearly identical patterns of sample-specific associations. In Method 2B, however, shrinkage due to partial pooling is apparent in the greater consistency of estimates across samples, evidenced by fewer crossing lines.

Figure 3.

Forest plot of fully adjusted gender moderation of the prospective associations between Big Five personality traits and crystallized abilities across samples using one-stage pooled integrative data analysis via effects coded contrasts (Method 2A). Gender was coded as 0 = men and 1 = women, so point estimates represent the difference in association between women and men (positive = women had higher absolute magnitude; negative = men had higher absolute magnitude). Overall point estimates (squares) represent the grand-mean estimates of the association across samples, while sample point estimates represent regression terms or linear combinations of regression terms. Error bars capture the 95% CI around the point estimate. Arrows indicate the confidence band was truncated to better visualize the estimates.

Figure 4.

Prospective sample-specific and overall associations between extraversion (in POMP units, 0–10) and crystallized abilities (in POMP units, 0–10) across genders (men, women) for Methods 2A, 2B, and 3. Different colors and line types indicate different samples. Thicker, black lines indicate the average association across samples, while thinner lines indicate sample-specific associations.

Learning More

For brevity, we have summarized above only the main findings across methods for the main effect of Big Five personality traits predicting later crystallized abilities. In the Supplemental Material, we provide sample results sections for each method separately, as well as sample-level moderators (i.e., meta-regressions).

Discussion

The present article differentiated and demonstrated techniques for synthesizing IPD across samples. First, we outlined methods of data synthesis, which we organized into five broad categories (see Table 1), ranging from fully pooled regression models that neither include (Method 1A) nor correct for (Method 1B) cross-sample heterogeneity, to fully separate regression models that do not pool effect sizes across samples (Method 4), to traditional meta-analysis (Method 5). Second, we demonstrated how to carry out each form of IPD meta-analysis (Methods 1–4) using the R programming language. Third, we examined prospective associations between Big Five personality traits and crystallized abilities. In line with previous investigations that compared different combinations of the methods in our taxonomy (Burke et al., 2017; Debray et al., 2013; Legha et al., 2018), we found that most results replicated across methods, particularly for associations between personality and cognitive ability, with some exceptions that we discuss in more detail below. Notably, four of the six modeling techniques yielded convergent results: pooled analysis using contrasts (2A) and random effects (2B), as well as both CDA techniques (with [3] and without [4] random-effects meta-analysis). The two techniques that did not yield results consistent with the others were the standard pooled analysis (1A), which yielded significant effects for all traits, and pooled analysis with cluster-corrected errors (1B), which yielded null effects for all traits (see below for further discussion of this pattern). Fourth, consistent with prior investigations, we found robust prospective associations of both openness and neuroticism with crystallized abilities.
For openness, this association was present in 5 (71.4%) to 6 (85.7%) of the 7 samples across methods, while for neuroticism it was present in 5 (45.5%) to 7 (63.6%) of the 11 samples across methods. These results are also mostly consistent with a recent traditional meta-analysis that found associations for neuroticism and openness (Anglim et al., 2022). Below, we discuss each of these methods in turn and provide recommendations for best practices in data synthesis.

Choosing a Method of Data Synthesis

In the present study, we applied data synthesis parameterizations that spanned four of the five levels of our taxonomy of methods for synthesizing data. These methods differ with respect to the types of data used (i.e., aggregated vs. individual participant), the number of models estimated, the type of models estimated, whether sample-specific and/or overall estimates are included, and the degree of harmonization of sample variables necessary. Based on the present study and our review of the literature on individual participant meta-analysis, we recommend that investigators answer the following questions to guide their decisions about which method is most appropriate for the research questions they will be addressing (Table 5).

Table 5.

Decision Points and Recommendations for Conducting Data Synthesis.

No.	Consideration	Recommendation if Yes	Recommendation if No
1.	Are individual participant data available?	Method 2A,B; Method 3; Method 4	Method 5
2.	Are variables harmonizable?	Method 2A,B; Method 3; Method 4	Method 5
3.	Is your model complex, and do your samples differ in concrete ways?^a	Method 3; Method 4; Method 5	Method 2A,B
4.	Are you interested in overall, cross-sample estimates?	Method 2A,B; Method 3; Method 4	Method 4
5.	Are you interested in sample-specific effects?	Method 2A,B; Method 3; Method 4	Focus on fixed effects of Method 2A,B
6.	Are you interested in sample-level moderators – OR – in predicting variability/error?	Method 2B; Method 3; Method 5	Method 2A
7.	Do you care about distributions of sample-specific effects?	Method 2B; Method 3; Method 5	Method 2A
8.	Do you have a complex error structure?	Method 2B	Method 2A; Method 3; Method 4; Method 5
9.	Do you think that all samples are drawn from different “true” populations, such that extremes reflect only true variance?	Method 2A; Method 3; Method 4; Method 5	Method 2B

Note. Method 1A = Pooled analysis of IPD; Method 1B = Pooled analysis of IPD with cluster-corrected errors; Method 2A = Pooled analysis of IPD using contrasts; Method 2B = Pooled analysis of IPD using random effects; Method 3 = Separate analyses of IPD followed by meta-analysis; Method 4 = Separate analyses of IPD Reported Together (but not pooled); Method 5 = Traditional Meta-Analyses.

^a In the presence of complex models, we suggest applying meta-analytic structural equation modeling (MASEM).

Below, we discuss each question above and its implications in detail. First, however, we want to make a small number of prescriptive recommendations that apply more broadly. First, we do not recommend the use of Method 1A or Method 1B, or of any model that treats variance across samples as “nuisance” variance. These models appear to yield biased parameter estimates, and this bias may be particularly pronounced because even samples collected from the same population at different times may not be fully exchangeable due to timing effects (e.g., seasonal sleep–wake differences, global events). Importantly, this is true not only in the basic regression context (Cameron et al., 2024) but also in the context of MASEM (Groot et al., 2025). Second, in most circumstances, we suggest using Method 2B (one-stage approach) or Method 3A or 3B (two-stage approach with random-effects meta-analysis or MASEM), each of which shows comparable results, provides both sample-specific and overall estimates, and allows for sample-level moderators.

A first and basic question (#1 in Table 5) is whether IPD are available, either directly to the researchers or through an external analyst. If the answer to both is no, then Methods 1 to 4, which were the focus of the present study, cannot be fully estimated. The only option available would be to conduct a traditional meta-analysis in which effect sizes of interest are extracted from the published and unpublished literature (where possible). MASEM with correlation or covariance matrices may also be appropriate if the model allows and the matrices are available via publications or data maintainers. If IPD are directly available to the researchers, then pooled analysis is possible (Method 1 or 2), although data harmonization considerations need to be met before selecting one of these methods (see below). If IPD are not directly available to the researchers and an external analyst is necessary, then pooled analysis is not possible, and one of the CDA approaches (Methods 3 and 4) is most appropriate.

Second, researchers need to ask the extent to which the core variables are either logically or analytically harmonizable (Cole et al., 2023; #2 in Table 5). Harmonization can occur either before (ex ante) or after (ex post) data are collected (Dubrow & Tomescu-Dubrow, 2016). Ex post harmonization is particularly challenging given the necessity of harmonizing across diverse datasets (Cole et al., 2023; Fortier et al., 2017). If ex post harmonization is not possible, then the data synthesis methods outlined in the present study are likely not appropriate and Method 4 or 5 should typically be used. In the present study, we utilized logical ex post harmonization, wherein we identified studies that used validated Big Five inventories and crystallized intelligence measures whose convergence was established in prior work (Langa et al., 2020; Soto & John, 2017) and linearly transformed them onto the same scale using POMP scoring. But harmonization is a non-trivial issue that warrants particular consideration. While those considerations are beyond the scope of the current study, excellent overviews and tutorials exist (Cheng et al., 2024; Cole et al., 2023; Dubrow & Tomescu-Dubrow, 2016).
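POMP (percent of maximum possible) scoring is a linear rescaling of each instrument's range onto a common metric. The Python sketch below assumes the rescaling uses each scale's possible (not observed) minimum and maximum; the function name is hypothetical, and `top` sets the upper end of the metric (100 for conventional percentages, 10 for the 0–10 metric shown in Figure 4).

```python
def pomp(score, min_possible, max_possible, top=100.0):
    """Percent of maximum possible (POMP) score.

    Linearly maps the scale's possible range [min_possible, max_possible]
    onto [0, top], so instruments with different response formats share a metric.
    """
    if max_possible <= min_possible:
        raise ValueError("max_possible must exceed min_possible")
    return (score - min_possible) / (max_possible - min_possible) * top
```

For example, a mean of 3 on a 1–5 inventory maps to the midpoint of the common metric regardless of the original response format. Because the transformation is linear, it aligns scale ranges but does not by itself establish that two instruments measure the same construct, which is why the logical harmonization step comes first.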

Third, if harmonizable IPD are available, researchers should decide whether they want to obtain meta-analytic estimates (#4 in Table 5). This decision depends both on the complexity of the models researchers desire to use (#3 in Table 5) and on the specific effect sizes they wish to synthesize. While our taxonomy is broadly applicable to many types of models and effect sizes, single-stage models, like Methods 1 and 2, require that synthesized terms be parameterized as model-based coefficients (e.g., regression coefficients, mean differences). Thus, effect sizes that cannot be directly estimated in multivariable or multilevel models, such as zero-order correlations, would have to be calculated in a secondary step via common effect size conversions. If that is not desirable, then a typical two-stage meta-analytic approach (i.e., Method 3) is likely most appropriate. Moreover, the authors of the present study frequently find that research questions relying on more sophisticated models may not be appropriate to combine meta-analytically, both because models often must be parameterized differently across samples to converge and because fewer studies meet inclusion criteria for more complex models, which lowers study-level power for meta-analysis (Graham et al., 2017; Neupert et al., 2024). In these cases, we instead advocate for clearly reporting differences in how models were estimated across samples and making descriptive comparisons of effect sizes, where appropriate. Alternatively, if correlation or covariance matrices are available, MASEM is likely an optimal approach. However, particularly for complex models with within-person effects and complex random-effects structures typically handled via multilevel models, estimating a MASEM model is non-trivial, even when using multilevel structural equation modeling (SEM).

Fourth, researchers should ask whether they are interested in separately parameterizing estimates for each sample (#5 in Table 5). If not, fully pooled regression (Method 1A) or pooled regression with cluster-corrected errors (Method 1B) may be sufficient to answer questions of interest. However, we generally do not recommend this, for two reasons. First, Method 1B had much larger standard errors than all other methods. This is likely due to a commonly observed issue with cluster-corrected regression (Leyrat et al., 2018), in which small numbers of clusters or imbalanced clusters can result in over-rejection of the null. Given that many pooled analyses within psychology (e.g., internal meta-analyses; Goh et al., 2016) will have similar or even fewer clusters than the present study, such over-rejection of the null is likely to be a common issue. Method 1A, by contrast, had narrower standard errors than other methods, which likely overstates certainty by failing to account for the nesting of participants within studies. Much like ecological bias in longitudinal data can produce biased estimates by confounding within- and between-person information, failing to account for sample-level information can produce ecological bias in meta-analysis by confounding person- and sample-level information (Berlin et al., 2002; Hua et al., 2017). Second, results from the 11 individual samples varied in the magnitude, direction, and significance of associations (see Supplemental Material). Therefore, we recommend methods that parameterize sample-specific estimates, even if these are not focal.
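The ecological bias described above is easy to reproduce with toy numbers. In the hypothetical Python example below, the outcome rises by one unit per unit of the predictor within each of three samples, but samples with higher average predictor scores have lower average outcomes. A single pooled regression that ignores sample membership (the logic of Method 1A) then returns a negative slope, even though every within-sample slope is positive. The data are invented purely for illustration.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def toy_samples():
    """Three hypothetical samples: within-sample slope of +1, but sample
    means of x and y move in opposite directions across samples."""
    samples = {}
    for name, (mean_x, mean_y) in {"S1": (0, 4), "S2": (2, 2), "S3": (4, 0)}.items():
        xs = [mean_x - 1, mean_x, mean_x + 1]
        ys = [mean_y + (x - mean_x) for x in xs]
        samples[name] = (xs, ys)
    return samples


samples = toy_samples()
# Pooling all rows into one regression ignores which sample each row came from.
pooled_x = [x for xs, _ in samples.values() for x in xs]
pooled_y = [y for _, ys in samples.values() for y in ys]
pooled_slope = ols_slope(pooled_x, pooled_y)

# Estimating the slope separately within each sample removes between-sample differences.
within_slopes = [ols_slope(xs, ys) for xs, ys in samples.values()]
mean_within = sum(within_slopes) / len(within_slopes)

print(f"pooled slope: {pooled_slope:.2f}; mean within-sample slope: {mean_within:.2f}")
# prints "pooled slope: -0.60; mean within-sample slope: 1.00"
```

The pooled slope mixes person-level and sample-level information, so its sign can flip relative to the within-sample associations; methods that parameterize sample-specific intercepts avoid this particular confound.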

When estimating sample-specific associations, researchers must choose between one- and two-stage approaches. Consistent with previous investigations of one- and two-stage IPD-MA, estimates from these methods largely converged (Burke et al., 2017; Debray et al., 2013; Legha et al., 2018). Current recommendations suggest that one-stage approaches are preferred when binary indicators are unbalanced, continuous indicators are non-normally distributed, or sample sizes are small (Burke et al., 2017; Cooper & Patall, 2009; Debray et al., 2013). One-stage approaches allow focal and adjustment terms to be jointly estimated and their residuals to be correlated, which is often desirable when studying predictor–outcome relationships (Burke et al., 2017; Cooper & Patall, 2009). Method 2A, which uses contrasts to estimate sample-specific effects, can be useful when (1) correlations among random effects are not of interest; (2) samples are not thought to be drawn from a larger population of samples; or (3) the number of samples is small (these models have fewer parameters than the multilevel models used in Method 2B; Curran & Hussong, 2009; Legha et al., 2018; Riley et al., 2020). In some cases, however, interest in the variances of and correlations among random effects, or in predicting that variance via sample-level moderators/meta-regression, may necessitate the use of Method 2B (see #8 in Table 5). Method 2B uses shrinkage via partial pooling, which can reduce overall error by shrinking outliers toward the average. However, when outliers are not errors, this could bias the parameter estimates for those samples (see #9 in Table 5).

Finally, we encourage researchers to ask whether they are interested in sample-level moderators or in predicting variability (e.g., tau) or error (e.g., sigma; see #6 in Table 5). If these are not of interest, then Method 2A, which uses effect codes, should provide evidence convergent with other methods through a much simpler parameterization (i.e., variances and covariances among sample-specific estimates are not estimated). If sample-level moderators or the prediction of variability/error are of interest, then researchers can implement Method 2B or 3. In the present study, we note sample-level moderators as an opportunity but did not emphasize them, because the number of studies we synthesized left us underpowered to conduct robust tests. Still, we believe they merit discussion, and we cannot overemphasize the care that should be taken when conducting such tests. When the number of samples is relatively small (e.g., <10) and there is either little variability in continuous sample-level moderators or a large number of categories in categorical sample-level moderators, power will tend to be particularly low and the likelihood of Type I errors high. Moreover, sample-level moderators may be particularly sensitive to differences in how sample-level estimates are parameterized. Because sample-level moderators are essentially meta-regressions in which heterogeneity in effect sizes is predicted by sample-level characteristics, moderator estimates depend on how that heterogeneity is estimated. For example, when conducting these moderator tests in Method 2B, estimates are likely to differ because of shrinkage (from partial pooling). When the number of samples is small, we instead recommend that researchers focus on descriptive comparisons of effect sizes across samples rather than relying on statistical inference.
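Because a sample-level moderator test is a meta-regression, its precision depends directly on how many samples there are and how much the moderator varies across them. The Python sketch below fits a weighted least-squares meta-regression with weights $1/(v_j + \tau^{2})$, treating $\tau^{2}$ as known. This is a simplification (dedicated software such as metafor estimates residual heterogeneity as part of the model), and the function name is hypothetical.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares meta-regression T_j = a + c * moderator_j.

    Weights are 1 / (v_j + tau2) with tau2 treated as known.
    Returns (intercept a, slope c, standard error of c).
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    # Weighted means of the moderator and of the effect sizes.
    mx = sum(wi * m for wi, m in zip(w, moderator)) / sw
    my = sum(wi * t for wi, t in zip(w, effects)) / sw
    sxx = sum(wi * (m - mx) ** 2 for wi, m in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator does not vary across samples")
    c = sum(wi * (m - mx) * (t - my) for wi, m, t in zip(w, moderator, effects)) / sxx
    a = my - c * mx
    # With known weights, Var(c) = 1 / sum_j w_j (m_j - weighted mean)^2.
    se_c = math.sqrt(1.0 / sxx)
    return a, c, se_c
```

The standard error of the moderator slope shrinks only as the weighted spread of the moderator grows, so a handful of samples or a narrow moderator range leaves such tests imprecise, which is the concern raised above.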

Lastly, once researchers have decided to use a meta-analytic model, choosing which one to use is non-trivial; choices span from basic decisions between fixed- and random-effects meta-analysis (#9 in Table 5) to more complex considerations, such as corrections for correlations attenuated by imperfect validity and reliability (Schmidt & Hunter, 1998). An ever-growing number of such models exists, and recommendations for appropriate models will change accordingly. Broadly, however, we recommend basic random-effects meta-analysis for common univariate questions, including those involving heterogeneous variances, and one-stage MASEM for more complex multivariate questions, including measurement models and measurement invariance testing. Many of the same considerations that govern the choice between multilevel models and SEM (e.g., random-effects structures, measurement models) also dictate the choice among these meta-analytic models.
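As one example of the artifact corrections mentioned above, Spearman's correction for attenuation divides an observed correlation by the square root of the product of the two measures' reliabilities. The Python sketch below covers only this reliability correction; corrections for other artifacts in the Schmidt and Hunter framework (e.g., imperfect construct validity or range restriction) are not shown, and the function name is hypothetical.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation due to measurement unreliability.

    r_xy: observed correlation; rel_x, rel_y: reliabilities in (0, 1].
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)
```

Because the correction inflates every estimate by a sample-specific factor, applying it before pooling can change both the overall estimate and the apparent heterogeneity across samples.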

In sum, there is no single answer to the question of which method of data synthesis is best. While our general recommendations are that Methods 2B and 3 should typically be used when addressing questions of overall estimates, sample-level moderators, and predictors of variability/error, researchers should carefully weigh the promises and pitfalls of different methods with reference to the data they aim to synthesize and choose accordingly.

Limitations and Future Directions

The present study aimed to delineate and demonstrate methods for synthesizing IPD. To do so, we used data from 11 samples to investigate prospective associations between the Big Five personality traits and crystallized abilities using six parameterizations across four categories of our data synthesis taxonomy.

Despite its strengths (multi-sample, multi-national, sample size, multi-method, etc.), the present study has a number of limitations and leaves several questions open. First, we opted for an empirical demonstration of these methods to highlight real challenges in documenting, harmonizing, and analyzing data across samples that differ in many ways; an empirical demonstration, however, cannot answer lingering methodological questions that simulations could address (e.g., sample size, effect size, etc.; cf. Riley et al., 2020). Second, all 11 samples were drawn from Western, educated, democratic, and predominantly white populations, which limits generalizability. Third, we focused on a specific demonstration of considerations using a regression-based framework. While we believe this framework covers most analytic approaches and effect sizes within psychology, there are a number of more nuanced considerations for specific analytic methods that are beyond the scope of this study. Fourth, as noted previously, we used a relatively small number of samples (11) in our IPD approaches to data synthesis, which left us underpowered for most sample-level moderator tests. Fifth, we relied on logical harmonization of both personality traits and crystallized abilities. This, coupled with too few studies for meta-regression, left us unable to determine whether sample-level variation may have been methodological.

Conclusion

The current study demonstrates how to conduct data synthesis across four levels of our newly presented taxonomy of methods for synthesizing data via IDA. We delineated differences across these methods and made concrete recommendations about when each is appropriate and which are more reliable under different circumstances. We also estimated prospective associations between Big Five personality traits and crystallized ability in one of the largest investigations of these associations to date. We found reliable prospective associations of both Openness to Experience and Neuroticism with crystallized abilities across samples. These associations held even after adjusting for covariates, and we found few participant-level moderators of them. Openness and Neuroticism are uniquely associated with crystallized abilities, and most methods of data synthesis provided convergent evidence of this.

Supplemental Material

sj-docx-1-psp-10.1177_01461672261451901 – Supplemental material for A Taxonomy of Data Synthesis

Supplemental material, sj-docx-1-psp-10.1177_01461672261451901 for A Taxonomy of Data Synthesis by Emorie D. Beck, Emily C. Willroth, Julia A. M. Delius, David A. Bennett, Lisa L. Barnes, Bryan D. James, Richard B. Lipton, Mindy Katz, Linda B. Hassing, Martijn Huisman, Daniel K. Mroczek and Eileen K. Graham in Personality and Social Psychology Bulletin

Footnotes

ORCID iDs

Emorie D. Beck

Emily C. Willroth

Julia A. M. Delius

Author Contributions

E.D.B., E.C.W., D.K.M., and E.K.G. conceptualized the idea, co-wrote the Introduction, and edited the manuscript. E.D.B. wrote the preregistration, ran the analyses, wrote the Method, Results, and Discussion, and prepared all Supplemental Material. D.A.B., L.L.B., R.B.L., M.K., M.H., J.A.M.D., and L.B.H. collected and prepared the data. All authors participated in editing and finalizing the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Emorie Beck, Emily Willroth, Daniel Mroczek, and Eileen Graham were supported by National Institute on Aging Grants R01-AG067622, R01-AG082954, and R01-AG018436. Emily Willroth was additionally supported by National Institute on Aging Grant K99/R00AG071838. Richard B. Lipton and Mindy Katz were supported by NIH/NIA P01 AG03949, the Czap Foundation, and the Leonard and Sylvia Marx Foundation. Julia A. M. Delius was supported by the Max Planck Society. The Berlin Aging Study (BASE) was initiated by the late Paul B. Baltes in collaboration with Hanfried Helmchen, psychiatry; Elisabeth Steinhagen-Thiessen, internal medicine and geriatrics; and Karl Ulrich Mayer, sociology. Financial support for BASE was provided by the Max Planck Society; Freie Universität Berlin; the German Federal Ministry for Research and Technology (1989–1991, 13 TA 011 & 13 TA 011/A); the German Federal Ministry for Family, Senior Citizens, Women, and Youth (1992–1998, 314-1722-102/9 & 314-1722-102/9a); and the Berlin-Brandenburg Academy of Sciences Research Group on Aging and Societal Development (1994–1999). The OCTO-Twin Study was supported by a grant from the National Institute on Aging (NIA; AG 08861) of the National Institutes of Health.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

All code, results, figures, and tables are available online on GitHub (https://github.com/emoriebeck/data-synthesis-tutorial) and the OSF (https://osf.io/zut7b/). Code can be explored via a rendered bookdown webpage (https://emoriebeck.github.io/data-synthesis-tutorial/index.html), and results can be explored via an R Shiny web app.

Open Practices

This study was preregistered on the Open Science Framework (OSF).

Notes

References

Anglim, Dunlop, P. D., Wee, Horwood, Wood, J. K., & Marty (2022). Personality and intelligence: A meta-analysis. Psychological Bulletin, 148(5–6), 301–336. https://doi.org/10.1037/bul0000373

Baltes, P. B., & Mayer, K. U. (Eds.). (1999). The Berlin Aging Study: Aging from 70 to 100. Cambridge University Press.

Baltes, P. B., Mayer, K. U., Helmchen, & Steinhagen-Thiessen (1999). The Berlin Aging Study (BASE): Sample, design, and overview of measures. In The Berlin aging study: Aging from 70 to 100 (pp. 15–55). Cambridge University Press.

Barelds, D. P., & Luteijn (2002). Measuring personality: A comparison of three personality questionnaires in the Netherlands. Personality and Individual Differences, 33(4), 499–510.

Barnes, L. L., Shah, R. C., Aggarwal, N. T., Bennett, D. A., & Schneider, J. A. (2012). The Minority Aging Research Study: Ongoing efforts to obtain brain donation in African Americans without dementia. Current Alzheimer Research, 9(6), 734–745. https://doi.org/10.2174/156720512801322627

Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26. https://doi.org/10.1111/j.1744-6570.1991.tb00688.x

Bates, Mächler, Bolker, & Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Batty, G. D., Shipley, M. J., Kvaavik, Russ, Hamer, Stamatakis, & Kivimaki (2018). Biomarker assessment of tobacco smoking exposure and risk of dementia death: Pooling of individual participant data from 14 cohort studies. Journal of Epidemiology and Community Health, 72(6), 513. https://doi.org/10.1136/jech-2017-209922

Bauer, D. J., & Hussong, A. M. (2009). Psychometric approaches for developing commensurate measures across independent studies: Traditional and new models. Psychological Methods, 14, 101–125. https://doi.org/10.1037/a0015583

Beck, E. D., & Jackson, J. J. (2022). A mega-analysis of personality prediction: Robustness and boundary conditions. Journal of Personality and Social Psychology, 122(3), 523–553. https://doi.org/10.1037/pspp0000386

Beck, E. D., Yoneda, James, B. D., Bennett, D. A., Hassenstab, Katz, M. J., Lipton, R. B., Morris, Mroczek, D. K., & Graham, E. K. (2024). Personality predictors of dementia diagnosis and neuropathological burden: An individual participant data meta-analysis. Alzheimer’s & Dementia, 20(3), 1497–1514. https://doi.org/10.1002/alz.13523

Bennett, D. A., Schneider, J. A., Arvanitakis, & Wilson, R. S. (2012). Overview and findings from the Religious Orders Study. Current Alzheimer Research, 9(6), 628–645. https://doi.org/10.2174/156720512801322573

Bennett, D. A., Schneider, L. S., Buchman, A. S., Barnes, L. L., Boyle, & Wilson (2012). Overview and findings from the Rush Memory and Aging Project. Current Alzheimer Research, 9(6), 646–663.

Ben-Shachar, M. S., Lüdecke, & Makowski (2020). effectsize: Estimation of Effect Size Indices and Standardized Parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Berlin, J. A., Santanna, Schmid, C. H., Szczech, L. A., & Feldman, H. I. (2002). Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: Ecological bias rears its ugly head. Statistics in Medicine, 21(3), 371–387. https://doi.org/10.1002/sim.1023

Bidard, F.-C., Peeters, D. J., Fehm, Nolé, Gisbert-Criado, Mavroudis, Grisanti, Generali, Garcia-Saenz, J. A., Stebbing, Caldas, Gazzaniga, Manso, Zamarchi, de Lascoiti, A. F., De Mattos-Arruda, Ignatiadis, Lebofsky, van Laere, S. J., . . . Michiels (2014). Clinical validity of circulating tumour cells in patients with metastatic breast cancer: A pooled analysis of individual patient data. The Lancet Oncology, 15(4), 406–414. https://doi.org/10.1016/S1470-2045(14)70069-5

Blair, Cooper, Coppock, Humphreys, & Sonnet (2022). estimatr: Fast estimators for design-based inference. https://CRAN.R-project.org/package=estimatr

Bolker & Robinson (2022). broom.mixed: Tidying methods for mixed models. https://CRAN.R-project.org/package=broom.mixed

Brooks-Gunn, Phelps, & Elder, G. H. (1991). Studying lives through time: Secondary data analyses in developmental psychology. Developmental Psychology, 27, 899–910. https://doi.org/10.1037/0012-1649.27.6.899

Burke, D. L., Ensor, & Riley, R. D. (2017). Meta-analysis using individual participant data: One-stage and two-stage approaches, and why they may differ. Statistics in Medicine, 36(5), 855–875. https://doi.org/10.1002/sim.7141

Bürkner, P.-C. (2021). Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software, 100(5), 1–54. https://doi.org/10.18637/jss.v100.i05

Cameron, C. E., McClelland, M. M., Grammer, & Morrison, F. J. (2024). Self-regulation and academic achievement. In M. A. Bell (Ed.), Child development at the intersection of emotion and cognition (2nd ed., pp. 213–234). American Psychological Association. https://doi.org/10.1037/0000406-011

Chartier, C. R., Arnal, J. D., Arrow, Bloxsom, N. G., Bonfiglio, D. B. V., Brumbaugh, C. C., Corker, K. S., Ebersole, C. R., Garinther, Giessner, S. R., Hughes, Inzlicht, Lin, Mercier, Metzger, Rangel, Saunders, Schmidt, Storage, & Tocco (2020). Many Labs 5: Registered replication of Albarracín et al. (2008), Experiment 5. Advances in Methods and Practices in Psychological Science, 3(3), 332–339. https://doi.org/10.1177/2515245920945963

Cheng, Messerschmidt, Bravo, Waldbauer, Bhavikatti, Schenk, Grujic, Model, Kubinec, & Barceló (2024). A general primer for data harmonization. Scientific Data, 11(1), 152. https://doi.org/10.1038/s41597-024-02956-3

Cohen, Aiken, L. S., & West, S. G. (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34(3), 315–346. https://doi.org/10.1207/S15327906MBR3403_2

Cole, V. T., Hussong, A. M., Gottfredson, N. C., Bauer, D. J., & Curran, P. J. (2023). Informing harmonization decisions in integrative data analysis: Exploring the measurement multiverse. Prevention Science, 24(8), 1595–1607. https://doi.org/10.1007/s11121-022-01466-1

Cooper & Patall, E. A. (2009). The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychological Methods, 14, 165–176. https://doi.org/10.1037/a0015565

Costa, P. T., & McCrae, R. R. (1992). NEO PI-R professional manual. Psychological Assessment Resources.

Curran, P. J. (2009). The seemingly quixotic pursuit of a cumulative psychological science: Introduction to the special issue. Psychological Methods, 14, 77–80. https://doi.org/10.1037/a0015972

Curran, P. J., & Hussong, A. M. (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14, 81–100. https://doi.org/10.1037/a0015914

Debray, T. P. A., Moons, K. G. M., Abo-Zaid, G. M. A., Koffijberg, & Riley, R. D. (2013). Individual participant data meta-analysis for a binary outcome: One-stage or two-stage? PLOS ONE, 8(4), e60650. https://doi.org/10.1371/journal.pone.0060650

Debray, T. P. A., Moons, K. G. M., van Valkenhoef, Efthimiou, Hummel, Groenwold, R. H. H., Reitsma, J. B., & on behalf of the GetReal methods review group. (2015). Get real in individual participant data (IPD) meta-analysis: A review of the methodology. Research Synthesis Methods, 6(4), 293–309. https://doi.org/10.1002/jrsm.1160

Dubrow, J. K., & Tomescu-Dubrow (2016). The rise of cross-national survey data harmonization in the social sciences: Emergence of an interdisciplinary methodological field. Quality & Quantity, 50(4), 1449–1467. https://doi.org/10.1007/s11135-015-0215-z

Duursma (2022). bootpredictlme4: Predict method for lme4 with bootstrap. R package version 0.1. https://github.com/RemkoDuursma/bootpredictlme4

Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., Baranski, Bernstein, M. J., Bonfiglio, D. B. V., Boucher, Brown, E. R., Budiman, N. I., Cairo, A. H., Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman, J. A., Conway, J. G., . . . Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012

Eysenck, H. J., & Eysenck, S. G. B. (1965). The Eysenck Personality Inventory. British Journal of Educational Studies, 14(1), 140.

Fitzmaurice, G. M., & Laird, N. M. (1995). Regression models for a bivariate discrete and continuous outcome with clustering. Journal of the American Statistical Association, 90(431), 845–852. https://doi.org/10.1080/01621459.1995.10476583

Fortier, Raina, Van den Heuvel, E. R., Griffith, L. E., Craig, Saliba, Doiron, Stolk, R. P., Knoppers, B. M., Ferretti, Granda, & Burton (2017). Maelstrom Research guidelines for rigorous retrospective data harmonization. International Journal of Epidemiology, 46(1), 103–105. https://doi.org/10.1093/ije/dyw075

Gaure (2013). OLS with multiple high dimensional category variables. Computational Statistics & Data Analysis, 66, 8–18. https://doi.org/10.1016/j.csda.2013.03.024

Goebel, Grabka, M. M., Liebig, Kroh, Richter, Schröder, & Schupp (2019). The German socio-economic panel (SOEP). Jahrbücher für Nationalökonomie und Statistik, 239(2), 345–360.

Goh, J. X., Hall, J. A., & Rosenthal (2016). Mini meta-analysis of your own studies: Some arguments on why and a primer on how. Social and Personality Psychology Compass, 10(10), 535–549. https://doi.org/10.1111/spc3.12267

Graham, E. K., James, B. D., Jackson, K. L., Willroth, E. C., Boyle, Wilson, Bennett, D. A., & Mroczek, D. K. (2021). Associations between personality traits and cognitive resilience in older adults. The Journals of Gerontology: Series B, 76(1), 6–19. https://doi.org/10.1093/geronb/gbaa135

Graham, E. K., James, B. D., Jackson, K. L., Willroth, E. C., Luo, Beam, C. R., Pedersen, N. L., Reynolds, C. A., Katz, Lipton, R. B., Boyle, Wilson, Bennett, D. A., & Mroczek, D. K. (2021). A coordinated analysis of the associations among personality traits, cognitive decline, and dementia in older adulthood. Journal of Research in Personality, 92, 104100. https://doi.org/10.1016/j.jrp.2021.104100

Graham, E. K., Rutsohn, J. P., Turiano, N. A., Bendayan, Batterham, P. J., Gerstorf, Katz, M. J., Reynolds, C. A., Sharp, E. S., Yoneda, T. B., Bastarache, E. D., Elleman, L. G., Zelinski, E. M., Johansson, Kuh, Barnes, L. L., Bennett, D. A., Deeg, D. J. H., Lipton, R. B., . . . Mroczek, D. K. (2017). Personality predicts mortality risk: An integrative data analysis of 15 international longitudinal studies. Journal of Research in Personality, 70, 174–186. https://doi.org/10.1016/j.jrp.2017.07.005

Graham, E. K., Weston, S. J., Gerstorf, Yoneda, T. B., Booth, Beam, C. R., Petkus, A. J., Drewelies, Hall, A. N., Bastarache, E. D., Estabrook, Katz, M. J., Turiano, N. A., Lindenberger, Smith, Wagner, G. G., Pedersen, N. L., Allemand, Spiro, . . . Mroczek, D. K. (2020). Trajectories of Big Five Personality Traits: A coordinated analysis of 16 longitudinal samples. European Journal of Personality, 34(3), 301–321. https://doi.org/10.1002/per.2259

Graham, E. K., Willroth, E. C., Weston, S. J., Muniz-Terrera, Clouston, S. A. P., Hofer, S. M., Mroczek, D. K., & Piccinin, A. M. (2022). Coordinated data analysis: Knowledge accumulation in lifespan developmental psychology. Psychology and Aging, 37, 125–135. https://doi.org/10.1037/pag0000612

Griffith, van den Heuvel, Fortier, Hofer, Raina, Sohel, Payette, Wolfson, & Belleville (2013). Harmonization of cognitive measures in individual participant data and aggregate data meta-analysis. AHRQ Methods for Effective Health Care. http://europepmc.org/abstract/MED/23617017

Groot, L. J., Kan, K.-J., & Jak (2024). Checking the inventory: Illustrating different methods for individual participant data meta-analytic structural equation modeling. Research Synthesis Methods, 15(6), 872–895. https://doi.org/10.1002/jrsm.1735

Groot, L. J., Kan, K.-J., & Jak (2025). Does cluster-robust estimation provide within-study effects? A comparison of individual participant data methods in MASEM. Structural Equation Modeling: A Multidisciplinary Journal, 32, 801–813. https://doi.org/10.1080/10705511.2025.2505995

Hahn, Gottschling, & Spinath, F. M. (2012). Short measurements of personality – Validity and reliability of the GSOEP Big Five Inventory (BFI-S). Journal of Research in Personality, 46(3), 355–359. https://doi.org/10.1016/j.jrp.2012.03.008

Hakulinen, Hintsanen, Munafò, M. R., Virtanen, Kivimäki, Batty, G. D., & Jokela (2015). Personality and smoking: Individual-participant meta-analysis of nine cohort studies. Addiction, 110(11), 1844–1852.

Hill, P. L., & Stine-Morrow, E. A. L. (2022). Introduction to the special issue on transparency, replicability, and discovery in the psychological science of adult development and aging. Psychology and Aging, 37, 6–9. https://doi.org/10.1037/pag0000673

Hofer, S. M., & Piccinin, A. M. (2009). Integrative data analysis through coordination of measurement and analysis protocol across independent longitudinal studies. Psychological Methods, 14, 150–164. https://doi.org/10.1037/a0015566

Hoogendijk, E. O., Deeg, D. J. H., Poppelaars, van der Horst, Broese van Groenou, M. I., Comijs, H. C., Pasman, H. R. W., van Schoor, N. M., Suanet, Thomése, van Tilburg, T. G., Visser, & Huisman (2016). The Longitudinal Aging Study Amsterdam: Cohort update 2016 and major findings. European Journal of Epidemiology, 31(9), 927–945. https://doi.org/10.1007/s10654-016-0192-0

Hothorn, Bretz, & Westfall (2008). Simultaneous inference in general parametric models. Biometrical Journal, 50(3), 346–363.

Hua, Burke, D. L., Crowther, M. J., Ensor, Tudur Smith, & Riley, R. D. (2017). One-stage individual participant data meta-analysis models: Estimation of treatment-covariate interactions must avoid ecological bias by separating out within-trial and across-trial information. Statistics in Medicine, 36(5), 772–789. https://doi.org/10.1002/sim.7171

Huisman, Poppelaars, van der Horst, Beekman, A. T., Brug, van Tilburg, T. G., & Deeg, D. J. (2011). Cohort profile: The longitudinal aging study Amsterdam. International Journal of Epidemiology, 40(4), 868–876. https://doi.org/10.1093/ije/dyq219

Hultsch, D. F., Hertzog, Small, B. J., & Dixon, R. A. (1999). Use it or lose it: Engaged lifestyle as a buffer of cognitive decline in aging? Psychology and Aging, 14, 245–263. https://doi.org/10.1037/0882-7974.14.2.245

Jak & Cheung, M. W.-L. (2020). Meta-analytic structural equation modeling with moderating effects on SEM parameters. Psychological Methods, 25(4), 430–455. https://doi.org/10.1037/met0000245

Johnson, J. A. (2014). Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory: Development of the IPIP-NEO-120. Journal of Research in Personality, 51, 78–89. https://doi.org/10.1016/j.jrp.2014.05.003

Jokela, Airaksinen, Virtanen, Batty, G. D., Kivimäki, & Hakulinen (2020). Personality, disability-free life years, and life expectancy: Individual participant meta-analysis of 131,195 individuals from 10 cohort studies. Journal of Personality, 88(3), 596–605.

Jokela, Batty, G. D., Nyberg, S. T., Virtanen, Nabi, Singh-Manoux, & Kivimäki (2013). Personality and all-cause mortality: Individual-participant meta-analysis of 3,947 deaths in 76,150 adults. American Journal of Epidemiology, 178(5), 667–675. https://doi.org/10.1093/aje/kwt170

Jorm, A. F., Mackinnon, A. J., Christensen, Henderson, Scott, & Korten (1993). Cognitive functioning and neuroticism in an elderly community sample. Personality and Individual Differences, 15(6), 721–723. https://doi.org/10.1016/0191-8869(93)90013-S

Juster, F. T., & Suzman (1995). An overview of the health and retirement study. The Journal of Human Resources, 30, S7–S56. https://doi.org/10.2307/146277

Katz, M. J., Derby, C. A., Wang, Sliwinski, M. J., Ezzati, Zimmerman, M. E., Zwerling, J. L., & Lipton, R. B. (2016). Influence of perceived stress on incident amnestic mild cognitive impairment: Results from the Einstein Aging Study. Alzheimer Disease & Associated Disorders, 30(2), 93–98.

Kay (2022). tidybayes: Tidy data and geoms for Bayesian Models. https://doi.org/10.5281/zenodo.1308151

Klein, R. A., Cook, C. L., Ebersole, C. R., Vitiello, Nosek, B. A., Hilgard, Ahn, P. H., Brady, A. J., Chartier, C. R., Christopherson, C. D., Clay, Collisson, Crawford, J. T., Cromar, Gardiner, Gosnell, C. L., Grahe, Hall, Howard, . . . Ratliff, K. A. (2022). Many Labs 4: Failure to replicate mortality salience effect with and without original author involvement. Collabra: Psychology, 8(1), 35271. https://doi.org/10.1525/collabra.35271

Klein, R. A., Vianello, Hasselman, Adams, B. G., Adams, R. B., Alper, Aveyard, Axt, J. R., Babalola, M. T., Bahník, Š., Batra, Berkics, Bernstein, M. J., Berry, D. R., Bialobrzeska, Binan, E. D., Bocian, Brandt, M. J., Busching, . . . Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225

Lachman, M. E., & Weaver, S. L. (1997). The Midlife Development Inventory (MIDI) personality scales: Scale construction and scoring. Waltham, MA: Brandeis University, 7, 1–9.

Langa, K. M., Ryan, L. H., McCammon, R. J., Jones, R. N., Manly, J. J., Levine, D. A., Sonnega, Farron, & Weir, D. R. (2020). The Health and Retirement Study Harmonized Cognitive Assessment Protocol Project: Study design and methods. Neuroepidemiology, 54(1), 64–74. https://doi.org/10.1159/000503004

Legha, Riley, R. D., Ensor, Snell, K. I. E., Morris, T. P., & Burke, D. L. (2018). Individual participant data meta-analysis of continuous outcomes: A comparison of approaches for specifying and estimating one-stage models. Statistics in Medicine, 37(29), 4404–4420. https://doi.org/10.1002/sim.7930

Leyrat, Morgan, K. E., Leurent, & Kahan, B. C. (2018). Cluster randomized trials with a small number of clusters: Which analyses should be used? International Journal of Epidemiology, 47(1), 321–331. https://doi.org/10.1093/ije/dyx169

McArdle, J. J., Grimm, K. J., Hamagami, Bowles, R. P., & Meredith (2009). Modeling life-span growth curves of cognition using longitudinal data with multiple samples and changing scales of measurement. Psychological Methods, 14, 126–149. https://doi.org/10.1037/a0015857

McClearn, G. E., Johansson, Berg, Pedersen, N. L., Ahern, Petrill, S. A., & Plomin (1997). Substantial genetic influence on cognitive abilities in twins 80 or more years old. Science, 276(5318), 1560–1563. https://doi.org/10.1126/science.276.5318.1560

Mroczek, D. K., Pitzer, Miller, Turiano, & Fingerman (2011). The use of secondary data in adult development and aging research. In K. H. Trzesniewski, M. B. Donnellan, & R. E. Lucas (Eds.), Secondary data analysis: An introduction for psychologists (pp. 121–132). American Psychological Association. https://doi.org/10.1037/12350-007

Mroczek, D. K., Weston, S. J., Graham, E. K., & Willroth, E. C. (2022). Data overuse in aging research: Emerging issues and potential solutions. Psychology and Aging, 37, 141–147. https://doi.org/10.1037/pag0000605

Neupert, S. D., Graham, E. K., Ogle, Ali, Zavala, D. V., Kincaid, Hughes, M. L., R. X., Antonucci, Suitor, J. J., Gilligan, Ajrouch, K. J., & Scott, S. B. (2024). A coordinated data analysis of four studies exploring age differences in social interactions and loneliness during a global pandemic. The Journals of Gerontology: Series B, 79(8), gbae086. https://doi.org/10.1093/geronb/gbae086

Nosek, B. A., Hardwicke, T. E., Moshontz, Allard, Corker, K. S., Dreber, Fidler, Hilgard, Kline Struhl, Nuijten, M. B., Rohrer, J. M., Romero, Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73(1), 719–748. https://doi.org/10.1146/annurev-psych-020821-114157

Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspectives on Psychological Science, 7(6), 657–660. https://doi.org/10.1177/1745691612462588

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Paige, Barrett, Pennells, Sweeting, Willeit, Di Angelantonio, Gudnason, Nordestgaard, B. G., Psaty, B. M., Goldbourt, Best, L. G., Assmann, Salonen, J. T., Nietert, P. J., Verschuren, W. M. M., Brunner, E. J., Kronmal, R. A., Salomaa, Bakker, S. J. L., . . . Wood (2017). Use of repeated blood pressure and cholesterol measurements to improve cardiovascular disease risk prediction: An individual-participant-data meta-analysis. American Journal of Epidemiology, 186(8), 899–907. https://doi.org/10.1093/aje/kwx149

Pedersen, N. L., McClearn, G. E., Plomin, Nesselroade, J. R., Berg, & DeFaire (1991). The Swedish adoption twin study of aging: An update. Acta Geneticae Medicae et Gemellologiae: Twin Research, 40(1), 7–20. https://doi.org/10.1017/S0001566000006681

Poropat, A. E. (2009). A meta-analysis of the five-factor model of personality and academic performance. Psychological Bulletin, 135, 322–338. https://doi.org/10.1037/a0014996

Pustejovsky, J. E. (2024). clubSandwich: Cluster-Robust (Sandwich) variance estimators with small-sample corrections. https://CRAN.R-project.org/package=clubSandwich

Pustejovsky, J. E., & Tipton (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 36(4), 672–683.

Rammstedt, B., Lechner, C. M., & Danner, D. (2018). Relationships between personality and cognitive ability: A facet-level analysis. Journal of Intelligence, 6(2), 28. https://doi.org/10.3390/jintelligence6020028

Raudenbush, S. W. (2002). Hierarchical linear models: Applications and data analysis methods (Advanced quantitative techniques in the social sciences series). SAGE.

R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Revelle (2022). psych: Procedures for psychological, psychometric, and personality research. Northwestern University. https://CRAN.R-project.org/package=psych

Riley, R. D., Legha, Jackson, Morris, T. P., Ensor, Snell, K. I. E., White, I. R., & Burke, D. L. (2020). One-stage individual participant data meta-analysis models for continuous and binary outcomes: Comparison of treatment coding options and estimation methods. Statistics in Medicine, 39(19), 2536–2555. https://doi.org/10.1002/sim.8555

Saucier (1994). Mini-Markers: A brief version of Goldberg’s unipolar big-five markers. Journal of Personality Assessment, 63(3), 506–516. https://doi.org/10.1207/s15327752jpa6303_8

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. https://doi.org/10.1037/0033-2909.124.2.262

Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143. https://doi.org/10.1037/pspp0000096

Soubelet & Salthouse, T. A. (2010). The role of activity engagement in the relations between Openness/Intellect and cognition. Personality and Individual Differences, 49(8), 896–901. https://doi.org/10.1016/j.paid.2010.07.026

Soubelet & Salthouse, T. A. (2011). Personality–cognition relations across adulthood. Developmental Psychology, 47, 303–310. https://doi.org/10.1037/a0021816

Stan Development Team. (2022). RStan: The R interface to Stan. https://mc-stan.org/

Stewart, G. B., Altman, D. G., Askie, L. M., Duley, Simmonds, M. C., & Stewart, L. A. (2012). Statistical analysis of individual participant data meta-analyses: A comparison of methods and recommendations for practice. PLOS ONE, 7(10), e46042. https://doi.org/10.1371/journal.pone.0046042

Stewart, L. A., Clarke, Rovers, Riley, R. D., Simmonds, Stewart, Tierney, J. F., & for the PRISMA-IPD Development Group. (2015). Preferred reporting items for a systematic review and meta-analysis of individual participant data: The PRISMA-IPD Statement. JAMA, 313(16), 1657–1665. https://doi.org/10.1001/jama.2015.3656

Sutin, A. R., Stephan, Damian, R. I., Luchetti, Strickhouser, J. E., & Terracciano (2019). Five-factor model personality traits and verbal fluency in 10 cohorts. Psychology and Aging, 34(3), 362–373. https://doi.org/10.1037/pag0000351

Vaughan & Dancho (2022). furrr: Apply Mapping Functions in Parallel using Futures. https://CRAN.R-project.org/package=furrr

Viechtbauer (2007). Accounting for heterogeneity via random-effects models and moderator analyses in meta-analysis. Zeitschrift Für Psychologie/Journal of Psychology, 215, 104–121. https://doi.org/10.1027/0044-3409.215.2.104

Viechtbauer (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.

Weston, S. J., Ritchie, S. J., Rohrer, J. M., & Przybylski, A. K. (2019). Recommendations for increasing the transparency of analysis of preexisting data sets. Advances in Methods and Practices in Psychological Science, 2(3), 214–227. https://doi.org/10.1177/2515245919848684

Wettstein, Tauber, Kuźma, & Wahl, H.-W. (2017). The interplay between personality and cognitive ability across 12 years in middle and late adulthood: Evidence for reciprocal associations. Psychology and Aging, 32, 259–277. https://doi.org/10.1037/pag0000166

Wickham (2011). The split-apply-combine strategy for data analysis. Journal of Statistical Software, 40(1), 1–29.

Wickham, Averick, Bryan, Chang, McGowan, L. D., François, Grolemund, Hayes, Henry, Hester, Kuhn, Pedersen, T. L., Miller, Bache, S. M., Müller, Ooms, Robinson, Seidel, D. P., Spinu, . . . Yutani (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham & Bryan (2022). readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl

Wickham, Miller, & Smith (2022). haven: Import and Export “SPSS”, “Stata” and “SAS” Files. https://CRAN.R-project.org/package=haven

Wiggins, B. J., & Christopherson, C. D. (2019). The replication crisis in psychology: An overview for theoretical and philosophical psychology. Journal of Theoretical and Philosophical Psychology, 39, 202–217. https://doi.org/10.1037/teo0000137

Wilke, C. O. (2020). cowplot: Streamlined Plot Theme and Plot Annotations for “ggplot2.” https://CRAN.R-project.org/package=cowplot

Wilkins, Laß, Butterworth, & Vera-Toscano (2018). The household, income and labour dynamics in Australia Survey: Selected findings from waves 1 to 16. https://apo.org.au/node/184431

Wilkins, Laß, Butterworth, & Vera-Toscano (2019). The household, income and labour dynamics in Australia Survey: Selected findings from waves, 1.

Willroth, E. C., & Atherton, O. E. (2024). Best laid plans: A guide to reporting preregistration deviations. Advances in Methods and Practices in Psychological Science, 7(1), 25152459231213802. https://doi.org/10.1177/25152459231213802

114. Willroth, E. C., Beck, Yoneda, T. B., Beam, C. R., Deary, I. J., Drewelies, Gerstorf, Huisman, Katz, M. J., Lipton, R. B., Muniz-Terrera, Pedersen, N. L., Reynolds, C. A., Spiro III, Turiano, N. A., Willis, Mroczek, D. K., & Graham, E. K. (2025). Associations of personality trait level and change with mortality risk in 11 longitudinal studies. Journal of Personality and Social Psychology, 128(2), 392–409. https://doi.org/10.1037/pspp0000531

115. Willroth, E. C., Graham, E. K., & Mroczek, D. K. (2022). Challenges and opportunities in preregistration of coordinated data analysis: A tutorial and template. Psychology and Aging, 37, 136–140. https://doi.org/10.1037/pag0000611

116. Xie (2014). knitr: A comprehensive tool for reproducible research in R. In Stodden, Leisch, & Peng, R. D. (Eds.), Implementing reproducible computational research. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595

117. Yoneda, Marroig, Graham, E. K., Willroth, E. C., Watermeyer, Beck, E. D., Zelinski, E. M., Reynolds, C. A., Pedersen, N. L., Hofer, S. M., Mroczek, D. K., & Muniz-Terrera (2022). Personality predictors of cognitive dispersion: A coordinated analysis of data from seven international studies of older adults. Neuropsychology, 36(2), 103–115. https://doi.org/10.1037/neu0000782

118. Zhu (2022). kableExtra: Construct complex table with “kable” and pipe syntax. R package version 1.4.0.19. https://haozhu233.r-universe.dev/kableExtra

119. Zimprich, Allemand, & Dellenbach (2009). Openness to experience, fluid intelligence, and crystallized intelligence in middle-aged and old adults. Journal of Research in Personality, 43(3), 444–454. https://doi.org/10.1016/j.jrp.2009.01.018

Supplementary Material

