Abstract
Concerns for the replicability, reliability, and generalizability of MRI and functional MRI (fMRI) research have led to debates over the contributions of sample size, open-science practices, and recruitment methods, particularly in the psychological sciences. Key to understanding the state of a science is an assessment of reporting practices. In this structured review, we evaluated select reporting practices across three domains: (a) demographic (e.g., age), (b) methodological (e.g., inclusion/exclusion criteria), and (c) open science and generalizability (e.g., preregistration, target-population identification). Included were 919 published MRI and fMRI studies from 2019 in nine top-ranked journals. Reporting across domains was infrequent; participant racial-ethnic identity (14.8%), reasons for missing imaging data (31.2%), and identification of a target population (19.4%) were particularly low. Reporting likelihood varied by study characteristics (e.g., journal) and was correlated across domains. Finally, study sample size but not reporting frequency was positively associated with 2-year citation counts. Results call for recentering transparency in reporting practices in MRI and fMRI studies, with direct implications for study generalizability.
Over the last several decades, structural and functional neuroimaging have become widespread tools to study thoughts, emotions, behavior, and health in psychological science. The human brain is a topic of public fascination; neuroscience research has affected major legal decisions (e.g., Roper v. Simmons, 2005) and is the subject of global scientific initiatives (e.g., the Brain Research Through Advancing Innovative Neurotechnologies initiative; Yuste & Bargmann, 2017). In combination with psychological inquiry, brain science further informs other fields (e.g., philosophy; Bennett & Hacker, 2022) and affects what people think makes them human (J. Greene, 2014) and even which behaviors people believe are “normal” versus “abnormal” (Gazzaniga, 2005).
However, for any scientific research to be useful to practitioners, policymakers, or the public, study findings must be reliable (i.e., the construct can be reproduced from repeated measurements), replicable (i.e., analysts can reproduce the same results given the same data and methodological plan), and externally valid (i.e., study findings generalize to a broader population). Most discussions on how to improve the utility of human-neuroimaging research have centered on the links between sample size, statistical power, and reliability (Flournoy et al., 2024) and the adoption of open-science practices that promote reproducibility and replicability (Nichols et al., 2017). A strong focus on analytical transparency is critical (e.g., Botvinik-Nezer et al., 2020), and several guidelines for reporting study practices, such as preprocessing workflows, scan parameters, and data sharing, already exist (Nichols et al., 2017; Niso et al., 2022). However, the decisions that researchers make (and report) in the production of study samples—from the strategies to recruit participants to the data filtering that leads to the ultimate analytic sample size—also need to be understood. In this structured review, we evaluate select reporting practices in nearly 1,000 published MRI and functional MRI (fMRI) neuroimaging studies across three domains: sociodemographic composition (e.g., sex, racial-ethnic identity), methods (e.g., recruitment method, quality control), and open science and generalizability.
Sample Sociodemographic Composition and Population Generalizability
Following long-standing calls to diversify sample representation in psychological research (Henrich et al., 2010), scientists have begun to attend to the sociodemographic composition of human-neuroimaging studies as a key indicator of population generalizability (Falk et al., 2013). Empirical evaluations of sample composition in select journals (Dotson & Duarte, 2020) or topical areas (Qu & Telzer, 2017) and perspective pieces on the state of the field (Garcini et al., 2022) indicate that participants in neuroimaging studies are more likely to be socioeconomically advantaged; of majority racial-ethnic and cultural groups; and/or from White, educated, industrialized, rich, democratic populations (Henrich et al., 2010). Yet there is not a comprehensive field-wide assessment of reporting practices in neuroimaging research that goes beyond racial-ethnic identity to include other key sociodemographic factors.
Sample diversity along intersecting sociodemographic dimensions has direct implications for the basic scientific understanding of human beings and the application of precision-medicine techniques to reduce disease burden and promote well-being. When predictive models using neuroimaging data are trained in homogeneous samples, the models perform worse in samples with greater sociodemographic diversity (A. S. Greene et al., 2022). In another example, Gard et al. (2023) leveraged the Adolescent Brain Cognitive Development Study to report that accounting for sampling weights, which up- or down-weighted participants so that the sample demographics matched the target population, led to substantially different patterns of associations between socioeconomic resources and several metrics of brain development.
Sampling and Recruitment
Once participants are recruited to participate in a neuroimaging study, eligibility requirements and quality-control procedures further reduce the number of participants with “usable” data to a final analytic sample, a process described as the “flow of participants” (American Psychological Association, 2020; Strengthening the Reporting of Observational Studies in Epidemiology [STROBE] reporting guidelines, von Elm et al., 2008). Each decision with respect to eligibility/ineligibility, quality control (e.g., removing participants with low behavioral performance), and missing data (e.g., listwise deletion) influences the sociodemographic makeup of the final analytic sample. To evaluate the degree of population generalizability in human-neuroimaging studies and to develop protocols to improve recruitment and retention, the field needs a clear understanding of from where and how participants are recruited and the decisions that contribute to the composition of the final analytic sample.
Open Science and Generalizability
Even in the case of successfully recruiting and retaining a generalizable sample, some research practices can undermine reliability, replicability, and generalizability. Open-science tools work to guard against questionable scientific practices, such as p-hacking and hypothesizing after the results are known (Kerr, 1998). “Multiverse analyses” (Botvinik-Nezer et al., 2020; Simonsohn et al., 2019); preregistration (Nosek et al., 2018); posting unthresholded statistical maps, parcellations, and atlases in open repositories, such as NeuroVault (Gorgolewski et al., 2015); and leveraging standardized Brain Imaging Data Structure (BIDS) file formats (Poldrack et al., 2024) have all been cited as paths to improving reliability and replicability (see also Nichols et al., 2017; Niso et al., 2022). However, authors’ assessment of the generalizability of their scientific findings is also a measure of transparency (Simons et al., 2017).
Empirical Evaluation of Reporting Practices
In this study, we attempt to address these knowledge gaps through a systematic evaluation of select reporting practices of all MRI and fMRI studies published in 2019 in top-ranked psychology-related journals. The first aim was to document reporting rates regarding sample demographic features (e.g., sex, racial-ethnic identity), methods (e.g., recruitment, quality control), and open-science and generalizability practices (e.g., preregistration). Although decisions related to analytic flexibility are also central in discussions of reliability, reproducibility, and generalizability, we opted to focus on researcher decisions and reporting practices most central to generalizability. Given the foundational role of study design in research-methods training, we hypothesized that studies would report methodological features at the highest rates but made no specific hypotheses about the exact rates of reporting. The second aim was to evaluate how study characteristics were associated with reporting in all three domains, but we made no explicit hypotheses. Third, the associations between study characteristics and sample size were evaluated, providing a much-needed update of previous metascience investigations (e.g., Button et al., 2013). We hypothesized that studies with larger sample sizes would be more likely to be composed of adult-age participants (i.e., because of lower compliance and data quality in child samples) and to leverage structural MRI (sMRI; i.e., because of the large influence of motion artifacts in fMRI). The last aim was to evaluate the degree to which reporting likelihood and study sample size were “valued” by the field, using citation counts as an indirect measure of value. Although we hypothesized that both reporting likelihood and study sample size would be positively associated with citation count, no comparative hypotheses were generated.
Method
Identification and selection of studies
This systematic review was conducted using Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009), and the protocol (but not the hypotheses) was preregistered (https://osf.io/6tpsh/) before target articles were identified, coded, or analyzed. Training procedures and deviations from the registered protocol are described in Table S4 in the Supplemental Material available online.
We first selected psychology-related journals to identify target articles published in 2019. The Web of Science Journal Citation Reports tool (Clarivate, 2024) was used to sort journals by impact factor in the categories of “psychiatry,” “neuroscience,” and “neuroimaging.” To balance the scope of the review with team resources, we chose the two highest-ranked journals in each category with the following exclusion criteria: (a) journals that primarily publish review articles, (b) journals that are not indexed on PubMed or PsycInfo, and (c) journals that do not include the words “neuroimaging” or “neuroscience” in the public statement of aims and scope. We also included three specialty journals for their coverage of human-neuroimaging research in cognitive, affective, and developmental domains (i.e., Developmental Cognitive Neuroscience, Journal of Cognitive Neuroscience, and Social Cognitive and Affective Neuroscience). In total, nine journals were included in the structured review (Table S5 in the Supplemental Material). In July 2020, articles published during 2019 in all selected journals were imported from PubMed or PsycInfo into the systematic-review software DistillerSR (2023).
Titles and abstracts were screened by two trained research assistants (RAs; A. A. Albrecht and A. Lurie), and conflicts were adjudicated by A. M. Gard (Polanin et al., 2019). Exclusion criteria included the following: (a) The article is a review article, commentary, systematic review, meta-analysis, case study, or methods article/technical report; (b) the study subjects are nonhuman (e.g., animals, cell lines); and (c) the study does not include MRI or fMRI data. Articles that included both MRI/fMRI and another imaging modality (e.g., functional near-infrared spectroscopy [fNIRS], electroencephalography [EEG], magnetoencephalography [MEG], positron emission tomography [PET], single photon emission computed tomography [SPECT], transcranial magnetic stimulation [TMS]) were excluded.
Data extraction
A coding system was developed by A. M. Gard, C. Mitchell, and L. W. Hyde. Codes were designed to capture the process of conducting a neuroimaging study, from recruitment to generalizing study findings to a broader population. Categories of codes included (a) global study features (e.g., participant age, study type [observational/intervention], imaging modality, smallest and largest analytic sample size), (b) sociodemographic information (e.g., gender or sex, race-ethnicity, socioeconomic resources), (c) methods (e.g., report of recruitment procedures, inclusion/exclusion criteria, quality-control procedures, missing values analyses), and (d) generalizability and open science (e.g., power analyses, preregistration, identification of target population). The codebook was generated using reporting guidelines from the seventh edition of the American Psychological Association (APA) style guide (APA, 2020) and STROBE (von Elm et al., 2008). Table 1 provides a summary of the codebook; for the complete codebook, see Table S1 in the Supplemental Material. Throughout the training phase and during the first several months of coding, the coding manual was updated iteratively with definitions of terms (e.g., intervention study criteria) and data-entry instructions (e.g., round numbers to the nearest percentage).
Full-Text Data-Extraction Codebook and Interrater Reliability
Note: One hundred thirty-one articles were double-coded (14.4% of the 919 total articles analyzed) in six research assistant–A. M. Gard pairs. However, interrater reliability could be estimated only for the three research assistants who coded the majority (97%) of the articles. The reported estimates reflect Conger’s kappas (i.e., for categorical codes) and ICCs (1, 1) (i.e., for continuous codes) weighted by the number of articles double-coded by each research assistant–A. M. Gard pair (weights = 0.56, 0.32, 0.12). ICCs were calculated as one-way random-effects models with consistency and a single measurement. ICC = intraclass correlation; κ = Conger’s kappa; fMRI = functional MRI; sMRI = structural MRI; NaN = not a number.
To ensure that all data were collected from each article, RAs coded whether an article cited previously published work to describe the methods. Subsequently, two coders referenced the cited articles to determine if the information was available. If so, the information in the cited article was used to code that article’s reporting practices (see Supplemental Methods in the Supplemental Material).
Articles that passed the abstract- and title-screening phase were randomized to one RA for full-text data extraction. To ensure reliability in the data-extraction phase, A. M. Gard extracted data for a random 15% of articles reviewed by each RA (Polanin et al., 2019). Interrater reliability between each RA and A. M. Gard was calculated for every extracted data point using Conger’s kappa (unweighted) for categorical data and intraclass correlations (one-way random effects, consistency, single measurement) for continuous data (Table 1).
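To make these reliability statistics concrete, the following sketch (our construction on toy data, not the authors’ published code, which is available on OSF) computes both metrics. With exactly two raters per pair, Conger’s kappa reduces to Cohen’s kappa, so sklearn’s implementation suffices; the ICC(1, 1) is computed with the pingouin package.

```python
# Toy illustration of the interrater-reliability statistics described above.
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

# Categorical code (e.g., "Was race-ethnicity reported?") for six articles,
# double-coded by a research assistant (RA) and A. M. Gard (AMG).
ra_codes = ["yes", "no", "yes", "yes", "no", "yes"]
amg_codes = ["yes", "no", "yes", "no", "no", "yes"]
# With two raters, Conger's kappa is equivalent to Cohen's kappa.
print("kappa =", round(cohen_kappa_score(ra_codes, amg_codes), 3))

# Continuous code (e.g., analytic sample size) in long format:
# one row per article x rater combination.
long = pd.DataFrame({
    "article": list(range(6)) * 2,
    "rater": ["RA"] * 6 + ["AMG"] * 6,
    "rating": [55, 120, 33, 40, 87, 210,
               55, 118, 33, 44, 87, 205],
})
# ICC(1, 1): one-way random effects, single measurement (pingouin's "ICC1").
icc = pg.intraclass_corr(data=long, targets="article",
                         raters="rater", ratings="rating")
print(icc.loc[icc["Type"] == "ICC1", ["Type", "ICC", "CI95%"]])
```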
Analytic plan
The first aim was to document current reporting practices in three domains: (a) demographic characteristics; (b) methods, from recruitment to analysis; and (c) generalizability and open science. In addition to item-level frequencies, we constructed a cumulative reporting index for each domain. The demographics reporting index consisted of four items: race or ethnicity, a measure of socioeconomic resources, sample age, and sex or gender. The methods reporting index included seven items: recruitment strategy, initial recruitment efforts, date of data collection, inclusion/exclusion criteria, reasons for no imaging data, quality-control exclusions, and missing-values analyses. The generalizability and open-science reporting index included four items: identification of a target population to which the study would generalize, description of the limitations of study generalizability, preregistration, and prestudy power analyses. In the construction of the reporting indices, dichotomous items (e.g., Was sample sex or gender composition reported?) were coded as 1 = yes, reported or 0 = no, not reported. Categorical items with more than two options were coded in “strict” or “loose” form to account for variability in field definitions of what constitutes complete reporting. For example, for sample age, the strict reporting index assigned a value of 1 to studies that reported both age range and average age, studies that reported only average age or age range were assigned 0.5, and studies that reported nothing about sample age were assigned 0; the loose reporting index assigned studies with any information about sample age (i.e., average or range) a value of 1 (see Table S1 in the Supplemental Material). In the main text, we present the results of features coded in strict form; results for loose definitions of reporting are in Supplemental Results in the Supplemental Material.
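As an illustration of the strict versus loose scoring rules, consider the age item. The minimal sketch below (function names are ours, purely for illustration) encodes the rules just described.

```python
# Illustrative scoring of the age item under the "strict" and "loose" rules.

def score_age_strict(reported_mean: bool, reported_range: bool) -> float:
    """1 if both mean age and age range reported, 0.5 if only one, 0 if neither."""
    if reported_mean and reported_range:
        return 1.0
    if reported_mean or reported_range:
        return 0.5
    return 0.0

def score_age_loose(reported_mean: bool, reported_range: bool) -> float:
    """1 if any age information is reported, else 0."""
    return 1.0 if (reported_mean or reported_range) else 0.0

# A study reporting only mean age contributes 0.5 (strict) or 1.0 (loose)
# to its demographics reporting index, alongside the race-ethnicity,
# socioeconomic-resources, and sex-or-gender items.
assert score_age_strict(True, False) == 0.5
assert score_age_loose(True, False) == 1.0
```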
Second, study characteristics associated with reporting sociodemographic characteristics, methods, and open-science and generalizability practices were identified using multivariate linear regression. Study characteristics used to predict each of the three reporting indices included analytic sample size, sample age group, imaging modality, journal, consortia-study status, study type, and whether the study was composed of a patient (vs. community) sample. Categorical variables were entered as dummy variables such that the reference group was always the largest (e.g., for modality, sMRI studies). We also examined whether study characteristics were associated with analytic sample size (i.e., sample size was the dependent variable) using independent-samples t tests and one-way analysis of variance tests for categorical variables. Sample-size outliers (±3 SD) were first winsorized, and then sample size was log-transformed.
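Under the assumption that winsorizing means clipping values to the ±3-SD boundaries, and that the group comparisons were Welch’s t tests (consistent with the non-integer degrees of freedom reported in the Results), the sample-size preprocessing can be sketched as follows (data illustrative; the authors’ analytic code is on OSF):

```python
# Sketch of the sample-size preprocessing and one group comparison.
import numpy as np
from scipy import stats

def winsorize_3sd(x: np.ndarray) -> np.ndarray:
    """Clip values lying more than 3 SD from the mean to the 3-SD boundary."""
    lo, hi = x.mean() - 3 * x.std(), x.mean() + 3 * x.std()
    return np.clip(x, lo, hi)

n = np.array([5, 28, 55, 140, 900, 45_615], dtype=float)
log_n = np.log(winsorize_3sd(n))  # winsorize first, then log-transform

# e.g., consortium vs. nonconsortium studies (Welch's t test)
consortium = np.array([False, False, False, False, True, True])
t, p = stats.ttest_ind(log_n[consortium], log_n[~consortium], equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")
```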
Finally, we examined the extent to which reporting practices in each domain were “valued” in the field. Leveraging Clarivate’s Cited Reference Search tool (Clarivate, 2024), we constructed a measure of “value” defined as the number of times each article was cited. Citation frequency was recorded in March 2022. Multivariate linear-regression models were used to predict citation frequency; reporting indices, study characteristics, and sample size were entered as predictors. As with sample size, citation outliers (±3 SD) were first winsorized, and citation frequency was log-transformed following the addition of a constant = 1 (i.e., to account for articles with zero citations). Because both citation frequency and study sample size were log-transformed, unstandardized estimates can be interpreted in percentage-change terms.
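To make the percentage-change interpretation explicit, the citation model takes a log-log form (notation ours, matching the transformations described above):

```latex
\ln(c_i + 1) = \beta_0 + \beta_1 \ln(n_i) + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + \varepsilon_i
```

where c_i is the citation count, n_i the winsorized sample size, and x_i the reporting indices and study characteristics. A 1-unit increase in ln(n_i) multiplies expected citations by e^{β1}, a 100(e^{β1} − 1)% ≈ 100·β1% change for small β1; equivalently, β1 approximates the percentage change in citations per 1% change in sample size.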
Transparency and openness
We adhered to the PRISMA 2020 guidelines for systematic reviews (Page et al., 2021). All data gathered as part of this empirical exercise and the analytic code used in the current article are publicly available in the OSF repository (https://osf.io/6tpsh/). The protocol for the structured review (but not the hypotheses or analytic plan) was preregistered prospectively, before data were gathered (also available at the OSF link above).
Results
What were the characteristics of studies included for review?
Initially, 3,856 articles were identified (Fig. 1). During the title- and abstract-screening phase, 2,847 articles were excluded because the subjects were nonhuman (n = 694); the study was a review article, commentary, systematic review, meta-analysis, case study, or methods article/technical report (n = 1,342); and/or the neuroimaging modality was not MRI or fMRI (n = 1,772). A total of 1,009 articles were submitted to full-text screening and data extraction, of which another 90 were excluded. The final sample size was 919 articles. Table 1 summarizes the codebook with estimates of interrater reliability.

Modified Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram for study identification.
Most studies were published in NeuroImage (n = 393; 43%) or Human Brain Mapping (n = 260; 28%); the fewest were published in Nature Neuroscience (n = 16; 1.7%) and Neuron (n = 7; 0.8%). Nearly 75% of the included studies were conducted in adult samples, followed by child samples (n = 142; 15%) and studies with participants across the life span (n = 99; 11%). One-quarter (n = 228) of the studies were classified as patient samples, in which participants met criteria for a medical diagnosis. Most studies were observational (n = 878; 96%) and implemented exclusively fMRI (n = 624; 68%) rather than exclusively sMRI (25%) or combined fMRI and sMRI (7.5%). Finally, 13% of the studies leveraged consortium-level data, in that data from multiple sites or studies were combined into a single analysis. See also Table S2 in the Supplemental Material.
How large were the included studies, and which study characteristics were associated with sample size?
Across all studies, the largest analytic sample size reported ranged from N = 5 to N = 45,615 (Mdn = 55, M = 253). Sample size was associated with other study features such that larger studies were more likely to leverage consortium data, t(121.47) = 5.08, p < .001, and to adopt an observational design, t(449.85) = 6.35, p < .001. Sample size was also associated with participant developmental stage, F(2, 916) = 17.60, p < .001, and imaging modality, F(2, 916) = 32.34, p < .001. Post hoc Tukey tests revealed that life-span studies (i.e., participants of all ages) tended to be larger than child-only (<18 years) or adult-only studies (M differences = 290.08 and 339.16, respectively; ps < .001). The fMRI studies tended to be smaller than both sMRI-only studies (M difference = 300.45, p < .001) and multimodal fMRI/sMRI studies (M difference = 287.22, p < .001). Analytic sample size was not significantly associated with whether the study examined a patient population (p = .97). Results were indistinguishable when using the smallest analytic sample size reported (e.g., in cases when the article reported a sensitivity analysis with a smaller sample size; Supplemental Results in the Supplemental Material).
Do studies report sociodemographic information about their samples?
Figure 2 displays the proportion of studies that reported demographic, methodological, and generalizability and open-science features. Of the studies reviewed (N = 919), 14.8% reported the racial or ethnic identity of participants, and 27.9% reported something about the socioeconomic background of participants (e.g., income, education, poverty ratio, Hollingshead score), whereas 96% reported gender or sex, and 98% reported complete (41.7%; both age range and mean age) or partial (56.3%; age range or mean age) information about participant age. Across all four demographic features coded, the average number of features reported per study was 2 (50% of the total number of features). For results with an alternative coding scheme (Table S1 in the Supplemental Material), which produced similar reporting rates, see Figure S1 in the Supplemental Material.

Demographics, methods, and generalizability/open-science reporting in 919 studies published in 2019. N = 919. For definitions of whether a feature was reported, partially reported, or not reported, see the main text and, in detail, Table S1 in the Supplemental Material available online. The proportions shown here reflect “strict” definitions (e.g., for age, complete reporting was operationalized as reporting both age range and mean age, and partial reporting was operationalized as reporting either age range or mean age). For a display of reporting proportions using “loose” definitions, see Figure S1 in the Supplemental Material.
In the 136 (≈15%) studies that reported participant racial-ethnic identity, most participants were racialized as White (M = 56.15%, Mdn = 64.5%), followed by Hispanic/Latinx (M = 15.10%, Mdn = 6.0%), Black (M = 12.89%, Mdn = 4.0%), Asian (M = 16.65%, Mdn = 0.0%), and biracial or multiracial (M = 2.92%, Mdn = 0.0%); very small proportions of participants were racialized as American Indian and Alaskan Native or Hawaiian and Pacific Islander (Fig. 3; ordered by median). We acknowledge that these categories are fluid constructs that have changed throughout the historical record and reflect imperfect and incomplete measures of identity and social position (Cardenas-Iniguez & Gonzalez, 2024). These racial-ethnic categories are also highly U.S.-centric but nevertheless capture the degree to which sociodemographic diversity is represented in the neuroimaging articles reviewed.

Race-ethnicity reporting. Among studies that report sample race-ethnicity, most participants are racialized as White. Includes 136 (14.8%) studies that reported participant race, ethnicity, or both. Box plots depict study-specific proportions of participants across seven U.S.-centric racial-ethnic groupings. Proportions do not sum to 100% because not every study reported a racial-ethnic breakdown for all of the identity categories coded in this structured review.
Of the 256 (27.9%) articles that reported something about the socioeconomic background of participants, most (n = 155) reported education continuously (i.e., average education in years) or categorically (e.g., percentage less than high school degree, percentage college-educated). The average educational attainment of participants in studies that reported education continuously was 14.3 years (minimum = 6; maximum = 21), suggesting that, on average, participants had some education beyond high school or secondary school. Because there is wide variability in educational attainment globally (Goujon et al., 2016), we calculated the difference between a study sample’s average education in years and the average years of schooling in the country of recruitment (Fig. 4) for the 140 studies that reported education continuously and reported a single country of recruitment. Eleven studies reported sample education levels below the country-of-recruitment average, and 29 studies reported sample education levels within 1 year of the country average. The remaining 100 studies included participants whose average years of schooling exceeded the country-specific average (Fig. 4).

Education reporting. Studies overrepresent highly educated participants relative to the recruitment country’s average years of schooling. For the 140 (15.2%) studies that reported both a country of recruitment (n = 317; 34.5%) and mean education in years (n = 155; 16.9%), the average education of study samples recruited in each country is plotted against that country’s average years of schooling. Country-level population estimates of average years of schooling were drawn from Our World in Data (Hannah et al., 2023). Studies that included several countries of recruitment were excluded from this analysis (n = 7). Nineteen countries are represented in this figure.
Do studies report the flow of participants from recruitment to final analytic sample?
As with sociodemographic reporting, there was wide variability in the reporting of methodological features related to the flow of participants from recruitment to final analytic sample (Fig. 2; Fig. S1 in the Supplemental Material). Few studies reported the number of participants initially contacted for recruitment (4.7%), the dates of data collection (6.7%), or a missing-values analysis comparing participants with and without usable data (4.4%). Far more studies reported information about recruitment procedures; 29.1% stated from where participants were recruited and through what method. Forty-six percent of studies reported inclusion/exclusion or eligibility/ineligibility criteria, 31.2% reported reasons for missing imaging data (if applicable), and 56.4% reported the quality-control criteria that resulted in the additional exclusion of participants from the analytic sample(s). Across the seven methodological features coded, the average number of features reported per study was 2 (≈30%).
Do studies report open-science practices and attention to generalizability?
Finally, in the open-science and generalizability domain, the average number of features reported by each study was fewer than 1. Only 19.4% of studies explicitly defined the target population for inference, 23.2% of studies commented on the limitations of the generalizability of their study sample, and 1.7% and 6% of studies preregistered their aims and hypotheses or conducted a prestudy power analysis, respectively (Fig. 2). Results using the alternative loose coding scheme were similar (see Supplemental Results and Fig. S1 in the Supplemental Material).
Which study characteristics are associated with reporting likelihood?
We next sought to explore the characteristics of studies that report demographic, methodological, and generalizability and open-science features. Zero-order correlations indicated that reporting in one domain was associated with reporting in another domain (.19 < r < .28, all ps < .001, adjusted for false-discovery rate [FDR]). Sample size was not associated with the demographic, methods, or overall reporting indices, but larger studies were more likely to report study features related to open science and generalizability (r = .12, FDR-adjusted p < .001). The same patterns of association were observed when we used the looser definitions of reporting and the smallest sample size that was reported (Supplemental Results in the Supplemental Material). Next, four multivariate models were estimated: one for each domain-specific reporting index (demographic; methods; open science and generalizability) and one for the overall reporting index, each entered as the dependent variable (Table 2). All study characteristics and sample size were entered as independent variables. Sample age group, imaging modality, and journal were each uniquely associated with overall reporting frequency. Child and life-span studies were more likely to report study features than adult-only studies (overall reporting index: Fig. 5a), sMRI and multimodal (sMRI/fMRI) studies were more likely to report than fMRI-only studies (overall reporting index: Fig. 5b), and studies published in Developmental Cognitive Neuroscience, the American Journal of Psychiatry, and Molecular Psychiatry were more likely to report study features than studies published in Neuron, Journal of Cognitive Neuroscience, and NeuroImage (overall reporting index: Fig. 5c). Patterns were similar using the domain-specific (sociodemographic, methods, or open science and generalizability) reporting indices (Table 2). Note that nonconsortia studies were more likely than consortia studies to report demographic features, and studies using patient samples were more likely to report methods and open-science and generalizability features than studies that did not recruit patients. Consistent with the zero-order associations, sample size was not associated with the overall, sociodemographic, or methods reporting indices and was no longer associated with reporting in the open-science and generalizability domain after study characteristics were included in the multivariate models. Multivariate results were comparable in terms of both strength and direction using the loose reporting criteria (Table S3 in the Supplemental Material).
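The article does not name the FDR procedure; assuming the common Benjamini-Hochberg method, the adjustment of a family of correlation p values can be sketched as follows (p values illustrative):

```python
# Benjamini-Hochberg FDR adjustment across a family of correlation tests.
from statsmodels.stats.multitest import multipletests

pvals = [0.0002, 0.0004, 0.0007, 0.0001, 0.0300, 0.0005]  # illustrative
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(list(zip([round(p, 4) for p in p_adj], reject)))
```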
Study Characteristics Are Associated With Reporting Frequency
Note: N = 910. “Sample size” refers to the largest sample size reported, which was first winsorized to ±3 SD from the mean and then log-transformed. “Multimodal” refers to studies that included both structural and functional MRI. One-way analyses of variance for categorical variables evaluate the change in model fit when that predictor was removed from the model compared with the full model. The reporting indices here reflect “strict” definitions (see Supplemental Methods in the Supplemental Material available online); Supplemental Results in the Supplemental Material includes models using the reporting indices constructed with “loose” definitions and models with the smallest sample size reported.
*p < .05. **p < .01. ***p < .001.

Sample age, imaging modality, and journal associations with overall reporting frequency. N = 919. (a) Overall index of the number of study features reported (out of a possible 15 features), plotted by the reported sample age group. Adult-only studies were less likely than both child-only and life-span studies to report study features across domains (Table 2). (b) Overall index of the number of study features reported, plotted by imaging modality. Studies employing multimodal or structural MRI only were more likely to report study features across domains than functional-MRI-only studies (Table 2). (c) Overall index of the number of study features reported, plotted by journal (for multivariate results, see Table 2).
What are the associations between reporting frequency, sample size, and citation counts?
Given the ubiquity of reporting standards across research fields (e.g., APA, 2020; STROBE, von Elm et al., 2008) combined with the wide variability in reporting identified here, in the last empirical exercise we evaluated the extent to which reporting features within and across domains were valued in terms of citation count within 2 years of publication. Operationalizing value through citation frequency is imperfect and subject to biases, including weak prediction of research quality (Dougherty & Horne, 2022). At the same time, more frequently cited articles are also likely more visible to both the research community and the public (McKiernan et al., 2019; Sternberg, 2016). In the ≈2 years after publication (January 2020–March 2022), the number of times each article was cited ranged from 0 to 112 (M = 11.51, SD = 12.15, Mdn = 8). After accounting for all study characteristics (i.e., journal, participant age group, modality, study type, consortia status, patient population) and sample size, none of the domain-specific reporting indices (demographics index: B = −0.004, SE = 0.032, p = .90; methods index: B = −0.016, SE = 0.021, p = .43; open-science and generalizability index: B = 0.014, SE = 0.035, p = .68) were associated with citation frequency. By contrast, study sample size was strongly associated with citation frequency (B = 0.127, SE = 0.025, p < .001): For every 1-unit increase in log-transformed sample size, citation frequency increased by approximately 12.7%. This figure was consistent in models with the overall reporting index and using the loosely defined transparency indices (Supplemental Results in the Supplemental Material).
Discussion
In the last 30 years, advances in MRI and fMRI technology have enabled scientists to study the brain bases of human behavior, cognition, and health and disease. In turn, imaging methods have captured the attention of scientists, policymakers, and the public at large. Metascience inquiries are needed to evaluate the state of this field, with particular emphasis on the generalizability and reproducibility of study findings. In this structured review of select methodological practices in human MRI and fMRI studies, we documented reporting practices across three domains (i.e., demographic, methods, open science and generalizability). Results indicated that although some study features were widely reported (e.g., gender or sex), many were not (e.g., reasons for missing imaging data). Coded study characteristics such as sample age group, journal, and imaging modality were related to a study’s likelihood of reporting across domains. Moreover, when examining associations between study characteristics and citation frequency, we found that study sample size but not reporting frequency was strongly associated with greater visibility in the field.
Two surprising results were low reporting rates of inclusion/exclusion criteria and reasons for missing MRI or fMRI data. Without this information, it is impossible to know the characteristics of the population that are not represented in each study (APA, 2020; von Elm et al., 2008). More than half (54%) of included studies in this structured review did not explicitly list inclusion/exclusion or eligibility/ineligibility criteria. We found widespread nonspecific statements such as “All participants were healthy and right-handed” or “Participants were free of major psychiatric disorders.” On the other hand, studies with clear reporting included numbered lists of exclusion criteria and definitions for each criterion (e.g., van Harmelen et al., 2014). An even larger proportion of studies (68.8%) failed to report the contributions of missing MRI or fMRI data to the flow of participants. More often, studies stated the number of participants used in the current analysis without reference to any of the reasons why participants did not have imaging data. By identifying characteristics of individuals who do not enter the scanning environment (e.g., anxiety or hesitation related to the scan), researchers could adjust protocols to promote retention (e.g., mock scanners, hiring staff with similar experiences to participants; Gard et al., 2023). One excellent example of reporting the flow of participants can be found in Hahm et al. (2019), who provided a detailed account of how participant data were lost at every stage of filtering.
Another concerning finding was the low reporting of select demographic characteristics, including the racial-ethnic identity and socioeconomic background of participants. Race-ethnicity is not a biological construct; it reflects a constellation of environmental, personal, and community experiences, including historical and current structural racism (Cardenas-Iniguez & Gonzalez, 2024; Varnum & Kitayama, 2017). Racial-ethnic identity is an especially salient sociodemographic characteristic that describes participants’ lived experiences and leads to culturally specific trajectories of development (Iruka et al., 2022). In turn, a body of studies has shown that racialized experiences (both promotive and potentially harmful) shape brain function and structure (Constante et al., 2023; Hyde et al., 2020). Thus, studies that ignore racialized experiences and environmental factors related to the provision of economic resources may neglect contributors to heterogeneity in brain development. Indeed, a recent investigation using a large study of early adolescent youths reported greater heterogeneity in neurodevelopment in marginalized and structurally disadvantaged groups compared with their more advantaged peers (Bottenhorn et al., 2024).
That human neuroscience emerged from animal neuroscience may help to explain why reporting rates of sample sociodemographic features are so low. All humans, regardless of social status, resources, identity, and/or geographical location, share fundamental biological processes. Dendritic arborization, cell death, synaptic pruning, and myelination are examples of brain-based molecular processes that function in the same way across all humans. However, in contrast to animal neuroscience, human-neuroimaging studies do not measure processes at a cellular level. Rather, the indirect measures of brain structure and function used in neuroscience research capture multiple molecular processes simultaneously. As the level of inference shifted from molecules and cells in animal neuroscience to individuals and groups of people in human neuroscience, so, too, did the importance of attending to variation in environmental contexts (Falk et al., 2013). In short, one brain is not representative of all brains. However, among the studies included in this structured review, most (76.8%) did not comment on the limitations of sample generalizability or identify the target population to which the study sought to generalize (80.6%). Instead, many studies invoked generic language (DeJesus et al., 2019) that implied universalisms (e.g., “Furthermore, these results suggest that the right IFG [inferior frontal gyrus] plays a crucial role in supporting pitch encoding in the typical brain”).
A related result is the recognition of the large role that study sample size plays in both the interpretations of generalizability and the visibility of research products. There were several examples of falsely equating sample size with population representation (e.g., “Since our sample size was comparatively larger than in previous studies . . . the found partial correlations can be seen as more representative of the true effect sizes in the population”). Larger sample sizes indeed enable greater statistical power to detect small effect sizes (Turner et al., 2018). However, the notion that sample size in and of itself is a marker of generalizability is faulty at best. Decades of research highlight the insidious impact of nonresponse bias on deriving population estimates (Groves et al., 2009). Although large sample sizes are necessary for detecting small effect sizes (Marek et al., 2022), studies using these same data sets also reveal that larger sample sizes may not produce estimates that generalize to a broader population of individuals (e.g., Gard et al., 2023; LeWinn et al., 2017). And yet, the current results made clear that reporting larger sample sizes was highly “valued” by the field in terms of citation count. Estimates from multivariate models demonstrated that for every additional 1-unit increase in log-transformed study sample size, article citations increased by nearly 13%. By contrast, the association between reporting frequency and citation count was effectively zero.
Perhaps less surprising but no less important were the exceedingly low rates of preregistration (1.7%) and prestudy power analyses (6%). A previous survey of 283 largely cognitive neuroscientists found that 57.6% of respondents reported engaging in at least one format of preregistration (Paret et al., 2022). Given known selection biases and low response rates common to survey designs, we see the reporting rates in the current investigation as a lower bound on the frequency of preregistration implementation in MRI and fMRI studies.
The importance of complete reporting and adoption of open-science practices in MRI and fMRI research is not a new topic of discussion. In response to concerns for the reproducibility of neuroimaging research, the Organization for Human Brain Mapping (OHBM) in 2014 created the Committee on Best Practices in Data Analysis and Sharing (COBIDAS) to generate best practices for reporting and data sharing (Nichols et al., 2017). Recommendations centered on reporting related to experimental design, image acquisition, preprocessing, statistical modeling, results, and best practices for data sharing and enhancing reproducibility. Several study characteristics that were coded and reported in the current investigation (i.e., age, sex, socioeconomic status, race-ethnicity, number of participants scanned vs. analyzed, exclusion criteria, power analyses, population and recruitment strategy, clinical criteria if applicable) were also included in the COBIDAS report. As revealed by the current structured review, however, many of these study features are not sufficiently reported. As a complement to the current review, an opportunity exists to evaluate reporting practices in the other methodological domains highlighted by the COBIDAS report.
Recommendations
Although individual studies and researchers could be targeted for intervention to increase reporting transparency in human-neuroscience studies, intervention at the journal level might prove more effective and efficient for improving reporting culture. Indeed, such structural requirements already exist; journals in the Nature publishing group require authors to complete an editorial-policy checklist that includes a data-availability statement and confirmation of practices for the protection of research participants and biological samples. Forms such as these, which authors can complete upon manuscript submission to a journal, could be expanded to include information about sample demographics, methodological details, and open-science and generalizability practices. A complementary approach, inspired by the work of COBIDAS (Nichols et al., 2017), is to update field-wide software (e.g., BIDS and BIDS apps such as fMRIPrep) to necessitate the inclusion of these study features in data structures themselves. For example, fMRIPrep could integrate prompts that ask users to input participant sociodemographic characteristics (e.g., socioeconomic status, age, gender, and sex), which are then stored in participant-level files in BIDS-formatted data streams. Likewise, in the “derivatives” folder, a quality-control file could be automatically generated that identifies, with dummy codes, participants who passed successive preprocessing steps; this would be an extension of MRIQC (Provins et al., 2023), which can be used to evaluate the quality of raw acquired data (Esteban et al., 2020). These technical advances are necessary and will take time to implement.
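To make the proposal concrete, the sketch below writes hypothetical sociodemographic fields into a BIDS participants.tsv file and participant-flow dummy codes into a derivatives file. The file layout follows BIDS naming conventions, but the specific columns and the quality-control file are our invention, not existing features of fMRIPrep or MRIQC.

```python
# Hypothetical sketch: storing sociodemographics and participant-flow
# dummy codes in a BIDS-style dataset (columns are illustrative).
from pathlib import Path
import pandas as pd

root = Path("bids_dataset")
(root / "derivatives" / "qc").mkdir(parents=True, exist_ok=True)

# participants.tsv: one row per participant, with sociodemographic columns.
participants = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03"],
    "age": [24, 31, 27],
    "sex": ["F", "M", "F"],
    "gender": ["woman", "man", "woman"],
    "education_years": [16, 12, 18],
})
participants.to_csv(root / "participants.tsv", sep="\t", index=False)

# Derivatives QC file: dummy codes flag which filtering steps each
# participant passed, documenting the flow from recruitment to analysis.
flow = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03"],
    "scan_acquired": [1, 1, 0],       # 0 = no imaging data collected
    "passed_motion_qc": [1, 0, 0],    # 0 = excluded for excessive motion
    "in_final_sample": [1, 0, 0],
})
flow.to_csv(root / "derivatives" / "qc" / "participant_flow.tsv",
            sep="\t", index=False)
```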
To bolster reporting practices in human-neuroimaging studies more immediately, we advocate for inclusion of a journal-level reporting checklist that authors complete at the time of manuscript submission (Table 3). This same checklist can augment the existing OHBM COBIDAS checklist that, at present, is more centrally focused on acquisition parameters, preprocessing steps, and data-analysis and -sharing practices. Checklist items might include the three domains of reporting that we evaluated in this review—sample sociodemographic characteristics, methodological features, and open-science and generalizability practices—in addition to those identified by COBIDAS (Nichols et al., 2017). For each item, authors select whether the manuscript includes the information, just as current submission checklists require authors to acknowledge that all authors have approved the manuscript. Even if journals do not have the capacity to enforce all reporting features, the simple act of acknowledging whether said information is included in the manuscript may encourage researchers to increase their reporting practices more widely. Although not ideal, researchers can also acknowledge that the requested information was not collected (e.g., reasons for ineligibility).
Proposed Submission Reporting Checklist to Promote Increased Transparency and Generalizability in Human-Neuroimaging Research
One major challenge to this proposal is that racial-ethnic identity is arguably a difficult study feature to code and compare worldwide. Here, we endorse a recommendation advanced by the ManyBabies international consortium for studies to adopt a measure of community of descent that is locally valid and captures aspects of heritage and identity (Singh et al., 2024). Multiple constructs are captured under the term “community of descent” proposed by Singh et al. (2024), including ancestry, race-ethnicity, religion, national origin, cultural practices, and native language use, among others.
Limitations, future directions, and conclusions
Despite the strengths of this structured review, we cannot address all methodological elements (e.g., preprocessing pipelines, statistical-modeling procedures) and open-science practices (e.g., data sharing) that contribute to the reliability, reproducibility, and generalizability of human-neuroimaging research. Our focus on MRI and fMRI further limits understanding of methodological practices outside of these neuroimaging modalities. Finally, our team was unable to collect information about biological sex and gender separately. During training, it became clear that few studies reported participant gender as distinct from sex, and nearly all articles categorized sex or gender as binary constructs. Thus, to balance team resources with data availability, we opted to code whether sex or gender was reported and if reported, only the percentage of the sample identified as female. Because growing evidence points to robust sex and gender differences in phenotypic presentation and underlying brain systems (Eliot et al., 2023), the fact that we do not know the complete sex and gender makeup of participants in neuroscience studies is a limitation of this article and a major shortcoming of the field. Thus, to fully gauge the state of the science, additional metascience inquiries should be undertaken to document reporting of more methodological elements, open-science practices, and sociodemographic characteristics; the coding procedures adopted here may be useful to other researchers interested in pursuing this work.
The question of whether and how to report methodological practices is distinct from efforts to increase diversity in human-neuroimaging research. The former can be implemented now (Table 3); the latter requires specialized training, experience, and resources (Habibi et al., 2015). Supporting researchers with specialized skills to recruit and retain communities historically excluded from scientific research is essential to increasing the generalizability of the field’s science. Ultimately, efforts to increase sample sociodemographic representation will require radical shifts in how researchers conduct neuroimaging studies (Gard et al., 2022; La Scala et al., 2023).
The “Age of the Brain” is very much upon us; MRI and fMRI studies inform public life in policy, medicine, and public-health domains. At the same time, the opportunity to use human-neuroimaging data to understand human behavior and support human well-being comes with responsibility to participants, funders, and the public at large. Promoting greater transparency in reporting practices is one of many efforts that are needed to gain a more generalizable, reliable, and reproducible understanding of the human brain across environmental contexts.
Supplemental Material
Supplemental material for “A Window Into the State of the Science: Current Reporting Practices Related to Generalizability in MRI and Functional-MRI Studies” by Arianna M. Gard, Deena Shariq, Alison A. Albrecht, Alaina Lurie, Hyung Cho Kim, Colter Mitchell, and Luke W. Hyde (Advances in Methods and Practices in Psychological Science) is available online.
Transparency
Action Editor: Rogier Kievit
Editor: David A. Sbarra
Author Contribution(s)
D. Shariq, Al. A. Albrecht, and A. Lurie contributed equally.