Abstract
Assessing the credibility of presented problems is an essential part of the clinical evaluation of attention-deficit/hyperactivity disorder (ADHD) in adulthood. We conducted a systematic review and meta-analysis to examine Continuous Performance Tests (CPTs) as embedded validity indicators. Eighteen studies (n = 3,021; 67 effect sizes) were analyzed: eight simulation studies and ten analogue studies. Moderating variables included study design (simulation vs. criterion) and sample type (student vs. nonstudent). CPTs effectively distinguish between credible and noncredible performance (g = 0.73). Effect sizes were nearly twice as large in simulation studies (g = 0.94) compared to criterion group studies (g = 0.55), underscoring the influence of study design on the interpretation of research findings. Student and nonstudent groups did not differ significantly. CPTs are valuable as embedded validity indicators. Given the moderate effects, clinical decisions should not rely on a single CPT but on a variety of measures.
Background
Continuous Performance Tests (CPTs) are among the most commonly used neuropsychological tasks in both clinical practice and research on attention-deficit/hyperactivity disorder (ADHD; Pettersson et al., 2018). CPTs are computerized tasks that measure how well a person can stay focused and control their responses (DuPaul et al., 1992; Fuermaier et al., 2019, 2024). During a CPT, individuals are usually asked to respond quickly to certain visual or auditory signals while ignoring others. These tasks provide an objective way to measure attention and response control, unlike self-report questionnaires or clinical interviews that rely on a person’s memory or personal interpretation (Pagán et al., 2023; Rosvold et al., 1956). CPTs have been used in research and clinical practice since the 1950s (Pagán et al., 2023; Rosvold et al., 1956), and many versions have been developed over time. CPTs are commonly used within ADHD assessments to better characterize the cognitive problems that frequently accompany the disorder. For example, at the group level, adults with ADHD show longer reaction times, greater variability, more distractibility (omissions), and more impulsive behavior (commissions) in comparison to adults without ADHD. Sometimes, CPT performance patterns may appear atypical compared to what is commonly observed in adults with ADHD. For example, an individual might show extremely slow reaction times or an unusually high number of errors that deviate from established response patterns. Such atypical results may indicate noncredible performance, meaning that the observed test scores might not accurately reflect the person’s actual cognitive abilities (Larrabee, 2012). In this respect, CPTs can be classified as embedded performance validity indicators that help the clinician to validate the performance level while also using the same test data to assess cognitive functioning (Rogers, 2018). Previous studies have found some promising evidence that CPTs can be useful as embedded performance validity indicators in adult ADHD assessments. However, the results have not been consistent across studies. This inconsistency raises questions about the reliability and generalizability of previous findings and highlights the need for the current meta-analysis to evaluate the utility of CPTs in assessing performance validity in adults undergoing ADHD assessment.
Performance tests, including CPTs, are not mandatory components of diagnostic evaluations of ADHD. While only self-report measures are required for establishing a diagnosis, performance tests are still frequently employed by clinicians, with approximately 30% reporting their use to assess impairment (Fuermaier et al., 2024). They hold an important role in the diagnostic process, particularly for treatment planning and clinical decision-making. When CPTs are included in the assessment of ADHD, it is essential to evaluate their validity given that the estimated prevalence of noncredible responding ranges from 9% to 16% (Mascarenhas et al., 2023; Ovsiew et al., 2023). Clinicians are generally not able to reliably detect or predict noncredible performance solely based on their clinical judgment (Dandachi-FitzGerald et al., 2017; Faust et al., 1988). The failure to include such validity measures may lead to the interpretation of invalid neuropsychological performance data, which can negatively affect the diagnostic assessment, treatment recommendations, and evaluation. The importance of incorporating validity tasks in neuropsychological assessments is emphasized by statements from the National Academy of Neuropsychology (NAN; Bush et al., 2005) and the American Academy of Clinical Neuropsychology (AACN; Sweet et al., 2021).
Noncredible performance can appear in various forms, such as malingering, factitious disorder, careless responding due to boredom, and the inability to perform the task according to instructions because of severe psychopathology or cognitive dysfunction. These forms of noncredible performance may have different underlying and interacting reasons. The interaction between the fundamental factors motivation, setting, and performance capacity may explain why people perform in a noncredible way (Dandachi-FitzGerald et al., 2024). Malingering, for example, may be the result of an extrinsic motivation to receive financial compensation, access to medication, or legal advantages. However, more intrinsic motivational factors can also explain noncredible test performance, for example, when people adopt a sick role (e.g., factitious disorder) or use symptoms as an excuse for failure. Performance is also influenced by the setting, which may include research participation, forensic evaluations, clinical treatment, or academic assessments. For example, students participating in research to obtain mandatory course credits may respond carelessly (DeRight & Jorgensen, 2015). In contrast, an individual undergoing a forensic evaluation to determine eligibility for disability benefits is more likely to exhibit malingering (Sweet et al., 2021). Another important factor is performance capacity, which reflects an individual’s actual cognitive and emotional ability to meet the demands of testing. For example, in some cases, individuals with significant cognitive impairments (i.e., dementia) may exhibit performance patterns resembling intentional underperformance (McGuire et al., 2019). However, these patterns may not stem from deceptive intent (e.g., feigning or malingering) but rather from a genuine inability to engage with the task effectively. Understanding the reasons for noncredible performance necessitates an integrated understanding of motivational factors, setting, and performance capabilities.
CPTs can be classified as embedded performance validity indicators. In contrast to freestanding performance validity indicators, which are designed to solely measure performance validity, embedded validity indicators, such as atypical hit patterns and reaction times, are derived from existing neuropsychological tests (Sweet et al., 2021). The benefit is that, if the test results are deemed valid, the collected test data can be interpreted and used in the clinical evaluation, which saves time and clinical resources. For example, if an attention test contains an embedded validity indicator, the clinician is able to validate the performance level while also using the same test data to assess cognitive functioning, which may reduce the need to administer separate validity and attention tests.
While this dual function makes embedded validity indicators efficient tools in clinical practice, their value depends on their discriminative accuracy. To examine how well CPTs distinguish between credible and noncredible performance, we calculated an effect size that reflects the size of the difference between two groups: individuals with a credible CPT performance and those whose performance was classified as noncredible. In this context, a larger effect size means that a CPT is better able to differentiate individuals who are responding in a credible manner from those whose test data suggest noncredible responding. The utility of CPTs as embedded performance validity indicators for distinguishing credible from noncredible performance in adult ADHD assessments has been investigated in various studies. While some studies have yielded promising results supporting the use of CPTs as embedded performance validity indicators in distinguishing credible from noncredible performance (C. Berger et al., 2021; Quinn, 2003; Winter & Braw, 2022), these findings have not been consistently replicated (Scimeca et al., 2021; Sollman et al., 2010; Suhr, Sullivan, & Rodriguez, 2011). This discrepancy suggests a lack of stability in the observed effects, raising questions about the generalizability of these findings. To better understand this variability, we examine four key moderators in our meta-analysis.
The first important source of variability lies in the diversity of CPTs assessed across studies. Clinicians and researchers are often uncertain about which specific CPT variant to use. Different CPTs measure different aspects of cognitive functioning. For example, the Perception and Attentional Functions test (WAFV) from the Vienna Test System (VTS; Schuhfried, 2013) includes tasks in which participants respond to rare target stimuli. This design is monotonous, making it a measure of sustained attention and vigilance, as performance must be maintained over time despite the low rate of targets (Tucha et al., 2017). In contrast, other tasks, such as the Test of Variables of Attention (TOVA; Leark et al., 2007), have a mixed test design. The first half of test administration is regarded as “low-stimulation,” with rare targets, while the second half is “high-stimulation,” with more frequent targets. Although these CPTs vary in length and cognitive demands, many assess overlapping cognitive functions. Therefore, it remains an open question whether different CPT variants vary substantially in their sensitivity to noncredible performance.
The second source of variability is that differences exist not only between the CPTs themselves but also in the specific outcome measures used to assess performance. There is also considerable uncertainty among clinicians and researchers regarding which specific CPT outcome measures are most informative, including indicators of processing speed, measures of distractibility (e.g., omissions), and measures of impulsivity (e.g., commissions). This uncertainty largely stems from the absence of studies directly comparing different CPT outcome measures in the context of embedded performance validity indicators in ADHD. We selected omission errors, commission errors, reaction time, and reaction time variability as our four main outcome measures. Additional measures, such as Hit Reaction Time Block Change or Hit Reaction Time Inter-Stimulus Interval Change from the Conners 3, are derived from at least one of these core measures and are thus not independent of them (Conners, 2014). Another argument for focusing on these four indices is that they are included in the majority of studies.
Several studies have not identified any single CPT outcome measure that consistently outperforms others (Fuermaier et al., 2022; Robinson et al., 2023). Thus, we explored potential differences in the sensitivity of CPT outcome measures without presuming that any particular metric would show superior discriminative value.
The third source of variability is the variation in study designs. Some studies employed a criterion-group design, in which clinical data, often obtained from routine clinical care, are used. Within this design, the researcher examines two sets of clients: one group highly suspected of performing noncredibly according to an external credibility criterion (typically one or more established validity tasks), and another group highly likely to be performing credibly according to the same criterion. All clients complete the CPT. While the CPT is originally designed as an attention test, in this context it is used as a measure of validity. Thus, in this design for the development of embedded measures, researchers do not compare one validity test against another; rather, they compare a routine clinical measure (i.e., performance on the CPT) against a criterion of credibility. One of the main advantages of this approach is its high external validity, because the study is carried out in clinics with clients who are completing the tests for practical reasons (Schroeder et al., 2019).
In contrast, other studies adopt a simulation group design (also called analogue design, see Rogers, 2018), in which clinical data of individuals with ADHD who were classified as credible based on a criterion of credibility (typically one or more validity tasks) are compared to the CPT performance of individuals without ADHD who were instructed to either perform normally or feign ADHD in an experimental setting (Rogers, 2018). In this study, we included only adults without ADHD instructed to feign and those in the credible group, as our aim was to examine whether CPTs can distinguish between these two groups. The advantage of this approach is the strong internal validity due to operationalized circumstances, consistent instructions, and manipulation checks (Rogers, 2018). The methodological differences between these two designs are nontrivial because individuals in criterion-group designs within a clinical setting may be more cautious to avoid detection, as it can impact their clinical trajectory, leading to more subtle effects. In contrast, individuals in simulation studies are explicitly instructed to simulate or feign ADHD characteristics, and are therefore more likely to exaggerate these characteristics. Thus, each design has distinct implications for the validity and interpretation of the results. Although it has been noted that effect sizes differ between simulation and criterion-group designs (Rogers, 1993), these differences have not been systematically evaluated across studies. This represents an important gap, as understanding the magnitude of these discrepancies is critical for interpreting the discriminative ability of CPT indicators. We hypothesized that simulation designs produce larger effect sizes relative to criterion-group designs.
The fourth source of variability is the sample composition. A substantial number of studies relied on student samples. Students in the criterion-group design may have a clearer incentive to simulate ADHD characteristics, as a diagnosis may grant access to academic accommodations such as extended testing time, private test-taking environments, and reduced academic workload (Sullivan et al., 2007; Tucha et al., 2015). Furthermore, access to medication is an especially relevant incentive for students, as many may view the prescription of medication as a means to enhance academic performance, concentration, and alertness (e.g., Faraone et al., 2020). Given the widespread belief in the academic benefits of these medications and the relative ease of obtaining or sharing them both on and off campus, college students are likely disproportionately affected by misuse and diversion (Garcia et al., 2022). In a survey of over 1,000 students in the Netherlands, more than half believed it is easy to simulate ADHD (Fuermaier et al., 2021). Students also often have heightened access to social and online resources related to ADHD characteristics, which may increase their ability to simulate characteristics during assessments. We therefore hypothesized that studies with nonstudent samples would show larger effects than studies with student samples.
In conclusion, the current body of research on the utility of CPTs as embedded performance validity indicators in ADHD assessment is characterized by inconsistencies in findings and methodological variability. We had two main objectives. First, we aimed to assess the general utility of CPTs as embedded performance validity indicators for detecting noncredible performance in ADHD assessments. We hypothesized that CPTs would demonstrate moderate effectiveness in distinguishing credible from noncredible performance. Second, we explored four potential sources of variability. To the best of our knowledge, this is the first meta-analysis to synthesize the evidence on CPTs as embedded performance validity indicators.
Method
We conducted the systematic review in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines for reporting systematic reviews (Page et al., 2021). The corresponding PRISMA checklist is provided in Supplement 1.
Literature Search
We registered our multilevel meta-analysis on Open Science Framework (OSF; DOI 10.17605/OSF.IO/PBDYV). P.T. searched for articles until February 2025 using scientific databases (i.e., PsycInfo, Scopus, PubMed, Web of Science, and Google Scholar). Furthermore, references of identified articles were checked to include all relevant studies meeting the inclusion criteria. The search was performed with the following essential keywords: “ADHD,” “Performance validity test,” “Continuous performance test,” “Malingering,” “non-credible performance,” and “Adult” (for full search terms per database, see Supplement 2). A.B.M.F., who has expertise in this area, compiled a reference list over the years and supplemented the literature search. Each article was reviewed by first screening the title and abstract, followed by a thorough examination of the full text based on the inclusion criteria. If a paper did not provide the necessary information, P.T. contacted the authors via email. Refer to Figure 1 for a flowchart outlining the search process and see Supplement 3 for a list of studies that appeared to meet the inclusion criteria, but which were excluded, and the reasons for exclusion.

Figure 1. PRISMA Flowchart of the Literature Search.
Study Selection
We included studies if they conducted a CPT measuring at least one of four CPT measures: omission errors (OM), commission errors (COM), reaction time (RT), or reaction time variability (RTSD). In addition, studies were required to either follow a criterion group or a simulation group design for the detection of noncredible performance in adult ADHD. Throughout this study, we labeled the group of individuals demonstrating credibility in the criterion-group design and simulation group design as the credible group. Conversely, we labeled the group that demonstrated noncredibility in the criterion-group design and the individuals without a classification of ADHD who were instructed to feign within the simulation design as the noncredible group. The means and standard deviations of the CPT measures for the credible and noncredible groups were required for inclusion.
Data Extraction
P.T. extracted the following data from the included articles. First, P.T. sorted the studies by design (i.e., criterion group design or simulation group design). Second, we identified the specific CPT variant(s) and CPT outcome measures used in each study. Third, we classified the studies based on whether they used student or nonstudent samples. Fourth, we extracted the sample size, mean age, and gender distribution for both the credible and noncredible groups. Fifth, we collected the means and standard deviations for each group on the four CPT outcome measures.
Data Analysis
We performed the certainty assessment following the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) guidelines to determine the quality of evidence. See the GRADE approach handbook for how we graded each study from very low to high (Grade Working Group, 2013–2023). Study quality was assessed using the Joanna Briggs Institute (JBI) Critical Appraisal Checklist for Analytical Cross-Sectional Studies (JBI, 2020). We used the “metafor” package (Viechtbauer, 2010) in RStudio (RStudio Team, 2022) for data analysis. We created p-curves with the p-curve app 4.10 (https://www.p-curve.com/app4/). The code is publicly available on OSF (DOI 10.17605/OSF.IO/PBDYV).
In cases where multiple noncredible groups were reported, we calculated the overall mean and standard deviation using a weighted average based on sample size. When studies provided standard errors instead of standard deviations, standard deviations were first derived from the reported values. For all outcomes, scores were coded such that lower values indicated poorer performance; we transformed scores when the original study used a different format.
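For illustration, the following R sketch shows one way this preprocessing could be implemented; the helper names and the simple sample-size-weighted pooling of standard deviations are our assumptions and do not reproduce the analysis script used for this study.

```r
# Minimal sketch (assumed helper names): combine several noncredible subgroups
# into one group by sample-size weighting, and convert a reported standard error
# into a standard deviation.
pool_groups <- function(n, m, sd) {
  list(
    n  = sum(n),
    m  = sum(n * m) / sum(n),   # sample-size-weighted mean
    sd = sum(n * sd) / sum(n)   # sample-size-weighted SD, as described above
  )
}

se_to_sd <- function(se, n) se * sqrt(n)   # SD = SE * sqrt(n)

# Example with two hypothetical noncredible subgroups
pool_groups(n = c(24, 18), m = c(3.1, 4.2), sd = c(1.4, 1.9))
se_to_sd(se = 0.35, n = 24)
```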
We conducted a multilevel meta-analysis using a random-effects model to account for dependencies among effect sizes within studies. The model was fitted using the restricted maximum likelihood (REML) estimation method with the rma.mv() function. Effect sizes were calculated as standardized mean differences (SMD; Hedges’s g), and a forest plot was created. Hedges’s g values of 0.2, 0.5, and 0.8 are conventionally interpreted as small, medium, and large effects, respectively (Cohen, 1992).
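A minimal sketch of this modeling step with the metafor package is given below; the data frame and column names (study, es_id, and the group means, SDs, and sample sizes) are placeholders introduced for illustration, coded so that a positive g indicates that the credible group scored higher than the noncredible group.

```r
# Minimal sketch of the effect-size and model pipeline (assumed column names)
library(metafor)

# dat: one row per effect size, nested within studies
dat <- escalc(measure = "SMD",                      # bias-corrected SMD (Hedges's g)
              m1i = m_cred,    sd1i = sd_cred,    n1i = n_cred,
              m2i = m_noncred, sd2i = sd_noncred, n2i = n_noncred,
              data = dat)

# Three-level random-effects model: effect sizes (Level 2) nested in studies (Level 3)
fit <- rma.mv(yi, vi,
              random = ~ 1 | study/es_id,
              method = "REML",
              data = dat)
summary(fit)   # pooled Hedges's g with 95% CI
forest(fit)    # forest plot of all effect sizes
```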
We assessed overall heterogeneity with the Q statistic. Our multilevel model distinguished two sources of variance: between‑study variance (Level 3) and within‑study variance (Level 2, e.g., differences between CPT outcomes within the same study). Variance at each level was estimated using sigma squared (σ2): at Level 3, σ2 reflects how much true effect sizes vary across studies, whereas at Level 2, σ2 captures the variability of multiple effect sizes reported within the same study. To express the proportion of total variability explained at each level, we calculated I2 values. Following Higgins and Thompson (2002), I2 values of 25%, 50%, and 75% indicate low, moderate, and substantial heterogeneity, respectively.
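The level-specific I² values can be computed directly from the fitted model; the sketch below follows one common approach for three-level models and reuses the `dat` and `fit` objects from the previous sketch, so the exact formula is an assumption on our part rather than the reported computation.

```r
# Level-specific I^2 for the three-level model `fit` (one common approach)
w <- 1 / dat$vi
k <- length(dat$vi)
v_typical <- (k - 1) * sum(w) / (sum(w)^2 - sum(w^2))  # "typical" sampling variance

sigma2_between <- fit$sigma2[1]   # Level 3: between-study variance
sigma2_within  <- fit$sigma2[2]   # Level 2: within-study variance
total <- sigma2_between + sigma2_within + v_typical

round(100 * c(I2_between = sigma2_between / total,
              I2_within  = sigma2_within  / total), 2)
```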
We analyzed each moderator separately (i.e., study design and sample composition) with a multilevel meta-regression using a random-effects model. The models were fitted using the REML estimation method with the rma.mv() function. We calculated the effect size per category (e.g., simulation vs. criterion-group design) and tested the contrasts between categories.
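As a sketch of how one such moderator analysis could look in metafor (the factor `design` and its levels are assumed placeholders, not the variables used in the actual analysis):

```r
# Minimal sketch of one moderator analysis (study design)
# (1) Per-category pooled estimates: drop the intercept so each level gets its own g
fit_cat <- rma.mv(yi, vi,
                  mods = ~ design - 1,
                  random = ~ 1 | study/es_id,
                  method = "REML", data = dat)
summary(fit_cat)           # pooled g for simulation and criterion-group designs

# (2) Contrast between categories: keep the intercept; QM tests the moderator
fit_mod <- rma.mv(yi, vi,
                  mods = ~ design,
                  random = ~ 1 | study/es_id,
                  method = "REML", data = dat)
c(QM = fit_mod$QM, p = fit_mod$QMp)   # omnibus test of the design contrast
```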
Certainty Assessment: CPT Outcomes
We included a certainty assessment of the four CPT outcomes to investigate inconsistency, indirectness, imprecision, publication bias, and the overall certainty of the evidence.
Study Quality
Furthermore, we assessed the risk of bias using the Risk of Bias—Symptom and Performance Validity (RoB-SPV) tool (Puente-López et al., 2025). We rated each study based on the selection of a clinical comparison group, design-specific components, and an overall assessment, using a scale of low risk of bias, some concerns, and high risk of bias.
Robustness Checks
Meta-analyses are susceptible to biases such as the influence of outliers and highly influential studies, as well as small-study effects (including publication bias), which can distort the estimated effect sizes.
Outliers and Influential Points
Outliers or influential studies may disproportionately affect the robustness and validity of the estimate (Viechtbauer & Cheung, 2010). To examine whether outliers or influential effect sizes influenced our final results, we conducted four distinct analyses. We performed a pooled estimate confidence interval (CI) analysis with a sensitivity analysis (Harrer et al., 2021), the leave-one-out method based on both effect size and study level, Cook’s Distance analysis (Viechtbauer, 2010), and a difference in betas (DFBETAs) analysis (Viechtbauer, 2010). For a detailed explanation of the procedures used, see Supplement 5(a).
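For concreteness, a minimal sketch of these checks with metafor is shown below, reusing the `fit` and `dat` objects from the Data Analysis sketches; because leave1out() targets univariate models, the leave-one-out step is written as an explicit loop over effect sizes, and a study-level variant would drop whole studies instead. This is an illustration of the general procedures, not the exact code used.

```r
# Minimal sketch of the influence checks for the multilevel model `fit`
cd  <- cooks.distance(fit)   # Cook's distance per effect size
dfb <- dfbetas(fit)          # DFBETAS: shift in model coefficients per effect size

# Leave-one-out at the effect-size level
loo <- sapply(seq_len(nrow(dat)), function(i) {
  refit <- rma.mv(yi, vi, random = ~ 1 | study/es_id,
                  method = "REML", data = dat[-i, ])
  coef(refit)[1]             # pooled g without effect size i
})
range(loo)                   # how far the pooled estimate can shift
```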
Small-Study Effects
Small-study effects refer to the tendency of smaller studies to report different (often larger) effect sizes in comparison to larger studies (Schwarzer et al., 2015; Sterne et al., 2000). Publication bias is a well-known reason for the phenomenon of small-study effects, which arises when studies with significant results are more likely to be published than those with nonsignificant findings, leading to an overrepresentation of positive effects in the literature (Easterbrook et al., 1991). To address small-study effects, we conducted the following analyses: a p-curve analysis (Simonsohn et al., 2015), the test of excessive significance (Ioannidis & Trikalinos, 2007), and an assessment of funnel plot asymmetry through visual inspection, the multilevel Egger’s regression test (Rodgers & Pustejovsky, 2021), and a Trim-and-Fill analysis (Duval & Tweedie, 2000). For a detailed explanation of the procedures used, see Supplement 5(b).
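The sketch below illustrates these checks under the same placeholder assumptions as above: the Egger-type test is implemented by adding the standard error of each effect size as a moderator to the multilevel model, and trim-and-fill is applied to a conventional univariate fit because it is not defined for multilevel models.

```r
# Minimal sketch of the small-study checks
# Multilevel Egger-type test: standard error as a moderator (cf. Rodgers & Pustejovsky, 2021)
egger_ml <- rma.mv(yi, vi,
                   mods = ~ sqrt(vi),
                   random = ~ 1 | study/es_id,
                   method = "REML", data = dat)
summary(egger_ml)   # a significant sqrt(vi) slope suggests funnel-plot asymmetry

funnel(fit)         # visual inspection of the funnel plot

# Trim-and-fill on a conventional random-effects model (ignores the multilevel structure)
trimfill(rma(yi, vi, data = dat, method = "REML"))
```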
Results
Literature Search and Study Characteristics
We identified 18 studies (see the flowchart in Figure 1), which contributed 67 effect sizes to the final analysis. Because several studies reported multiple effect sizes based on overlapping samples, the number of independent participants (i.e., participants contributing unique data to the analysis) was 3,021. Approximately half of the participants in the credible group were female (50.98%, SD = 14.87), compared to 53.59% (SD = 19.65) in the noncredible group. The average age in the credible group was 26.86 (SD = 4.97) years and in the noncredible group 24.05 (SD = 4.55) years. In total, eight studies did not report sex and age in their sample.
Eight studies employed a simulation group design, whereas 10 studies adopted a criterion-group design. A total of 10 different CPT variants were administered: three studies utilized the Conners Continuous Performance Test—II (Conners II; Conners, 2000), while five studies employed the Conners Continuous Performance Test—3 (Conners 3; Conners, 2014). One study employed the Integrated Visual and Auditory Plus CPT (IVA +; Sanford & Turner, 1995), and three studies used the MOXO-d-CPT (I. Berger & Goldzweig, 2010). One study employed the Test of Attentional Performance (TAP; Zimmermann & Fimm, 2012), another study conducted the Quantified Behavior Test Plus (Qb+; Ulberstad, 2012), and two studies utilized the WAFV—VTS Perception and Attention Function—Vigilance (WAFV; Schuhfried, 2013). Furthermore, one study employed the Psychology Experiment Building Language (pCPT; Mueller & Piper, 2014), another used the Tests of Attentional Distraction (TOAD; Morey, 2016), and finally three studies employed the Test of Variables of Attention (TOVA; Leark et al., 2007). Seventeen studies assessed the CPT measures only visually, with the exception of Quinn (2003), who also assessed the CPT measures via the auditory information channel. OM was assessed in 18 studies, COM in 17 studies, RT in 10 studies, and RTSD in 13 studies. Among the 18 studies, 11 were conducted with samples from student groups (see Table 1 for study characteristics and see Figure 1 for a flowchart of the literature search).
Table 1. Study Characteristics.
Note. If a study by the same author and year is presented twice, it means that the study used more than one CPT. SD = simulation design; CD = criterion-group design; N = sample size; %F = percentage female; Age = mean age; OM = omission; COM = commission; RT = reaction time; RTSD = reaction time standard deviation; − = not available.
Integrated Visual and Auditory Plus CPT (IVA +) measured auditory.
Integrated Visual and Auditory Plus CPT (IVA +) measured visually.
Conners Continuous Performance Test—II.
Test of Variables of Attention (TOVA).
Test of Attentional Performance (TAP).
Qb+ Test.
WAFV—VTS Perception and Attention Function—Vigilance (WAFV).
Psychology Experiment Building Language—pCPT.
Tests of Attentional Distraction (TOAD).
MOXO-d-CPT.
Conners Continuous Performance Test-3.
Certainty Assessment CPT Outcomes
Certainty of evidence for CPT outcomes ranged from low to high. OM and RTSD showed moderate certainty, downgraded due to heterogeneity and publication bias. COM had low to moderate certainty, with high heterogeneity and strong publication bias. RT had the highest certainty (moderate to high), with limited heterogeneity and no publication bias. Full details are in Supplement 4, Table 1.
Study Quality
Across the 18 included studies, 16 showed some concerns regarding risk of bias. The specific reasons for these concerns varied by study design. Clinical comparison groups were generally well selected, although few studies systematically evaluated whether control groups were free from external motivations (e.g., litigation or secondary gain), which could compromise the validity of the comparison. Simulation studies varied in the clarity and completeness of their procedures, with inconsistent reporting of compliance checks and almost no consideration of participants’ prior familiarity with the simulated condition. Criterion-group studies showed heterogeneity in the standards used to classify valid and invalid performance, raising concerns about misclassification bias. General methodological aspects were often suboptimal: dropout rates exceeding 5% were common, methodological reporting sometimes lacked sufficient detail for replication, and control for clinical or linguistic confounders was inconsistently applied. In line with the RoB-SPV recommendations, we do not provide a single overall bias rating. Instead, these findings indicate a mixed methodological quality across the literature. Some domains (e.g., clinical sampling) were generally strong, whereas others (e.g., control group incentive assessment, simulation reporting) showed notable variability.
Robustness Checks
Outliers and Influential Studies
The four analyses conducted to identify outliers and influential studies did not indicate the presence of any such studies. Consequently, all studies were included in the main analysis. Detailed results for each analysis are provided in Supplement 6(a).
Small-Study Bias
The outcomes of the analyses conducted for small-study bias were less straightforward. The p-curve analysis suggests the presence of genuine underlying effects in the included studies. The test of excessive significance did not indicate an excess of significant findings, which suggests no indication of bias related to the selective reporting of statistically significant results. However, despite the trim-and-fill analysis not suggesting missing studies on the left side of the funnel plot, the visual asymmetry of the funnel plot and the significant Egger’s test raise concerns about the presence of small-study effects. Specifically, smaller studies tend to have higher standard errors and larger effect sizes, whereas larger studies are associated with lower standard errors and smaller effect sizes, which likely accounts for the observed asymmetry. Taken together, while there is evidence supporting the presence of true effects in our meta-analysis, the distribution of effect sizes suggests that they may still be inflated due to small-study bias. Detailed results for each analysis are provided in Supplement 6(b).
Main Analysis: Utility of CPTs
CPTs differentiated significantly between credible and noncredible groups (g = 0.73, 95% CI = [0.55, 0.91], p < .001; see Figure 2). Heterogeneity was significant, Q(66) = 244.23, p < .001, with larger heterogeneity between studies (σ2 = 0.10, sd = 0.32, I2 = 54.82%, k = 18) than within studies (σ2 = 0.04, sd = 0.21, I2 = 23.90% across 67 levels).

Figure 2. Forest Plot of All the Effect Sizes.
Moderator Analyses
Moderator analyses revealed significant variability in effect sizes across CPT variants, CPT outcomes, and study design. Regarding the CPT variant, the largest effect was observed for the IVA CPT, which significantly outperformed all other variants, suggesting that the IVA CPT is the most sensitive variant for distinguishing between credible and noncredible performance. Among CPT outcome domains, OM showed the highest average effect size. Effect sizes for OM were significantly higher than for RT, indicating that this outcome may be more sensitive to group differences. No significant differences were found between other CPT outcomes. Furthermore, simulation group designs produced larger effects than criterion group designs. The contrast between the two designs was significant, suggesting that study design influenced the observed effects. Finally, there were no statistically significant differences between student and nonstudent groups. Note that this moderator analysis was limited to criterion-group studies, as all simulation-group studies used student participants, making a comparison within the simulation design impossible. For a detailed overview of all statistics and contrasts, see Table 2.
Table 2. Overview of All Test Statistics and Contrasts.
Note. Effect sizes refer to the general utility of the type of variable (variant, outcome, design, or sample), and the comparisons refer to the test statistic reference of the moderation effect.
Regarding the CPT variant, only significant comparisons are displayed (21 out of the 45 contrasts).
Regarding sample composition, 32 effect sizes were excluded because they originated from studies using a simulation group design.
CPT variant and CPT outcome were approached as exploratory analyses.
Abbreviations: n = number of effect sizes, df = degrees of freedom, k = effect size.
*p < .05. **p < .01. ***p < .001. ****p < .0001.
Discussion
CPTs are a key ingredient in both clinical and research settings for assessing cognitive functions in adults evaluated for ADHD (Pettersson et al., 2018). In addition to their established role in evaluating attention and impulsivity, CPTs have seen growing interest as embedded validity indicators, as several studies highlight their utility to distinguish credible from noncredible performance (C. Berger et al., 2021; Morey, 2019; Winter & Braw, 2022). Despite their widespread use, the evidence remained inconclusive as to whether CPTs can reliably serve as embedded indicators of performance validity, and our current meta-analysis aimed to synthesize this evidence. As hypothesized, we found a moderate-to-large effect (g = 0.73) in distinguishing credible from noncredible performance in clinical or experimental assessments of ADHD using CPTs.
To better understand the impact of the differences between the studies in our meta-analysis, we looked at two possible moderators that might explain this variation and also carried out two exploratory moderator analyses. For the first moderator, we considered whether the type of study design (i.e., criterion‑group vs. simulation‑group) could shed light on the variation we found across studies. Notably, studies employing a simulation group design yielded significantly larger effects than those using a criterion group design (Hedges’s g = 0.94 vs. 0.55). This higher sensitivity in simulation designs compared to criterion group designs is consistent with long-standing concerns about the limited generalizability of simulation research (Rogers, 1993). One plausible explanation is that the motivation behind pretending to have ADHD in an experimental context (focus on showing ADHD characteristics) differs from actual circumstances in clinical practice (focus on avoiding getting caught), which may result in exaggeration of ADHD characteristics in the simulation group. However, it is important to consider that other methodological differences between designs, such as sample characteristics, preparation, or assessment context, may also contribute to the observed sensitivity differences. Furthermore, relying solely on criterion-group designs is not desirable, as the less controlled environment may lead to distorted effect sizes. Ultimately, the choice of research design has a significant impact on the magnitude of the observed effect sizes. This means that, as a researcher or clinician, one should avoid drawing conclusions about the utility of CPTs as embedded validity measures based on a single type of research design. To obtain more accurate and ecologically valid assessments, it is essential to integrate evidence from both simulation and criterion-group designs when evaluating CPTs, thereby ensuring that clinical recommendations are grounded in realistic expectations.
For the second moderator analysis, we examined the effect of sample composition. Students may systematically differ from nonstudents in their motivation, knowledge of ADHD, and incentives to feign ADHD characteristics. We hypothesized that students, who generally have greater access to social and online information about ADHD characteristics, would be better positioned to simulate these characteristics during assessments and, as a result, be more difficult to detect than nonstudent participants. No significant differences were observed between student and nonstudent samples. It is important to note that the robustness of this finding may be limited because studies with a simulation design had to be excluded, which may have reduced statistical power.
For the first exploratory moderator analysis, we tested whether different versions of the CPT might have influenced the results. Indeed, we found differences across the various CPT versions. The IVA CPT demonstrated the largest effect, followed by the MOXO-d-CPT. The TAP and the QbTest yielded the lowest effect sizes. It is important to note that some CPT variants were represented by only a single study, whereas others were represented by multiple studies. This imbalance limits the reliability of the moderator estimates for underrepresented CPT variants. We did not identify any specific test characteristic, based on a review of the test manuals, that could clearly explain the substantial differences observed between the CPT variants. The observed differences between CPT variants should be confirmed in future research using larger samples. Of particular relevance is that, with the exception of the TAP and QbTest, all CPT variants showed significant predictive power in differentiating credible from noncredible performance, with effect sizes ranging from medium to very large (Cohen, 1992).
In our second exploratory moderator analysis, we examined whether the four major CPT outcomes varied in their ability to distinguish between credible and noncredible performance. Overall, omissions, commissions, reaction time, and variability in reaction time all showed comparable levels of predictive utility. Notably, omission errors appeared somewhat more sensitive than reaction time, though the reason for this difference remains unclear and warrants further investigation in larger samples. Importantly, all four outcomes demonstrated meaningful sensitivity in distinguishing credible from noncredible performance.
Limitations
The current study is not without limitations. While robustness analyses indicated that the observed effect shows evidential value and no single study or effect size substantially influenced the overall effect size, it is likely that the results are influenced, at least in part, by small‑study effects. Findings from Egger’s test and the asymmetry of the funnel plot suggest that the originally reported effect size is likely overestimated, thereby warranting caution in the interpretation of its magnitude. Finally, the RoB-SPV evaluation indicated that most included studies presented some risk of bias, particularly regarding reporting transparency and the assessment of participant incentives. Such methodological inconsistencies may limit the comparability of studies.
Furthermore, research findings and current guidelines underscore the importance of utilizing multiple performance validity indicators (Rhoads et al., 2021; Soble, 2021; Sweet et al., 2021). Given that failure of a single performance validity indicator may occur even in a credible group (an indication of a false positive), conclusions regarding noncredible performance should not be based solely on one performance validity indicator (Victor et al., 2009), unless performance is significantly below chance level on a forced-choice measure (Sweet et al., 2021). In contrast, the likelihood of failing multiple performance validity indicators in a credible group, including individuals with cognitive impairment, is low (Critchfield et al., 2019), except in cases involving individuals with severe functional disabilities (Sweet et al., 2021). However, the criteria for classifying noncredibility differed across studies, specifically in terms of how many failed performance validity indicators were required for a classification of noncredibility. In more than half of the included studies, noncredibility was determined based on a single external performance validity test (PVT), typically a well-established PVT such as the Word Memory Test (WMT; Green, 2003) or the Test of Memory Malingering (TOMM; Tombaugh, 1996). The remaining studies employed more stringent or alternative criteria, such as failure on two or more PVTs, or a combination of one failed PVT and additional indicators (e.g., evidence of impaired cognitive functioning). The inclusion of different criteria may have resulted in misclassification of participants and (undesired) variability across studies, ultimately affecting the sensitivity estimates of the CPTs.
Finally, although we report group differences using Hedges’s g, it is important to note that this only shows the average difference between credible and noncredible groups. It does not tell us how well the CPT can correctly classify individual participants or how accurately it works at specific cutoff scores.
Future Research
Future research should go beyond group-level effects and focus on classification accuracy at the individual level. Specifically, studies are needed to examine how well CPTs can correctly identify noncredible performance using classification metrics, such as sensitivity, specificity, positive predictive value, and negative predictive value (Lange & Lippa, 2017). These metrics would provide a clearer picture of the practical and clinical utility of CPTs as embedded validity indicators, helping to determine not just whether group differences exist, but whether the tests can meaningfully separate individuals in real-world assessment settings.
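To make these classification metrics concrete, the following R sketch computes them from a hypothetical 2 × 2 table; the counts are invented for illustration and are not data from any included study.

```r
# Sensitivity, specificity, PPV, and NPV from a hypothetical 2 x 2 table
# (rows: CPT indicator flags noncredible yes/no; columns: reference standard)
tp <- 30; fp <- 10   # flagged noncredible: truly noncredible / actually credible
fn <- 15; tn <- 95   # not flagged:         truly noncredible / actually credible

sensitivity <- tp / (tp + fn)   # proportion of noncredible cases detected
specificity <- tn / (tn + fp)   # proportion of credible cases correctly passed
ppv <- tp / (tp + fp)           # probability that a flag reflects true noncredibility
npv <- tn / (tn + fn)           # probability that a pass reflects true credibility
round(c(sensitivity = sensitivity, specificity = specificity, ppv = ppv, npv = npv), 2)
```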
Furthermore, future research should examine the utility of CPTs as embedded validity indicators in child and adolescent populations. Recent years have seen a growing support for the inclusion of performance and symptom validity testing in pediatric assessments (Kirk et al., 2020). Several freestanding PVTs, such as the TOMM and the Medical Symptom Validity Test (MSVT; Green, 2004), originally developed for adults, have also demonstrated effectiveness in pediatric settings (Blaskewitz et al., 2008; Constantinou & McCaffrey, 2003; Rienstra et al., 2010). However, embedded validity indicators such as CPTs, particularly in populations with ADHD, require further validation to establish their utility and clinical applicability (Kirk et al., 2020).
Conclusion
In conclusion, despite some limitations, our findings (Hedges’s g = 0.73) support the value of CPTs as embedded validity indicators in adult ADHD assessments. Effect sizes were nearly twice as large in simulation studies compared to criterion designs, highlighting the importance of integrating both approaches for a comprehensive evaluation of CPT utility before issuing clinical recommendations. Most CPT variants and outcomes showed predictive value. Based on these findings, clinicians and researchers can make more informed decisions about which CPT measures and variants to use in practice and research. However, the limited number of studies prevents firm conclusions about differences between them.
These findings carry meaningful practical implications. Our quantitative synthesis revealed patterns that are not evident in individual studies. For instance, Quinn (2003), a small study reporting unusually large effects, has been cited far more often than larger studies with more moderate findings. While factors such as publication timing may partly explain this pattern, it suggests that the field may have given disproportionate attention to a single study, which may have contributed to an overestimation of CPTs’ effectiveness as embedded validity measures.
However, CPTs do hold considerable promise as embedded validity indicators, particularly given their ability to simultaneously assess cognitive functioning and performance validity, increasing efficiency while complementing other validity tests. Moreover, their typically low face validity (Rogers, 2018) reduces the likelihood of intentional or unintentional performance distortion, further strengthening their clinical value. Given their widespread use in neuropsychological practice (Pettersson et al., 2018), CPTs are well-positioned for expanded use as embedded validity measures without requiring major changes to standard assessment protocols.
Of note, while our findings reflect a medium-to-large effect size (Cohen, 1992), the classification proposed by Rogers (2018), developed specifically for validity assessment, places this effect in the small range. As such, a single CPT should not be used in isolation to detect noncredible performance at the individual level. This reflects a broader shift in validity assessment toward interpreting performance validity as a position on a continuous scale derived from multiple PVTs, rather than applying a binary pass/fail decision based on a single cutoff score from one measure (Boone, 2009; Erdodi, 2019; Erdodi et al., 2014). This is particularly relevant given the lack of consensus on the most appropriate measures and optimal cut scores, which can significantly influence sensitivity and specificity estimates (Bigler, 2012; Erdodi, 2019). Thus, in clinical practice, CPTs should be interpreted in conjunction with other embedded and freestanding validity indicators to ensure a more robust and nuanced assessment of response validity.
Footnotes
Support and Contributors
A.B.M.F., P.T., T.J.D., and G.F.G. designed the study. A.B.M.F. and P.T. conducted the literature search. P.T. performed the data analysis. A.P.G. checked the code and supervised the statistical analysis; G.F.G. also contributed to the statistical support. P.T. wrote the manuscript. A.B.M.F. and T.J.D. provided supervision, and A.B.M.F., T.J.D., D.deW., A.P.G., and G.F.G. reviewed and edited the manuscript. All authors approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: T.J.D. is in the steering committee of the European ADHD Network (EUNETHYDIS), which receives funding for educational activities by Medice and Takeda. A.B.M.F. has a contract with Schuhfried GmbH for the development and evaluation of neuropsychological instruments. The other authors report no potential conflicts of interest.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Availability of Data, Code, and Other Materials
The data and R code are made publicly available on OSF (DOI 10.17605/OSF.IO/PBDYV). Supplements can be found at Sage Assessment.
Supplemental Material
Supplemental material for this article is available online.
References
