Abstract
Objective:
Auditory hallucinations (hearing voices) have been associated with a range of altered cognitive functions, pertaining to signal detection, source-monitoring, memory, inhibition and language processes. Yet, empirical results are inconsistent. Despite this, several theoretical models of auditory hallucinations persist, alongside increasing emphasis on the utility of a multidimensional framework. Thus, clarification of current evidence across the broad scope of proposed mechanisms is warranted.
Method:
A systematic search of the Web of Science, PubMed and Scopus databases was conducted. Records were screened to confirm the use of an objective behavioural cognitive task, and valid measurement of hallucinations specific to the auditory modality.
Results:
Auditory hallucinations were primarily associated with difficulties in perceptual decision-making (i.e. reduced sensitivity/accuracy for signal-noise discrimination; liberal responding to ambiguity), source-monitoring (i.e. self–other and temporal context confusion), working memory and language function (i.e. reduced verbal fluency). Mixed or limited support was observed for perceptual feature discrimination, imagery vividness/illusion susceptibility, source-monitoring for stimulus form and spatial context, recognition and recall memory, executive functions (e.g. attention, inhibition), emotion processing and language comprehension/hemispheric organisation.
Conclusions:
Findings were considered within predictive coding and self-monitoring frameworks. Of concern was the portion of studies which – despite offering auditory-hallucination-specific aims and inferences – employed modality-general measures, and/or diagnostic-based contrasts with psychologically healthy individuals. This review highlights disparities within the literature between theoretical conceptualisations of auditory hallucinations and the body of rigorous empirical evidence supporting such inferences. Future cognitive investigations, beyond the schizophrenia-spectrum, which explicitly define and measure the timeframe and sensory modality of hallucinations, are recommended.
Introduction
Auditory hallucinations (AHs), also known as hearing voices, are recognised as the perception of sound in the absence of corresponding external stimuli, often characterised by a sense of reality and lack of control (David, 2004). Theoretical models postulate that AHs reflect the externalisation of mental events, such as inner speech (Jones and Fernyhough, 2007) or intrusive/fragmented memories (Waters et al., 2006), and/or the exacerbation of environmental cues via increased hypervigilance or attributed salience (Dodgson and Gordon, 2009; Kapur, 2003). Similarly, broad frameworks of AHs suggest common difficulties in the recognition of self-generated mental events (Waters et al., 2012b), hemispheric organisation for language processing (Hugdahl et al., 2012) and the weighting of perceptual input against prior expectation (Corlett et al., 2019; Sterzer et al., 2018). A range of cognitive mechanisms have been implicated, including challenges in auditory signal detection and response biases (Bentall and Slade, 1985; McLachlan et al., 2013), source-monitoring (Bentall, 1990; Woodward et al., 2007), memory function (Brébion et al., 2020), attention modulation and inhibition (Badcock et al., 2005; Waters et al., 2003), language processes (Docherty, 2012; Green et al., 1994) and emotion detection (Alba-Ferrara et al., 2013; Rossell and Boundy, 2005; Shea et al., 2007). However, a notable portion of non-significant findings have also been reported (e.g. Brébion et al., 2008; Johns et al., 2001; McKay et al., 2000; McLachlan et al., 2013; Pinheiro et al., 2016; Rocca et al., 2006; Rossell et al., 2001; Schnakenberg Martin et al., 2018; Toh et al., 2020).
Previous meta-analyses, focusing on specific domains of cognition, have demonstrated moderate-to-large associations between hallucinations and a self-monitoring bias (Waters et al., 2012b), increased externalising source-monitoring errors (Brookwell et al., 2013; Damiani et al., 2022) and reduced left-hemispheric language lateralisation (Ocklenburg et al., 2013). Source-monitoring difficulties were observed in the context of auditory stimuli and imagined mental events (Damiani et al., 2022), and identified in spite of intact broader memory function (Waters et al., 2012b). While such reviews have provided valuable integration of findings, some methodological inconsistencies are evident. For example, the sensory modality (e.g. auditory and/or additional senses) and specified timeframe of hallucinations were not always identified. In addition, group contrasts based on diagnostic status (e.g. schizophrenia vs no history of mental illness) commonly contributed to findings. Thus, it is difficult to discern the extent to which findings are relevant to AHs specifically, as opposed to hallucinations of other forms, psychosis syndromes or broader diagnostic groups.
Numerous narrative and systematic reviews have summarised a breadth of cognitive functions in the context of proposed theoretical models (e.g. Seal et al., 2004; Tracy and Shergill, 2013; Upthegrove et al., 2016; Waters et al., 2012a). These papers emphasise that competing theoretical accounts are not mutually exclusive, and that no single mechanism can sufficiently account for AH emergence. Thus, a multidimensional framework has been advanced, along with recommendations for transdiagnostic investigation at the symptom level. While inconsistencies regarding the definition and measurement of AHs have been acknowledged, limited consideration has been given to how this may affect the substantiation of corresponding theoretical models. Likewise, poor measurement practices relating to cognitive task development make interpretation of disparate findings difficult (Smailes et al., 2022). Consequently, a lack of clarity and consensus remains when interpreting and integrating the abundance of cognitive studies at a conceptual level. Thus, a systematic review of the relationships between AHs and the range of proposed cognitive mechanisms is warranted.
Study aims
The current systematic review aimed to synthesise the literature examining cognition in relation to the presence and severity of AHs accompanied by a psychiatric diagnosis. We omitted hallucination proneness within psychologically healthy populations, to focus on cognitive profiles likely associated with a need for care. Cognition was considered across the broad range of mechanisms, with the aim of comparing the relative proportion and nature of evidence representing each. To address existing gaps, the sensory modality and explicit timeframe of AHs were carefully considered. Inferences regarding the alignment of empirical findings to theoretical frameworks were offered, where possible.
Method
Protocol pre-registration
A protocol for the systematic review was developed in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols (PRISMA-P; Moher et al., 2015), and registered with the International Prospective Register of Systematic Reviews (PROSPERO; CRD42020148907).
Search strategy
A systematic search of the Web of Science, Scopus and PubMed databases was employed to identify peer-reviewed empirical records written in the English language, published from 1 January 1980 to 2 August 2022. Details of the search strategy, including full syntax, are reported in Supplemental Appendix A.
Study selection
The search yielded 22,391 records, after duplicate removal. A single reviewer (A.B.) screened the title, abstract and keywords against the eligibility criteria. Records were retained if they met the following criteria: (1) part of an original group design empirical investigation, (2) an adult participant sample (i.e. 18–65 years), (3) AHs accompanied by a psychiatric diagnosis (i.e. individuals without a psychiatric diagnosis were omitted), (4) participant sample omitting organic illness or injury (e.g. neurological, substance-induced or other major medical conditions), (5) cognition investigated with reference to AHs, by means of an objective behavioural task and (6) the presence/severity of AHs was identified via appropriate clinical judgement and/or a valid measure specific to, or adapted for, the auditory modality (current and lifetime timeframes eligible). Independent reviewers (A.B., S.L.R.) screened the full-text articles retained at Stage 2. Discrepancies in eligibility were resolved during regular meetings, with mediation from a third reviewer (W.L.T.).
Following screening, 101 records met the inclusion criteria. Several records were omitted from data synthesis because (1) no AH-specific data were available (n = 9) (i.e. group contrasts between clinical AHs and healthy control groups only), (2) analyses pertained to phenomenological variations of AHs rather than presence or severity (n = 1) or (3) AHs were examined in the context of borderline personality disorder (n = 1). Although a transdiagnostic approach was initially intended, the last instance was the only study to fall outside the schizophrenia-spectrum (i.e. schizophrenia or mood-related diagnosis with psychotic features), and was therefore considered clinically distinct and unsuitable for synthesis. Subsequently, 90 studies remained for data extraction (see Figure 1).

Figure 1. PRISMA record screening.
Data extraction
The following data were extracted: record identifying information, participant groups (psychiatric diagnosis and/or AH presence/absence), AH frequency/severity, cognitive task(s) and associated outcome variables, analyses and data relevant to relationships between AHs and cognition (e.g. direction, effect sizes and significance coefficients, where available). Group sample characteristics (e.g. age, sex) were extracted where possible. Given the extent of relevant data, no attempts were made to contact authors to obtain missing information. Data were summarised into three streams: group contrasts, correlations and other analyses (e.g. regression, mediation and discriminant function). Where possible, available effect sizes were converted to Cohen’s d (Lenhard and Lenhard, 2016), with d = 0.2 interpreted as small, d = 0.5 as medium and d = 0.8 as large in magnitude (Cohen, 1988). That said, these benchmarks are arbitrary, and small effect sizes can be equally important (Lakens, 2013). Furthermore, given typically limited sample sizes, effect confidence intervals are likely to be wide and should be interpreted with caution.
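As an illustration of the conversion step described above, the following is a minimal sketch of two standard routes to Cohen's d (from group means and standard deviations, and from a correlation coefficient), together with the Cohen (1988) benchmarks. The function names are hypothetical, and the formulas are the conventional textbook ones; they are not claimed to reproduce the exact procedure of the conversion tool cited in the text.

```python
import math


def cohens_d_from_groups(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def cohens_d_from_r(r):
    """Convert a correlation coefficient r to d (conventional conversion)."""
    return 2 * r / math.sqrt(1 - r ** 2)


def magnitude_label(d):
    """Label |d| against the Cohen (1988) benchmarks of 0.2, 0.5 and 0.8."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"
```

Because the benchmarks are arbitrary, the label is best read alongside the raw value of d and its confidence interval rather than in place of them.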
Quality assessment
The Newcastle-Ottawa Scale (NOS; Wells et al., 2000) was used as a template for methodological quality assessment. Key factors were examined, and studies were assigned an overall quality score from 1 to 5. Categorical quality rankings were developed, based on the median and quartile split of total scores. These consisted of low (2.50–3.49), moderate (3.50–4.06), high (4.07–4.39) and very high (4.40–5.00). See Supplemental Appendix B for checklist items and anchors, alongside associated scores and categorical rankings.
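The categorical rankings can be expressed as a simple lookup. The sketch below uses the cut-offs reported above; the function name is hypothetical, and the choice to reject scores below 2.50 is an assumption, since no category is reported for that range.

```python
def quality_category(score):
    """Map an overall quality score to the categorical ranking reported in the text."""
    # Assumption: scores outside 2.50-5.00 have no reported category, so they are rejected.
    if not 2.50 <= score <= 5.00:
        raise ValueError("score falls outside the reported category ranges")
    if score >= 4.40:
        return "very high"
    if score >= 4.07:
        return "high"
    if score >= 3.50:
        return "moderate"
    return "low"
```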
Data analysis
To aid interpretation, studies were tabulated by broad cognitive domain: perceptual decision-making and imagery (n = 30), source-monitoring and context memory (n = 28), general memory (n = 29), language processing, production and hemispheric organisation (n = 14), executive function (n = 23), and emotion processing and social cognition (n = 9) (details in Supplemental Appendix C). Subsequently, refined cognitive sub-domains were formed by grouping results associated with similar outcome variables. These groupings accounted for all data in the results tables, except for the executive function domain, which had too much variation to meaningfully group. Hence, only key sub-domains were considered: attention, inhibition and cognitive flexibility/set-shifting.
As outlined in the pre-registered protocol, a minimum of 10 studies per cognitive sub-domain was required to proceed with meta-analysis. After grouping studies by homogeneous task design, variable definition and analysis form, only one cognitive sub-domain met this criterion (i.e. 14 group contrasts for internal–external source misattributions). Given that the primary aim of the systematic review was to compare results across the range of proposed cognitive mechanisms, and that meta-analysis of source-monitoring data has been well documented (see Brookwell et al., 2013; Damiani et al., 2022; Waters et al., 2012b), performing a quantitative synthesis for this single sub-domain was considered unsuitable. It is possible to conduct meta-analysis of broader, collapsed groups of data, with follow-up statistics to accommodate study deviations (e.g. meta-regression, subgroup and/or sensitivity analysis). However, caution has been urged against applying this approach to heterogeneous datasets, due to challenges in defining these points of variation (e.g. Imrey, 2020). Within this literature, we observed notable heterogeneity in the definition of constructs, the use of novel experimental paradigms and the interpretation of numerous outcome variables. Thus, we chose to proceed with a narrative synthesis to allow for consideration of study nuances, as documented comprehensively within the results tables. To facilitate the integration of a broad array of data, a counting method was adopted, to identify the number of studies with significant, mixed and non-significant findings pertaining to each cognitive sub-domain, as relevant to the three analysis streams (see Supplemental Appendix C for details).
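The counting method and the pre-registered threshold described above can be sketched as follows. This is an illustrative outline only: the record layout (sub-domain, analysis stream, outcome) and all names are assumptions, not the authors' actual extraction format.

```python
from collections import Counter, defaultdict

# Threshold stated in the pre-registered protocol.
MIN_STUDIES_FOR_META_ANALYSIS = 10


def tally_findings(records):
    """Count outcomes per (sub_domain, stream) pair.

    records: iterable of (sub_domain, stream, outcome) tuples, where outcome is
    'significant', 'mixed' or 'non-significant' (hypothetical layout).
    """
    tallies = defaultdict(Counter)
    for sub_domain, stream, outcome in records:
        tallies[(sub_domain, stream)][outcome] += 1
    return tallies


def eligible_for_meta_analysis(n_homogeneous_studies):
    """True when a sub-domain meets the minimum study count for meta-analysis."""
    return n_homogeneous_studies >= MIN_STUDIES_FOR_META_ANALYSIS
```

Under this rule, the 14 group contrasts for internal–external source misattributions would qualify, whereas any sub-domain with fewer than 10 homogeneous studies would not.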
Results
Sample characteristics
Across 81 unique samples, a total of 4933 individuals with a schizophrenia-spectrum diagnosis were represented in AH-specific groups/analyses, with a mean age of 34 years and gender ratio of 13 males:7 females.
AH status
Many studies explored cognition in the context of current AHs (n = 50), and to a lesser extent, lifetime (n = 18) and past AHs (n = 5). A notable portion did not explicitly specify the timeframe (n = 23). Only a small number of studies explored several AH status variations (n = 6).
Cognitive domains
The full set of data extracted from the 90 studies, arranged by cognitive domain, can be seen in Tables 1–6, with the proportions of significant, mixed and non-significant findings displayed in Figure 2. Of the 21 identified sub-domains, in 9 cases, the number of investigations with any significant associations to AHs (i.e. significant or mixed results overall) outweighed the number of non-significant findings: spatial context attribution (3/3), language comprehension (5/7), signal detection response bias (7/10), temporal context attribution (4/6), working memory (9/13), verbal fluency/linguistic cohesion (5/8), signal-noise discrimination sensitivity (9/16), internal–external source attribution (11/21) and hemispheric language organisation (4/7). Below is a summary of the significant findings within each domain.
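To make the comparison concrete, the fractions reported above can be converted to proportions. The sketch assumes each fraction reads as investigations with any significant association over the total number of investigations in that sub-domain; the variable and function names are hypothetical.

```python
# Counts as reported in the text: (any significant or mixed, total investigations).
reported = {
    "spatial context attribution": (3, 3),
    "language comprehension": (5, 7),
    "signal detection response bias": (7, 10),
    "temporal context attribution": (4, 6),
    "working memory": (9, 13),
    "verbal fluency/linguistic cohesion": (5, 8),
    "signal-noise discrimination sensitivity": (9, 16),
    "internal-external source attribution": (11, 21),
    "hemispheric language organisation": (4, 7),
}


def any_significant_outweighs(sig_or_mixed, total):
    """True when significant/mixed investigations outnumber non-significant ones."""
    return sig_or_mixed > total - sig_or_mixed


proportions = {name: k / n for name, (k, n) in reported.items()}

for name, (k, n) in reported.items():
    print(f"{name}: {k}/{n} = {proportions[name]:.2f}")
```

Several of these margins are narrow (e.g. 11 of 21 and 9 of 16), so the tallies indicate where evidence leans rather than its strength.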
Table 1. Perceptual decision-making and imagery.
Diagnoses: BPAD: bipolar-affective disorder, SCZ: schizophrenia, SAD: schizo-affective disorder, SSD: schizophrenia-spectrum diagnosis. Where diagnostic information was not provided by authors, groups/variables were labelled according to AH status only.
Auditory hallucination status: CAH: current auditory hallucinations, LAH: lifetime auditory hallucinations (current status not measured), AH: auditory hallucinations present (timeframe not specified), NCAH: no current auditory hallucinations (lifetime history not measured), NLAH: no lifetime history of auditory hallucinations, PAH: past auditory hallucinations, NAH: no auditory hallucinations (timeframe not specified). Groups defined by general ‘hallucination’ status (i.e. sensory modality not specified) were not extracted. AH status was labelled according to the timeframes explicitly specified by authors in measurement descriptions. In the case of ambiguity, status inferences were not extrapolated from typical timeframes of administered measures, given that the implementation of assessment in its true form could not be confirmed.
Notes: Higher AH and cognition scores indicate increased severity and superior performance, respectively, unless otherwise specified. Task conditions are listed in parentheses following relevant variables. Significant findings (p < 0.05) are listed in bold font. NS: non-significant findings.
Additional group contrasts omitted due to no AH-specific data available (i.e. clinical AH vs healthy control only).
AH variable where higher scores indicate lower severity.
Lower scores indicate a larger effect of auditory imagery over perceptual input, while higher scores indicate less weighting of imagery.
Lower scores indicate a liberal pattern of responding, while higher scores indicate a more conservative pattern of responding.
Effect size values, means and standard deviations not available. Unable to calculate Cohen’s d.
Where positive scores indicate a preference for auditory imagery, while negative scores indicate a preference for visual imagery.
EA: predisposition to external attention; IA: predisposition to internal attention.
Effect size values not available. Cohen’s d calculated from means and standard deviations.
Significance value/level not reported, described as ‘significant’ by authors.
Effect size unable to be converted to Cohen’s d.
Additional results involving phenomenological AH variables not extracted.
Cognitive variable where higher scores indicate poorer rhythm reproduction ability.
Table 2. Source-monitoring and context memory.
Diagnoses: SCZ: schizophrenia, SAD: schizo-affective disorder, SSD: schizophrenia-spectrum diagnosis. Where diagnostic information was not provided by authors, groups/variables were labelled according to AH status only.
Auditory hallucination status: CAH: current auditory hallucinations, LAH: lifetime auditory hallucinations (current status not measured), AH: auditory hallucinations present (timeframe not specified), NCAH: no current auditory hallucinations (lifetime history not measured), NLAH: no lifetime history of auditory hallucinations, PAH: past auditory hallucinations, NAH: no auditory hallucinations (timeframe not specified). Groups defined by general ‘hallucination’ status (i.e. sensory modality not specified) were not extracted. AH status was labelled according to the timeframes explicitly specified by authors in measurement descriptions. In the case of ambiguity, status inferences were not extrapolated from typical timeframes of administered measures, given that the implementation of assessment in its true form could not be confirmed.
Note: Task conditions are listed in parentheses following relevant variables. Higher AH and cognition scores indicate increased severity and superior performance, respectively, unless otherwise specified. Significant findings (p < 0.05) are listed in bold font. NS: non-significant findings.
Additional group contrasts omitted due to no AH-specific data available (i.e. clinical AH vs healthy control only).
AH variable where higher scores indicate lower severity.
Effect size values, means and standard deviations not available. Unable to calculate Cohen’s d.
Effect size unable to be converted to Cohen’s d.
Additional results involving phenomenological AH variables not extracted.
SCZ-NLAH did not show above-chance deficits of 1–2 standard deviation(s) from healthy control mean, thus scores were not entered into group contrasts.
Table 3. General memory.
Diagnoses: SCZ: schizophrenia, SAD: schizo-affective disorder, SSD: schizophrenia-spectrum diagnosis. Where diagnostic information was not provided by authors, groups/variables were labelled according to AH status only.
Auditory hallucination status: CAH: current auditory hallucinations, LAH: lifetime auditory hallucinations (current status not measured), AH: auditory hallucinations present (timeframe not specified), NCAH: no current auditory hallucinations (lifetime history not measured), NLAH: no lifetime history of auditory hallucinations, PAH: past auditory hallucinations, NAH: no auditory hallucinations (timeframe not specified). Groups defined by general ‘hallucination’ status (i.e. sensory modality not specified) were not extracted. AH status was labelled according to the timeframes explicitly specified by authors in measurement descriptions. In the case of ambiguity, status inferences were not extrapolated from typical timeframes of administered measures, given that the implementation of assessment in its true form could not be confirmed.
Task abbreviations: BACS: Brief Assessment of Cognition in Schizophrenia, LNNB: Luria-Nebraska Neuropsychological Battery, MCCB: MATRICS Consensus Cognitive Battery, PALPA: Psycholinguistic Assessment of Language Processing, WAIS-III: Wechsler Adult Intelligence Scales, third edition, WMS-R: Wechsler Memory Scales, Revised.
Note: Task conditions are listed in parentheses following relevant variables. Higher AH and cognition scores indicate increased severity and superior performance, respectively, unless otherwise specified. Significant findings (p < 0.05) are listed in bold font. NS: non-significant findings.
Lower scores indicate a conservative pattern of responding, while higher scores indicate a more liberal pattern of responding.
Additional group contrasts omitted due to no AH-specific data available (i.e. clinical AH vs healthy control only).
Effect size unable to be converted to Cohen’s d.
Effect size values, means and standard deviations not available. Unable to calculate Cohen’s d.
Additional results involving phenomenological AH variables not extracted.
Table 4. Language processing, production and hemispheric organisation.
Diagnoses: SCZ: schizophrenia, SAD: schizo-affective disorder, SSD: schizophrenia-spectrum diagnosis. Where diagnostic information was not provided by authors, groups/variables were labelled according to AH status only.
Auditory hallucination status: CAH: current auditory hallucinations, LAH: lifetime auditory hallucinations (current status not measured), AH: auditory hallucinations present (timeframe not specified), NCAH: no current auditory hallucinations (lifetime history not measured), NLAH: no lifetime history of auditory hallucinations, PAH: past auditory hallucinations, NAH: no auditory hallucinations (timeframe not specified). There was one study for which these definitions did not apply, with groups described as ‘frequent’ (FAH) and ‘non-frequent’ (NFAH) instead (Løberg et al., 2015). Groups defined by general ‘hallucination’ status (i.e. sensory modality not specified) were not extracted. AH status was labelled according to the timeframes explicitly specified by authors in measurement descriptions. In the case of ambiguity, status inferences were not extrapolated from typical timeframes of administered measures, given that the implementation of assessment in its true form could not be confirmed.
Task abbreviations: LNNB: Luria-Nebraska Neuropsychological Battery, PALPA: Psycholinguistic Assessment of Language Processing.
Note: Task conditions are listed in parentheses following relevant variables. Higher AH and cognition scores indicate increased severity and superior performance, respectively, unless otherwise specified. Significant findings (p < 0.05) are listed in bold font. NS: non-significant findings.
Additional group contrasts omitted due to no AH-specific data available (i.e. clinical AH vs healthy control only).
Effect size values not available. Cohen’s d calculated from means and standard deviations.
Laterality index: positive scores indicate a right ear advantage (REA), while negative scores indicate a left ear advantage (LEA).
Participants examined at baseline (T1) and 3 months post admission (T2).
Cognitive variable where higher scores indicate poorer expressive language performance.
Table 5. Executive function.
Diagnoses: SCZ: schizophrenia, SAD: schizo-affective disorder, SSD: schizophrenia-spectrum diagnosis. Where diagnostic information was not provided by authors, groups/variables were labelled according to AH status only.
Auditory hallucination status: CAH: current auditory hallucinations, LAH: lifetime auditory hallucinations (current status not measured), AH: auditory hallucinations present (timeframe not specified), NCAH: no current auditory hallucinations (lifetime history not measured), NLAH: no lifetime history of auditory hallucinations, PAH: past auditory hallucinations, NAH: no auditory hallucinations (timeframe not specified). There was one study for which these definitions did not apply, with groups described as ‘frequent’ (FAH) and ‘non-frequent’ (NFAH) instead (Løberg et al., 2015). Groups defined by general ‘hallucination’ status (i.e. sensory modality not specified) were not extracted. AH status was labelled according to the timeframes explicitly specified by authors in measurement descriptions. In the case of ambiguity, status inferences were not extrapolated from typical timeframes of administered measures, given that the implementation of assessment in its true form could not be confirmed.
Task abbreviations: FAB: Frontal Assessment Battery, LNNB: Luria-Nebraska Neuropsychological Battery, MCCB: MATRICS Consensus Cognitive Battery, WAIS-III & WAIS-R: Wechsler Adult Intelligence Scales, third edition & revised.
Note: Task conditions are listed in parentheses following relevant variables. Higher AH and cognition scores indicate increased severity and superior performance, respectively, unless otherwise specified. Significant findings (p < 0.05) are listed in bold font. NS: non-significant findings.
Additional group contrasts omitted due to no AH-specific data available (i.e. clinical AH vs healthy control only).
Effect size values, means and standard deviations not available. Unable to calculate Cohen’s d.
Significance value/level not reported (identified as ‘significant’ by author).
Additional results involving phenomenological AH variables not extracted.
Cognitive variable where higher scores indicate poorer attentional performance.
SCZ-NLAH did not show above-chance deficits of 1–2 standard deviation(s) from healthy control mean, thus these scores were not entered into group contrasts.
Primary data from the Memory for Context task were included in the source monitoring table. Task included here as relevant to data point examining the combined prevalence of inhibition and context memory deficits in the context of AHs (Waters et al., 2006).
Table 6. Emotion processing and social cognition.
Diagnoses: BPAD: bipolar-affective disorder, SCZ: schizophrenia, SAD: schizo-affective disorder, SSD: schizophrenia-spectrum diagnosis. Where diagnostic information was not provided by authors, groups/variables were labelled according to AH status only.
Auditory hallucination status: CAH: current auditory hallucinations, LAH: lifetime auditory hallucinations (current status not measured), AH: auditory hallucinations present (timeframe not specified), NCAH: no current auditory hallucinations (lifetime history not measured), NLAH: no lifetime history of auditory hallucinations, PAH: past auditory hallucinations, NAH: no auditory hallucinations (timeframe not specified). Groups defined by general ‘hallucination’ status (i.e. sensory modality not specified) were not extracted. AH status was labelled according to the timeframes explicitly specified by authors in measurement descriptions. In the case of ambiguity, status inferences were not extrapolated from typical timeframes of administered measures, given that the implementation of assessment in its true form could not be confirmed.
Task abbreviations: CATS: Comprehensive Affective Testing System, DANVA: Diagnostic Analysis of Nonverbal Accuracy 2, MCCB: MATRICS Consensus Cognitive Battery.
Note: Task conditions are listed in parentheses following relevant variables. Higher AH and cognition scores indicate increased severity and superior performance, respectively, unless otherwise specified. Significant findings (p < 0.05) are listed in bold font. NS: non-significant findings.
Effect size values not available. Cohen’s d calculated from means and standard deviations.
Effect size values, means and standard deviations not available. Unable to calculate Cohen’s d.
Additional group contrasts omitted due to no AH-specific data available (i.e. clinical AH vs healthy control only).

Figure 2. The proportion of non-significant, mixed and significant results per cognitive sub-domain, identified via group contrasts (AH presence), correlations (AH severity) and regression/other analyses.
Perceptual decision-making and imagery
This domain encompassed data relevant to the detection of sensory input, primarily within the auditory modality, but also visual, and to a lesser extent tactile stimuli. Key variables reflected signal detection and/or noise discrimination sensitivity/accuracy, discrimination of perceptual qualities and response biases in the expectation of input. The intensity, integration and weighting of imagery in relation to these perceptual processes are also reflected here (see Table 1). Significant results suggested an association between AH presence/severity and reduced sensitivity/accuracy for signal-noise discrimination (d = 0.47–1.50), liberal responding in the face of stimulus ambiguity (d = 0.42–2.42), reduced stimulus feature discrimination capacity (d = 0.04–1.00) and increased vividness/weighting of imagery or illusion susceptibility (d = 1.96–2.84). Conversely, one study noted superior stimulus feature discrimination in participants with current, relative to past AHs, citing exaggerated environmental scanning as a possible contributor (Schneider and Wilson, 1983). In addition, AH severity was observed as a predictor of spatial detection sensitivity (d = 1.5) and signal expectation bias (d = 0.33), while signal detection also predicted AH presence (χ² = 12.1).
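For readers less familiar with the sensitivity and response bias variables summarised above, the following minimal sketch computes two common signal detection indices, d′ (sensitivity) and criterion c (bias), from response counts. It assumes an equal-variance Gaussian model with a log-linear correction; the function name is hypothetical, and individual studies in the review may have used other indices.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under an equal-variance Gaussian model."""
    # Log-linear correction: adding 0.5 to each count avoids infinite
    # z-scores when a hit or false-alarm rate would be exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)              # higher = better discrimination
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))   # negative = liberal responding
    return d_prime, criterion
```

On this convention, a participant who frequently reports a signal on noise-only trials obtains a negative criterion, in line with the liberal pattern of responding described in the text.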
Source-monitoring and context memory
This domain reflected a complex set of processes involved in the monitoring of, and memory for, contextual aspects of presented stimuli and mental events. These included temporal order, visuospatial location, external form (e.g. picture vs word stimulus), internal form (e.g. thought vs said), and the differentiation of internal–external source (e.g. imagined/read vs heard; self vs other speech/action). Primary variables pertained to the number of source misattributions, as well as error rates and accuracy scores (see Table 2). In most instances, the direction of internal–external source-monitoring was specified (e.g. externalising vs internalising misattributions); however, in some cases this was not discernible (n = 2). Significant results suggested an association between AH presence/severity and greater challenges in distinguishing: temporal presentation order (d = 0.64–0.85), spatial presentation location (d = 1.04–1.12), internal–external stimulus source (d = 0.57–2.82), internal mental event form (d = 0.49–1.12) and external stimulus form (d = 1.01–1.12). In addition, AH severity was observed as a predictor of misattribution for temporal context (d = 1.22), spatial context (β = 0.53) and internal–external source (d = 1.25).
General memory
The general memory domain examined the retention of auditory and visual information, held across the short- and long-term, assessed via recall and recognition retrieval methods. Working memory paradigms were also included. In many instances, source-monitoring tasks involved a recognition component (e.g. identification of ‘new’ stimuli), and these results are reflected here. Key variables included the accuracy and error rate of recognised information, capacity of recall or working memory and composite memory scores reflecting overall performance on subtests (see Table 3). Significant results suggested an association between AH presence/severity and greater difficulties with: recall memory span (d = 0.58–0.90), recognition capacity or bias (d = 1.01–1.45), and working memory (d = 0.54–1.07). In addition, AH severity was observed as a predictor of picture (d = 0.77–1.01) and word list (d = 0.90–0.98) recognition abilities, while working memory also predicted AH presence/severity (β = 0.61).
Language processing, production and hemispheric organisation
This domain related to the processing and production of spoken language. The former is commonly examined via dichotic listening paradigms and considered within the context of a hemispheric organisation framework. A laterality index was often provided, reflecting the relative accuracy of detected input within the left and right ears. Language processing was also examined via broader measures of comprehension (e.g. higher order and integrative language processes). With respect to the production of spoken language, key variables included verbal fluency and speech samples analysed for markers of coherence (see Table 4). Significant results suggested an association between AH presence/severity and a reduction in: left-hemispheric lateralisation for speech processing (d = 0.39–0.91), verbal fluency and coherence (d = 0.77–1.01), and language comprehension capacity (d = 0.48–1.15). Conversely, one study found increased verbal coherence in those with lifetime AHs, relative to those without, citing thought disorder within clinical control groups as a possible contributor (Thompson and Copolov, 1998). In addition, metaphor processing ability (d = 0.87) and semantic fluency (d = 0.49) were observed as predictors of AH severity and presence, respectively.
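As an illustration of the laterality index referred to above, the following minimal sketch uses one commonly reported formulation, LI = 100 × (R − L) / (R + L), where R and L are the numbers of correctly reported right-ear and left-ear items. This formula, the function name and the example counts are assumptions for illustration only; the reviewed studies may have computed their indices differently.

```python
def laterality_index(right_correct: int, left_correct: int) -> float:
    """Hypothetical helper: percentage laterality index for a dichotic listening task.

    Positive values indicate a right-ear advantage (conventionally read as
    left-hemispheric lateralisation for speech); negative values indicate a
    left-ear advantage; zero indicates no ear advantage.
    """
    total = right_correct + left_correct
    if total == 0:
        raise ValueError("At least one correct report is needed to compute the index.")
    return 100.0 * (right_correct - left_correct) / total


# Illustrative (made-up) counts: 20 right-ear and 10 left-ear correct reports.
example_li = laterality_index(20, 10)
print(round(example_li, 1))
```

Under this formulation, a reduced right-ear advantage would appear as an index closer to zero.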
Executive function
More ‘general’ processes were grouped to form an executive function domain. This included processes such as inhibition, set-shifting, attention, processing speed and reasoning. These constructs were frequently measured as part of an overarching battery of neurocognitive tests, with each process represented by a particular subtest or a composite subscale score (see Table 5). Significant results suggested an association between AH presence/severity and reduced capacities across: inhibition (d = 0.71–1.45) and attention (d = 1.06–1.42). Additional isolated findings indicated associations between AH presence/severity and reduced: processing speed (d = 0.71–0.98), vocabulary (d = 0.86) and visual learning (d = 1.24). AH presence was also observed as a predictor of visual attention (d = 0.49) and set-shifting (d = 0.48), while AH severity was seen to predict inhibition (d = 0.63). Similarly, set-shifting (d = 0.12) and visual learning (d = 0.26) were observed as predictors of AH presence.
Emotion processing and social cognition
The emotion processing and social cognition domain related to the recognition of affect across multiple forms (e.g. facial expression, prosodic tone), and inference of others’ state of mind or emotional perspective. Key variables included accuracy and sensitivity in the detection and differentiation of presented emotions, as well as composite scores for broad social processing (see Table 6). Significant results suggested an association between AH presence/severity and greater difficulties in the accurate detection of emotional expression (d = 0.25–0.77), with variation across specific emotions. In addition, ‘mentalising ability’ was observed as a predictor of AH presence (d = 0.93).
Quality assessment
The proportions of quality ratings per cognitive domain are shown in Table 7. For studies ranked the highest in quality, average checklist scores yielded similarly high rankings (4+) for all but the AH definition item (~3.5). For studies ranked the lowest in quality, average checklist scores were lowest for diagnostic status measurement, AH measurement and selection of controls (1–3). Regarding AH-specific language, 81% of studies demonstrated consistency across aims, methods and inferences, and these were relatively evenly spread across the domains. Of the studies which conducted group contrasts (n = 56) and correlations (n = 35), 36% and 69% of these reported at least 20 participants per group/analysis, respectively. A breakdown of studies reporting 20+ cases for group contrasts, per cognitive domain, is as follows: emotion processing and social cognition (67%), executive function (55%), language function (50%), perceptual decision-making and imagery (43%), general memory (32%) and source-monitoring/context memory (11%).
Proportion of studies per cognitive domain within each quality range.
Discussion
Summary of findings
The current review offered a novel perspective by systematically summarising task-based measures of cognition across a broad range of domains, specifically relating to clinical AHs. Cognitive sub-domains with the highest frequency of reported significant findings suggested that AHs were associated with reduced sensitivity/accuracy for signal-noise discrimination and biases towards liberal responding for ambiguous stimuli, source-monitoring confusion for temporal order and self–other origin, reduced working memory capacity and reduced language coherency/fluency. Signal detection difficulties were noted for both auditory and visual stimuli and yielded medium-to-large effects. Challenges in distinguishing and remembering the temporal presentation order and self–other source of stimuli, along with reduced working memory capacity, yielded medium-to-large effects and were generally consistent with previous meta-analyses (Brookwell et al., 2013; Damiani et al., 2022; Waters et al., 2012b). In instances where the direction of self–other source attribution was distinguished, AHs were associated with a tendency to misattribute self-generated stimuli (e.g. thoughts, speech, actions) to an external source, but not vice versa. Finally, medium-to-large effects were observed for reduced speech coherence, and to a lesser extent, verbal fluency. This was generally in line with prior integrative reviews, except for some sub-domains outlined below (e.g. Seal et al., 2004; Upthegrove et al., 2016; Waters et al., 2012a).
Primarily mixed findings, due to inconsistent results across outcome variables, were found for the monitoring of spatial context, language comprehension and hemispheric organisation. Notably, language comprehension demonstrated limited cohesion as a cognitive construct (e.g. metaphor processing, lexical competition, serial linguistic expectation; Hoffman et al., 1999; Siddi et al., 2016; Titone and Levy, 2004), while spatial context monitoring comprised only two studies. For hemispheric organisation of language function, variables extracted from dichotic listening tasks were interpreted inconsistently, thus undermining the strength of inferences made in some cases. While our findings offer less robust support for language lateralisation than previous reviews (Ocklenburg et al., 2013), our approach to screening AH measurement led to a narrowed selection of eligible data, likely accounting for discrepancies.
The sub-domains with predominantly non-significant associations to AHs were stimulus sensory feature discrimination, imagery vividness or illusion susceptibility, source-monitoring for internal events and external stimuli form, recognition and recall memory abilities, executive function, and processing of emotion and social cues. These results are consistent with previous accounts questioning the relevance of mental imagery aberrations to AHs (Seal et al., 2004). While the past literature has suggested a possible role for sensory feature discrimination (McKay et al., 2000; McLachlan et al., 2013), and cognitive inhibition (Badcock and Hugdahl, 2014), these ideas were not well supported by the current empirical results. However, it is also noted that sub-components of these functions (e.g. automatic vs intentional and cognitive vs behavioural inhibition), which may yield distinct findings, were not examined here. Although broad summaries of executive function and AHs are yet to be provided, our observation of limited significant findings was mostly consistent with recent empirical conclusions (e.g. Toh et al., 2020). Given that a critical portion of literature was omitted from this review due to a lack of methodological specificity to AHs, the absence of substantial evidence noted in respect to the above cognitive sub-domains does not necessarily disconfirm the relevance of these processes, but rather highlights the need for further methodological rigour in validating such inferences. That said, it is important that the proportion of non-significant results, spanning many of the cognitive sub-domains (Figure 2), is not overlooked. Similarly, the issue of publication bias should be acknowledged, as non-significant findings may often not be retained/accepted for publication.
Quality assessment of studies indicated a breadth of rankings, with no clear differences across the cognitive domains. Studies of lower quality ranking exhibited the most difficulty in measuring diagnostic status and AH experiences, along with employing appropriate selection methods for control groups. Studies ranked the highest in quality still demonstrated difficulty in AH definition, relative to the other methodological considerations. Notably, just over a third of studies employing group contrasts reported a minimum of 20 cases per group, with difficulties in reaching this threshold most frequently observed in the source-monitoring literature.
Key take-home messages
Distinguishing AHs from diagnostic categories
A primary methodological concern was the lack of distinction between AHs and broader psychiatric diagnoses. Investigating cognitive profiles linked to clinical AHs with reference to clinical (i.e. schizophrenia-spectrum diagnosis with no AH history) vs non-clinical (i.e. no major psychiatric history) control groups could reasonably be expected to yield distinct findings. Yet limited consideration for this issue has been offered in empirical studies. We chose to include only data which contrasted the presence and absence of AHs in the context of a psychiatric diagnosis. This was done with the view that non-clinical control comparisons could not adequately distinguish AHs from broader diagnostic profiles. As a result, group contrast data for 25 retained studies were wholly excluded. Although these studies identified AHs as a variable of interest, appropriate methodological adjustments were not afforded. Subsequently, in some cases, differences can be seen between the original interpretations offered by authors, who drew on diagnostic-based contrasts to substantiate findings, and our interpretation of available secondary AH-specific data (e.g. follow-up symptom-based correlations). Furthermore, effectively isolating AHs from broader diagnoses is made increasingly difficult given the high level of comorbidity observed between schizophrenia and other conditions, such as depression or post-traumatic stress (Buckley et al., 2009). Thus, even if the presence/absence of AHs is examined exclusively within clinical samples, a range of different symptom profiles (e.g. mood state, trauma history, delusional ideation) may also interact with cognition. Yet, feasibly accounting for a large set of psychological factors presents quite a challenge.
Our quality assessment found that approximately half of the included studies attempted to control for extraneous factor(s) as part of their analyses, while slightly over a third accounted for this by either matching, or examining the presence of statistical differences in, participant groups for key clinical variable(s). Considering the application of symptom dimensions on a broader scale, the relevance of transdiagnostic and continuum-based research is noted (e.g. De Leede-Smith and Barkus, 2013; Waters et al., 2012a). Notably, a transdiagnostic overview of cognitive research was not possible here, due to a lack of available studies examining AHs beyond the schizophrenia-spectrum, thus demonstrating a notable gap within the literature.
Distinguishing AHs from related experiences
The measures employed to confirm or quantify the experience of AHs were vast, and in some instances, utilised in ways that failed to isolate AHs from broader psychosis symptomatology or cross-modality hallucination experiences. For example, gold-standard assessments, such as the Positive and Negative Syndrome Scale (PANSS; Kay et al., 1987) and Scale for the Assessment of Positive Symptoms (SAPS; Andreasen, 1984) were utilised frequently. While these measures offer good validity when employed in their intended form, they do not quantify the experience of hallucinations of an auditory nature specifically, unless relevant items are selected, or scoring is adapted accordingly. While many excluded studies provided aims and inferences in line with their employed methodology, a proportion offered interpretations in relation to AHs, despite a lack of methodological alignment. Notably, several of these studies represent seminal investigations which have been readily adopted by the field and informed theoretical models. A key driver may be the tendency to interchange the terms ‘hallucinations’ and ‘auditory hallucinations’ in academic language. Historically, AHs have received most attention, and as our recognition of hallucinations across the range of senses evolves (McCarthy-Jones et al., 2017), appropriate use of terminology and ensuing assessment measures, has lagged (Toh et al., 2019). While many studies exhibiting these discrepancies were screened out, ~20% of retained studies displayed inconsistencies in AH-specific terminology employed across aims, methodology and inferences.
Distinguishing variations in AH status
Many studies explored current, as opposed to lifetime or past AH experiences. Importantly, the presence of current/past AHs was often reported without consideration for, or measurement of, alternate timeframes. This highlights the potential for group overlap, and consequently difficulties in effectively isolating state from trait profiles. Similarly, there was much between-study variation in the definition of timeframes. For example, current AHs ranged from endorsement during testing, to confirmation over preceding weeks or even months. Similarly, while lifetime presence of AHs was sometimes guided by criteria, such as a specified number of episodes, this was not always the case. Furthermore, a notable portion of studies did not explicitly specify the time-period of interest when examining AHs. Longitudinal research has suggested that some cognitive profiles linked to schizophrenia may endure over time, while others demonstrate fluctuations (Heilbronner et al., 2016; Szöke et al., 2008). Yet, only six included studies directly compared variations in AH status, with source-monitoring and language lateralisation tentatively suggested as state markers.
Theoretical implications
Much debate remains with respect to the theoretical conceptualisation of AHs as a form of intrusive memory (Waters et al., 2006), inner speech (Jones and Fernyhough, 2007), or environmental cues attributed heightened salience/attention (Dodgson and Gordon, 2009; Kapur, 2003). However, given that no single cognitive mechanism solely informs each theoretical model, a multidimensional perspective is required. Hence, we found it useful to consider the current results in the context of broader cognitive frameworks, as discussed below.
Of the cognitive sub-domains identified within this review, those most substantially associated with AHs included reduced sensitivity/accuracy for signal-noise discrimination, and increased expectation biases for sensory input under conditions of ambiguity. These trends can be understood within a predictive coding framework, which emphasises the aberrant weighting of prior beliefs against incoming stimuli (Corlett et al., 2019; Sterzer et al., 2018). Similarly, excitatory–inhibitory models suggest that imbalances in the activation of bottom–up vs top–down neural networks may cause these two streams to reverberate and become misinterpreted as one another (i.e. circular inference), thus rendering an individual susceptible to anomalous perceptions (Jardri and Denève, 2013). Yet in the current review, task-based measures of candidate top–down functions (e.g. attentional control, inhibition, strong imagery) exhibited limited associations to AHs. However, much of the literature pertaining to inhibitory cognitive control, as measured by the dichotic listening task, was excluded from our synthesis due to non-AH-specific methodology.
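The distinction drawn above between reduced sensitivity and liberal responding can be made concrete with the standard equal-variance signal detection indices: sensitivity d′ = z(H) − z(FA) and criterion c = −0.5 × [z(H) + z(FA)], where H is the hit rate and FA the false-alarm rate. The sketch below is illustrative only. The function name, the log-linear correction (adding 0.5 to each count) and the example counts are assumptions, and are not drawn from any of the reviewed studies, which may have used other indices.

```python
from statistics import NormalDist


def sdt_indices(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Hypothetical helper: return (d_prime, criterion) for a yes/no detection task.

    A log-linear correction (add 0.5 to each count, 1 to each total) is applied
    so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)            # lower = poorer signal-noise discrimination
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative = liberal ("signal present") bias
    return d_prime, criterion


# Illustrative (made-up) counts: many hits but also many false alarms.
d_prime, criterion = sdt_indices(hits=45, misses=5, false_alarms=25, correct_rejections=25)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

In this framing, a lower d′ corresponds to reduced sensitivity for signal-noise discrimination, while a negative c corresponds to a liberal tendency to report a signal under ambiguity.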
Also associated with AHs in the current review were difficulties in distinguishing the self–other source and temporal presentation order of given events, along with reduced working memory capacity. These trends sit coherently within the self-monitoring framework, which proposes that individuals with AHs may exhibit greater difficulties in identifying with self-generated information (Waters et al., 2012b). Several functions have been hypothesised to underpin self-monitoring difficulties, including the ability to register the sensorimotor consequences of one’s actions at a neurological level (Ford and Mathalon, 2005), and the capacity to bind contextual cues with events in memory (Waters et al., 2004). It is possible that difficulties in the accurate encoding of relevant temporal cues may render the later recollection of events in memory as fragmented, and thus intrusive and foreign to oneself (Hardy, 2017; Steel, 2015). This notion applies to current cognitive perspectives of AHs in the context of trauma (McCarthy-Jones and Longden, 2016). It is worth acknowledging theoretical overlap, in that ‘low-level’ motor/sensory processing and ‘higher order’ expectations, posited to underpin the above frameworks, likely occur in conjunction. Consequently, due to its integrative perspective, the predictive coding account has recently been employed as a means of reconciling these concepts (Griffin and Fletcher, 2017; Thakkar et al., 2021).
Clinical implications and developments
Psychological interventions for distressing AHs commonly focus on shifting individuals’ perceptions of, and relationships with, their voices; in addition to enhancing adaptive responses and coping strategies. Yet, there is a need to identify key processes of therapeutic change, rather than continuing to compare the efficacy of broad therapeutic frameworks (Thomas et al., 2014). Cognitive remediation programmes, which aim to improve functions such as memory, attention and verbal learning in individuals with schizophrenia-spectrum diagnoses (Wykes et al., 2011), have limited application to AHs, with mixed findings to date (Fiszdon et al., 2005; Thomas et al., 2018). Thus, ongoing cognitive research which considers the above pursuits is warranted; and several distinctions could be made to support clinical relevance. Namely, some processes have been hypothesised as implicated in the emergence of AHs (e.g. self-monitoring and context memory), while others may be more relevant to the phenomenology of such experiences (e.g. affective processing, perceptual expectation bias) (Waters et al., 2012a). Despite not being feasible for synthesis here, six out of the 101 studies meeting our initial inclusion criteria reported investigations between phenomenological characteristics of AHs and cognitive function. Thus, further examination of AH manifestations (e.g. valence, distress, controllability), beyond presence, could be of value. Similarly, differentiating between developmental and maintenance factors, as demonstrated within delusion literature (e.g. Freeman, 2016), may support the advancement of cognitive interventions relevant to the prevention, and alleviation, of distressing AHs, respectively. Future studies considering AH onset and chronicity would be well positioned to build on these ideas.
Given this review was restricted to AH presence/severity, it is acknowledged that cognitive processes associated with common therapeutic targets – such as levels of associated distress – have not been captured here. However, possible mechanisms of therapeutic change, aligning to the results of this review, could include targeting the accurate detection and anticipation of sensory input, identification with self-generated events, and integration of contextual memory cues. For example, imagery rescripting protocols, which aim to re-contextualise voice-related memories by meeting client needs and providing a corrective emotional experience, are hypothesised to target source-memory and context integration (Paulik et al., 2019). Evaluation of objective cognitive change markers, alongside the tracking of AH presentation and/or distress, pre- and post-intervention, could facilitate more direct links between cognitive mechanisms and key therapeutic outcomes.
Limitations and scope
There are several limitations within the current systematic review. Given the number of studies identified (n = 22,391), stage one screening was completed by a single author, posing the possibility of missed records. Furthermore, the range of summarised cognitive mechanisms has, in many instances, been assessed with a breadth of novel behavioural paradigms. A key example is the operationalisation of internal–external source-monitoring, represented by memory for self (read/imagined) vs other (heard) words; differentiation of self vs other distorted speech; recognition of self-speech feedback delay; and recollection of learned words (other-presented) vs extra-list intrusions (self-originating errors). This rendered the definition of valid constructs, and development of broader cognitive themes, quite challenging, and overlap was inevitable. Given the scope of data, we chose to focus on methodological concerns regarding the measurement of AHs, cutting across studies, rather than providing a detailed critique of cognitive tasks. However, variation in task parameters can be seen in the results, and thus, some cognitive mechanisms may benefit from refinement. A range of stimulus presentation forms were also present. Conceptually, this assumes cognitive mechanisms to be modality-general in nature, and while we did not consider this in detail, further clarification is required to support such assumptions (Fernyhough, 2019). Importantly, the direction of relationships between cognition and AHs remains unconfirmed, and while difficulties are often inferred as mechanisms for AHs, they may also reflect by-products of such experiences. In addition, the status of antipsychotic medication use within study samples was not assessed during quality assessment or data extraction, due to inconsistent reporting of, and/or statistically controlling for, this information. However, antipsychotic medication use is recognised as a possible confounding factor when examining cognitive performance (Haddad et al., 2023).
In contrasting the relative evidence supporting each cognitive mechanism, we focused on the proportion of significant results reported for each, as presented by the original authors. In some instances, a notable proportion of non-significant findings was also present (e.g. just under 50% of internal–external source-monitoring studies), and thus, appropriate caution should be exercised when interpreting overall trends. It is also recognised that not all investigations can be attributed equal weighting, due to methodological concerns and sample size limitations. Nevertheless, this review offered a nuanced summary of a highly heterogeneous dataset, providing important insights into the methodological landscape of this field. Alternative approaches to synthesis (e.g. meta-analysis) could complement these findings in future. Finally, empirical evidence supporting theoretical AH models has emerged from various lines of investigation, extending beyond the behavioural paradigms examined (e.g. electrophysiology, brain-imaging). Thus, while limited evidence was noted for some models, inspection of complementary research streams may provide alternative conclusions.
Recommendations for future research
Recommendations for future research are focused on three areas: expanding, and matching for, the broader psychological context; measuring the timeframe of hallucinatory experiences; and specifying the sensory modality in which they emerge. Given that AHs emerge within a range of psychological conditions and the wider population (De Leede-Smith and Barkus, 2013; Sommer et al., 2010), the utility of transdiagnostic investigation is noted. With respect to expanding the psychological context of this research, cognitive investigations into the healthy AH continuum are progressing (e.g. Moseley et al., 2022), yet studies of this nature beyond schizophrenia-spectrum diagnoses are lacking. Phenomenological research examining hallucinations in the context of trauma is abundant and has begun to draw theoretical links with psychosis (McCarthy-Jones and Longden, 2016). However, at this stage, complementary task-based cognitive investigations warrant further attention, as do related conditions, such as borderline personality disorder. To this end, efforts should be made to match groups on psychological profiles where possible, with and without AHs, so that distinctions from related experiences can be made.
Explicitly defining and measuring the timeframe of AHs is also recommended, to allow for inferences regarding state vs trait cognitive markers. Importantly, links between proposed cognitive functions and current AHs do not, in isolation, confirm their role as a state marker. Rather, appropriate contrasts with past AHs are required, to identify the potential for cognitive change in conjunction with AH remission. Ideally, longitudinal investigation would be most revealing, although cross-sectional examination of AH status variations may provide a more feasible option. It is acknowledged that quantifying the status of complex phenomena, such as AHs, is difficult, given the need for arbitrary boundaries. Therefore, transparent reporting of AH measurement parameters is imperative in aiding cross-study comparisons. Going forward, the utilisation of measures which specifically examine hallucinations across each of the respective senses, rather than collapsing these diverse experiences into a single item, is advised. Subsequently, care must be taken when forming inferences regarding each sensory modality, so as to not form unfounded cross-modal conclusions. Appropriate measurement of these experiences will not only provide clarity within the AH literature, but also inform much needed investigations into the cognitive underpinnings of hallucinations beyond the auditory modality (Toh et al., 2019).
Conclusion
The current review highlighted the breadth and complexity of research examining cognition in the context of AHs. Difficulties in signal detection/expectation, self–other source and temporal context monitoring, working memory, and language fluency exhibited the most consistent links to AHs. Conceptually, these functions align with predictive coding and self-monitoring frameworks, and theoretical overlap is acknowledged. A lack of specificity regarding the sensory modality of hallucinatory experiences measured, as well as the common use of diagnostic-based comparisons, were of primary concern. Thus, we noted discrepancies between some theoretical AH models and corresponding empirical evidence. Finally, we recommended the extension of this research into broader transdiagnostic and healthy populations, in conjunction with relevant therapeutic intervention, to advance the field in a meaningful way.
Supplemental Material
Supplemental material, sj-docx-1-anp-10.1177_00048674241235849 for Examining the relationships between cognition and auditory hallucinations: A systematic review by Adrienne Bell, Wei Lin Toh, Paul Allen, Matteo Cella, Renaud Jardri, Frank Larøi, Peter Moseley and Susan L Rossell in Australian & New Zealand Journal of Psychiatry
Footnotes
Acknowledgements
The authors thank Associate Professor Vaughan Bell for his contributions to the protocol pre-registration and feedback on initial manuscript drafts. W.L.T. is supported by the National Health and Medical Research Council (NHMRC) New Investigator project grant [GNT1161609]; S.L.R. holds a Senior NHMRC Fellowship [GNT1154651]; and P.A. is supported by the UK Medical Research Council and University of Lille International Chair Award.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
