Abstract
Background:
The current cognitive tests have been developed based on and standardized against Western constructs and normative data. With the number of older people of minority ethnic background increasing across Western countries, there is a need for cognitive screening tests that address the factors which bias test performance and affect the timeliness and accuracy of dementia diagnosis. The diagnostic accuracy of translated and culturally adapted cognitive screening tests and their impact on test performance in diverse populations have not been well addressed to date.
Objective:
This review aims to highlight considerations relating to the adaptation processes, language, cultural influences, impact of immigration, and level of education to assess for dementia in non-Western and/or non-English speaking populations.
Methods:
We conducted a systematic search for studies addressing the effects of translation and cultural adaptations of cognitive screening tests (developed in a Western context) upon their diagnostic accuracy and test performance across diverse populations. Four electronic databases and manual searches were conducted, using a predefined search strategy. A narrative synthesis of findings was conducted.
Results:
The search strategy yielded 2,890 articles, and seventeen studies (4,463 participants) met the inclusion criteria. There was variability in the sensitivity and specificity of cognitive tests, irrespective of whether they were translated only, culturally adapted only, or both. Cognitive test performance was affected by education, linguistic ability, and aspects of acculturation.
Conclusions:
We highlight the importance of translating and culturally adapting tests that have been developed in the Western context. However, these findings should be interpreted with caution as results varied due to the broad selection of included cognitive tests.
Keywords
INTRODUCTION
Over the last 50 years, the impact of globalization has led to an increase in people from culturally diverse backgrounds migrating to various destinations, including Western countries [1]. According to the census data from the Office for National Statistics (2021), approximately 18% of the United Kingdom (UK) population identifies as an ethnic minority, which includes those for whom English is not their first language [2].
Older people from ethnic minorities are at increased risk of receiving a diagnosis of dementia [3]. Physical health related factors, such as hypertension and diabetes, are partially responsible for this [4]. The higher likelihood of a dementia diagnosis could also be attributed to the reduced specificity of some tests, especially brief cognitive assessment tools [5], resulting in a higher incidence of false positives, whereby dementia and mild cognitive impairment (MCI) are incorrectly identified by a test in a cognitively healthy (CH) individual.
Screening tests assist the diagnostic process of assessing cognitive impairment. Such tests are known to be influenced by multiple factors including age, years/level of education, language, culture, and literacy rates [6]. Eurocentric bias is also well-documented in the cognitive assessment process [7]. Many cognitive screening tests have been constructed in Western countries, i.e., developed predominantly in the United Kingdom (UK) and United States of America (USA), and therefore standardized on primarily Western, Caucasian, and English-speaking samples and cultures [8]. At the core, this situates European norms as the standard against which other cultures are compared [9].
Even though cultural factors may influence screening tools with dominant Western characteristics and, thus, can affect test performance, only a few cognitive tests have considered this [10]. Furthermore, some argue that emphasis is placed on different cognitive skills within different cultures [11]. For example, the Western model of education typically exposes individuals to test conditions from a fairly young age [12], which could be likened to those of cognitive assessments. Where schooling is embedded within the culture, individuals are more likely to develop skills relating to finishing tasks within an allotted timeframe and to be familiar with the demand for continuous attention. Therefore, individuals migrating from a developing country may be subject to potential sources of underperformance, such as a lower level of education, lack of familiarity with the language, and cultural differences, including culture-specific test items which assume familiarity with a culture other than the one in which the individual was raised [13]. In these circumstances, such a testing process may raise a person’s anxiety levels and, thus, not be an accurate reflection of an individual’s true ability.
There has been growing evidence that cultural variables can exert a powerful effect on test performance [14] and, therefore, pose a challenge when interpreting scores in different cultural contexts. Even within the same country, ethnic minorities may also be influenced by cultural factors. One factor which is thought to have a potential impact on test performance is acculturation, defined as the process of cultural learning and incorporation of language, values, beliefs, and customs of a new country in which a person with an immigrant background resides [15, 16]. Some studies have identified associations between the degree of acculturation and the performance on cognitive tests designed in the country in which they reside, i.e., nativity status (place of birth and length of residence in the new country) and language acculturation (documented whether participants were bilingual or spoke primarily Spanish or English at home) [17], or quality of education (reading level) [18]. However, apart from language ability (understanding, reading, writing and speaking, use and preference), acculturation comprises different variables including the quality and the degree of formal education, and the pattern of abilities and values developed as a consequence of the cultural background [19], including social contacts (relations, friendships, way of dealing with people, communication style, family life), daily living habits, use of media, access to education, work, marriage, childrearing practices, celebrations and holidays, general knowledge and knowledge of world affairs, as well as specific cultural habits and customs [20]. It is, therefore, important to understand exactly what aspects of acculturation may impact performance.
Various screening tests have been validated through translated adaptations to increase diagnostic accuracy in culturally diverse populations. However, translation alone can be problematic as it does not address other matters concerning cultural differences [21]. Translation may also create issues when measuring aspects such as fluency, as it will not be testing the same factor if translated and could manifest as longer response times [22]. The impact of culture may affect the validity of cognitive screening tests used to identify dementia and MCI through biases that may be related to the test itself, the administration and/or the interpretation of scores [8]. For example, where an item in a test may be ambiguous due to poor translation or lack of meaning in a particular cultural context, this may affect an individual’s understanding, their interpretation [23] and, as a result, impact a test’s diagnostic accuracy. This may be through different biases including that of construct (non-equivalent constructs across cultural groups) and item (inadequate translation from incorrect word choice) [24].
Universal cut-offs can also lead to sampling biases, whereby a cut-off may not match with the individuals being assessed [25] (e.g., if a target population has a lower level of education or higher rates of illiteracy). As such, this could lead to false conclusions made about an individual’s cognitive ability which could have certain consequences, such as leading to inaccurate diagnosis. Additionally, this could also be problematic for the interpretation of scores by healthcare professionals which may impact clinical decision making.
Systematic reviews have explored cultural adaptations of specific screening tests including the Montreal Cognitive Assessment (MoCA) [26] and the Rowland Universal Dementia Assessment (RUDAS) [27] in specific populations e.g., literate versus illiterate [28], or specific to certain countries [29]. However, the translation and cultural adaptation of several dementia screening tests more broadly across a variety of populations has not been addressed. The aim of this paper is to explore the effects of translation and cultural adaptations in a range of cognitive screening tests developed in a Western context and investigate the impact this has on their diagnostic accuracy and test performance across diverse populations. This review aims to highlight considerations relating to the adaptation processes, language, cultural influences, impact of immigration, and level of education when using a range of different Western-designed cognitive screening tools originally in English to assess for dementia in non-Western and/or non-English speaking populations.
METHODS
Search strategy
Four databases were searched: APA PsycInfo, Medline, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), and SCOPUS. Databases were searched from inception to 30 June 2023. Database searches were supplemented with manual searches [i.e., The Clinical Neuropsychologist 2017–2023; Archives of Clinical Psychology 2020–2023; Neurological Sciences 2022; British Journal of Psychiatry Open 2022 (an online-only open access journal); Journal of the International Neuropsychological Society 2022, etc.].
Development of search terms
The key search concepts for this review were identified through a focus on the population and intervention elements of the PICO framework [30]. The individual terms for each concept were guided by consideration of terms used in previous reviews within the field (Table 1).
Concept and corresponding search terms table
The Boolean operator ‘AND’ was used to combine search terms for the final search. *Truncation symbol used at the end of search terms to find any string of characters in that position, for example, ethnic* would identify ethnic, ethnicity, ethnicities etc. **The term ‘culturally diverse populations’ was used to refer to non-English speaking participants and/or those from non-Western countries for whom English was not their first language.
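The truncation and Boolean ‘AND’ logic described in the note above can be sketched in a few lines of Python. This is only an illustration of the matching behavior, not the syntax of any of the searched databases; the function names `truncation_to_regex` and `matches_all` are hypothetical, and the only search term taken from the review is `ethnic*`.

```python
import re


def truncation_to_regex(term: str) -> re.Pattern:
    """Turn a search term into a whole-word regex.

    A trailing '*' is treated as a truncation symbol that matches any
    run of word characters (including none) after the stem.
    """
    if term.endswith("*"):
        stem = re.escape(term[:-1])
        pattern = rf"\b{stem}\w*\b"
    else:
        pattern = rf"\b{re.escape(term)}\b"
    return re.compile(pattern, re.IGNORECASE)


def matches_all(text: str, terms: list[str]) -> bool:
    """Boolean AND: every term must match somewhere in the text."""
    return all(truncation_to_regex(term).search(text) for term in terms)


# 'ethnic*' finds ethnic, ethnicity and ethnicities, but not 'ethics'.
for word in ["ethnic", "ethnicity", "ethnicities", "ethics"]:
    print(word, bool(truncation_to_regex("ethnic*").search(word)))
```

Real database interfaces apply their own rules for truncation, field tags, and phrase searching, so a sketch like this is useful for checking intent rather than for reproducing a search.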
Article selection summary
A total of 2,890 records were identified. Duplicates (n = 801) were removed using the reference management tool, Endnote (n = 460), as well as manually (n = 341) (NCC-A). The remaining records (n = 2,089) were screened based on title and abstract independently by three reviewers (NCC-A, TC, and EBM-L) against the inclusion and exclusion criteria (Table 2). The PRISMA diagram [31] outlines the search results and screening process, including the reasons for exclusion at the full-text screen (Fig. 1). The study is compliant with the PRISMA 2020 27-item checklist for reporting systematic reviews [31].
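The record counts reported above are internally consistent, which can be checked with a short calculation. The sketch below uses only the figures stated in the text; the function name `prisma_screening_counts` is hypothetical.

```python
def prisma_screening_counts(identified: int, duplicate_counts: dict[str, int]) -> tuple[int, int]:
    """Return (total duplicates removed, records left for screening)."""
    total_duplicates = sum(duplicate_counts.values())
    return total_duplicates, identified - total_duplicates


# Figures as reported in the article selection summary.
duplicates, screened = prisma_screening_counts(
    identified=2890,
    duplicate_counts={"reference manager": 460, "manual": 341},
)
print(duplicates, screened)  # prints: 801 2089
```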
Inclusion and exclusion criteria. Cultural diversity was defined according to Lin (2020) as ‘an open-ended term, which generally refers to a reality of coexistence of diverse knowledge, beliefs, arts, morals, laws, customs, religions, languages, abilities and disabilities, genders, ethnicities, races, nationalities, sexual orientations, etc.’
*As based on mean age reported; for some studies 95% confidence interval (CI) was also calculated, to ensure that 95% of participants had the true mean of age ≥60 years. HIV, human immunodeficiency virus infection; TIA, transient ischemic attack.

PRISMA flowchart for systematic reviews with included searches of databases and identified reports.
Data extraction and synthesis
Data were extracted on the following: population, setting, specification of diagnosis, comparison (dementia/MCI versus controls and cultural group), age, country, education level, adaptations (cultural/translations versus original), cut-off scores, sensitivity, and specificity (Supplementary Material 1). The extraction of data and quality assessment were performed independently by two authors (extraction of data: NCC-A and TC, and TC and EBM-L; quality assessment: NCC-A and EBM-L, and EBM-L and HS). Disagreements were resolved internally between the reviewers. Because few studies were identified and they were heterogeneous in the dementia subtypes and clinical cognitive assessment tools used, a meta-analysis was precluded, and a narrative synthesis was conducted instead. Study characteristics and key outcomes were summarized as tables and a figure. Data files/information were available for download from the journal website or a data repository.
Sensitivity and specificity for optimal cut-offs (as indicated by the authors) were extracted into Microsoft Excel, and confidence intervals at 95% were calculated for 14 studies. Sensitivity/specificity data were plotted onto two separate forest plots for the relevant category: differentiating MCI from CH and dementia from CH. Where sensitivity and specificity values were not used (n = 3), area under the curve (AUC), correlation coefficients or other relevant data are presented. Where studies included a comparison between English-speaking and non-English speaking groups, or between immigrant and non-immigrant groups, comparative data relating to test performance are also presented. Heterogeneity was assessed via visual examination of risk of bias, descriptive data and forest plots. Factors which were thought to possibly contribute to heterogeneity related to co-variates including selection of patients, education, linguistic ability, and immigration status.
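For readers who wish to reproduce this kind of calculation, the sketch below shows one way to obtain sensitivity, specificity, and 95% confidence intervals from a 2×2 table. It is an assumption-laden illustration: the review does not state which interval method was used in Excel, so the Wilson score interval is used here as a common choice, and the counts in the usage example are hypothetical, not taken from any included study.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Point estimates and Wilson intervals from 2x2 table counts."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts for illustration only.
result = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With small groups the interval is wide, which is one reason the forest plots are more informative than the point estimates alone.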
Quality appraisal
The quality of the eligible studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool (Table 1, Supplementary Material 2A and 2B) [32].
RESULTS
Study characteristics
Studies were published between 1999 and 2023 (Table 3). The studies recruited samples from clinical and community settings, and came from the Middle East (Turkey n = 2, Lebanon n = 1), North America (Mexico n = 1, USA n = 1, Canada n = 1), Europe (one study each from Greece, Belgium/Denmark/Norway/Sweden/Germany, Czech Republic, Netherlands, Germany, Sweden), and one study each from South America (Argentina) and Southern/Eastern Asia (India, Sri Lanka, Thailand and Japan). Most studies used the Diagnostic and Statistical Manual of Mental Disorders (DSM) as a reference standard (n = 9), followed by the NINCDS-ADRDA (National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association; n = 6). The most prevalent cognitive tools were RUDAS (n = 5) followed by MoCA (n = 4) (see Supplementary Material 3 for further study characteristics which specify cut-off scores for each study).
Summary of eligible studies
ACE-R, Addenbrooke’s Cognitive Examination-Revised; ACE-R-J, Japanese version of Addenbrookes Cognitive Examination – Revised; AD, Alzheimer’s disease; ADAS-Cog, Alzheimer’s Disease Assessment Scale-Cognitive subscale; CAMCOG, Cambridge Cognitive Examination; CDT, Clock Drawing Test; CERAD, Consortium to Establish a Registry for Alzheimer’s Disease; DRS, Mattis Dementia Rating Scale; DSM, Diagnostic and Statistical Manual of Mental Disorders; DLB, Dementia with Lewy Body; 5-WT, Five-Word Test; FDT, Frontotemporal lobe dementia; FAQ, Functional Assessment Questionnaire; M, Mean; MCI, Mild Cognitive Impairment; MMSE, Mini Mental State Examination; A-MMSE, MMSE in the Arabic language; mMMSE, Modified Mini Mental Status Examination; MMSE-S, MMSE in the Swedish language; MoCA, Montreal Cognitive Assessment; MoCA-CZ, Czech version of MoCA; NINCDS-ADRDA, National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association; PD, Parkinson’s disease; PDD, Parkinson’s Disease Dementia; PD-MCI, Parkinson’s disease-Mild Cognitive Impairment; PTSD, Posttraumatic Stress Disorder; RUDAS, Rowland Universal Dementia Assessment Scale; A-RUDAS, RUDAS in the Arabic language; RUDAS-S, RUDAS in the Swedish language; SCI, Subjective Cognitive Impairment; SD, Standard Deviation; SLUMS, Saint Louis University Mental Status; VaD, Vascular Dementia; VAT, Visual Association Test; mVAT, modified Visual Association Test.
Participant characteristics
The total number of participants was 4,463, with sample sizes ranging from 56 [33] to 874 [34]. The overall sample comprised 55% females and 45% males, with a mean age of 71.8 years. Mean years of education varied across studies from 7 [33] to 14.6 years [35], with few studies including participants with no education [10] or no formal education (52%) [36]. Where studies included further information on the level of education in populations with an immigrant background, a summary of this is presented in Supplementary Material 4.
Cultural and translation adaptations
Translation process
Six studies [36–41] adopted the backwards translation process to translate the screening tests (Table 3). These studies demonstrated high sensitivity (84%–99%) [39, 42] and high specificity, from 80% [38] to 99% [42]. In contrast, studies that did not use back translation demonstrated much higher heterogeneity in differentiating dementia and MCI from cognitively healthy individuals, with sensitivity and specificity as low as 65% [43] and 40% [35], respectively.
Some studies did not warrant a backwards translation process [10, 44]. For example, Borson et al. (1999) explored the utility of the Clock Drawing Test (CDT) in a multi-ethnic sample which involved minimal language requirements [44]. Instead, an interpreter specifically trained for the study administered the test instructions in the participants’ native language. Results demonstrated high levels of sensitivity (94%) and specificity (85%) for differentiating between probable dementia and CH controls in low educated and non-English speaking participants. Similar high accuracy to differentiate people with dementia and MCI from cognitively intact individuals was also demonstrated with the Visual Association Test (VAT), irrespective of whether the six-line drawings of pairs of interacting objects (association cards) were presented in black and white or in color [10].
Cultural adaptations
Seven of the included studies made cultural adaptations to fit more specifically within a country’s cultural norms. For example, the cultural adaptation of the five-word test (5WT) [45] was undertaken by five linguistic experts from Mexico. Five words from the Spanish language were chosen based on criteria to make the test comparable to the original: word length of two to three syllables and beginning with different letters of the alphabet. The understanding of the chosen words was then assessed by small groups of older adult volunteers from the memory service of the study site. The test was validated on volunteers from the site in which the study was conducted, which could increase risk of bias and lead to concerns regarding applicability to other regions within Mexico, thus limiting generalizability. Results demonstrated sensitivity of 89% and specificity of 98% for differentiating between Alzheimer’s disease (AD) and CH. For MCI, sensitivity and specificity were 66% and 77%, respectively.
Other studies used a backwards translation process as well as making cultural adaptations. For example, following the translation process, Mavioglu et al. (2006) adapted words from the ADAS-Cog to fit better with Turkish culture, i.e., “ocean” was replaced with “sea” and “lobster” with “fish” [33]. With these modifications, the test differentiated between probable AD and CH (p < 0.001). In the study by Lakshminarayanan et al. (2022) [41], the ADAS-Cog “Word Recall” item was replaced with the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) word list, used in other adaptation studies conducted in low and middle-income countries [46]. Likewise, the “Naming Objects and Fingers” item was substituted with culturally appropriate alternatives, with the authors providing an example of replacing “mask” (which they found difficult for participants to identify) with “battery” from the list of alternative words.
Franzen et al. (2019) adapted the original Visual Association Test (VAT) by changing the black and white line drawings to color photographs [10]. Both the original VAT and the modified version (mVAT) were administered to non-Western immigrants from a memory clinic setting and a control sample of CH individuals with an immigrant background. Although both tests discriminated well between individuals with and without dementia, participants scored higher on the mVAT (VAT, AUC = 0.77–0.88; mVAT, AUC = 0.85–0.95). This suggests that performance of non-Western immigrants could be underestimated when using black and white line drawings.
Immigrant populations
In addition to investigating the accuracy of cultural and translational adaptations of dementia cognitive screening tests, six studies explored screening tests in identifying dementia or MCI in those from an immigrant population. Three of the studies’ findings suggested that both educational background and linguistic ability of those from a non-Western first-generation immigrant background need to be considered when characterizing cognitive profiles [35, 47]. However, four of these studies did not translate or culturally adapt the respective cognitive screening tests and instead investigated diagnostic accuracy and test performance in relation to immigrant versus non-immigrant groups. For example, in the study by Nielsen et al. (2019), 57% of participants had an immigrant background [40]. The findings in this group demonstrated that RUDAS scores were mainly affected by level of education as opposed to immigrant status. When cut-off scores were adjusted to account for education, this slightly improved diagnostic accuracy (before adjustment, AUC of 0.93; after adjustment, AUC of 0.95). On the other hand, the study by Torkpoor et al. (2022), based on a smaller group of immigrants, reported that educational background and immigrant status influenced cognitive performance on the MMSE but not on the RUDAS [43]. In contrast, Celik et al. (2022) found no difference when adjustments were made to account for years of education in RUDAS scores (p = 0.622) but did find that years of education were significantly associated with total MMSE scores (p < 0.001), where German natives performed better [47]. Thus, it could be important to consider to what extent level of education may be correlated with immigrant background, i.e., some studies have found lower test performance scores to be associated with lower levels of education related to lack of opportunity available within lower economically developed countries (LEDCs) [48]. This is supported by Statucka et al. (2021) [35] who reported that immigrant status alone did not significantly impact results (p = 0.560) but instead was related to other variables around the Historical Index of Human Development (p < 0.001). This latter measure captures three dimensions of human development including a long and healthy life (life expectancy), access to knowledge (years of schooling) and decent standard of living (gross income, adjusted for price level of the country) [49].
The impact of bilingualism in cognitive testing also appeared to be relevant. Bilingual people performed significantly better on the MMSE than those who were monolingual (p = 0.046) [47]. However, when bilinguals were removed from the analyses, results demonstrated that Turkish (TR) and Turkish immigrant (TI) groups performed significantly worse than German participants (TR, p = 0.021; TI, p < 0.001).
Risk of bias across studies
A summary of the methodological quality using the QUADAS-2 is presented in Table 4.
Methodological quality of data based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [32]
The quality within some domains (patient selection, index test, reference standard and flow and timing) was sometimes difficult to assess due to information not being clearly reported within the studies. The ratings for each domain are presented as percentages in Fig. 2. The only study that demonstrated low bias on all domains was the one by Tsolaki et al. (2000) in which participants came from a memory clinic setting [37]. The authors included a heterogeneous group of participants from different cultural and socio-economic environments and extent of cognitive problems, ranging from subjective and functional cognitive problems to those with MCI and unselected dementias, so that they represented as many variables of the general population as possible in the analyzed sample. This was not the case for other studies that had a high level of bias in respect to patient selection, due to the nature of some of the case-control designs.

Risk of bias and concerns about applicability. Proportion of studies with low, high, and unclear risk of bias (a) and applicability (b). Values represent percentage of studies (n = 17).
For index test, 7 studies were marked as ‘unclear’ due to the absence or lack of clarity around test administration. Three studies [38, 43] were rated as being at ‘high’ risk of bias due to the assessors’ knowledge of the participant’s cognitive and/or immigrant status and potential for overestimation of diagnostic accuracy.
For reference standard, all but one of the studies used a recognized reference standard that was likely to correctly classify the target condition. Over 80% of studies were rated as ‘low’ risk of bias and ‘low’ in terms of concerns about applicability. Two studies [40, 47] were marked as ‘unclear’ due to lack of clarity around the use of masking.
For flow and timing, four studies were marked as ‘unclear’ due to lack of information reported in relation to the time intervals between the index test and reference standard. One study [35] was rated ‘high’ in level of bias due to lack of justification around a time of approximately 12 months between reference standard and index test. Conditions may have worsened during this timeframe, thus contributing towards unreliable diagnostic accuracy data.
Mild cognitive impairment versus cognitive health studies
Seven studies (n = 1,158; Table 3) investigated the ability of the included cognitive screening tests [5WT, Czech version of the MoCA (MoCA-CZ), Turkish version of Saint Louis University Mental Status (SLUMS-T), RUDAS-Thai, Argentine version of the MoCA (MoCA-A), Dementia Rating Scale (DRS), Japanese version of Addenbrookes Cognitive Examination - Revised (ACE-R-J)] to differentiate MCI from CH. Sensitivity ranged from 66% (at a specificity of 77%) to 87% (at a specificity of 97%), whereas their specificity ranged from 40% (at a sensitivity of 82%) to 92% (at a sensitivity of 87%) (Fig. 3). Two of the included studies by Bezdicek et al. (2020) [34] and Statucka et al. (2021) [35] investigated the ability of two separate cognitive screening tests (MoCA-CZ and DRS-2) to differentiate between Parkinson’s disease (PD) MCI (PD-MCI) and PD cognitively healthy (PD-CH) people. Bezdicek et al. (2020) explored the MoCA-CZ which had a sensitivity of 73% and specificity at 76% [34]. In comparison, the same study also differentiated PD-MCI from CH at a sensitivity of 82% and specificity of 62%, suggesting that the MoCA-CZ was more sensitive in differentiating between PD-MCI and CH, but had higher false positive rates when compared to the PD-MCI and PD-CH group.
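The link between specificity and false positives in the MoCA-CZ comparison above can be made explicit. As a worked illustration, using only the specificity values reported for that test and the standard definition of the false positive rate (FPR) as the complement of specificity:

```latex
\begin{align*}
\mathrm{FPR} &= 1 - \mathrm{specificity} \\
\mathrm{FPR}_{\text{PD-MCI vs PD-CH}} &= 1 - 0.76 = 0.24 \\
\mathrm{FPR}_{\text{PD-MCI vs CH}} &= 1 - 0.62 = 0.38
\end{align*}
```

On these figures, roughly 24 in 100 participants without MCI in the PD-CH comparison, and 38 in 100 in the CH comparison, would be expected to screen positive.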

Forest plot differentiating between MCI and CH. *These studies differentiated between PD-MCI.
These results are in contrast with Statucka et al. (2021) who reported DRS-2 to have a sensitivity of 82% and specificity of 40% when differentiating between the international PD-MCI and PD-CH groups [35]. However, as the two studies investigated different cognitive screening tests, one which was translated and culturally adapted and one which was not (MoCA-CZ versus DRS-2, respectively), direct comparisons between these two studies could not be made. Nevertheless, the contrast highlights the importance of cultural and translated adaptations of a test, which could possibly increase the overall diagnostic accuracy in culturally diverse groups, leading to lower false positive rates.
Dementia versus CH
Eleven studies with 1,296 participants investigated the ability of cognitive screening tests (ACE-R-J, ADAS-Cog-Tamil, CAMCOG, CDT, MoCA-A, MoCA-S, RUDAS, A-RUDAS, RUDAS-S, SLUMS-T, and 5WT) to differentiate dementia from CH (Table 3). Dementia types included AD (n = 6), vascular dementia, frontotemporal dementia, mixed dementia, dementia with Lewy bodies (two studies each), PD dementia (n = 3) and non-specified dementia type (n = 6). Sensitivity ranged from 80% (at a specificity of 90%; [40]) to 100% (at a specificity of 70%; [50]) whereas specificity ranged from 70% (at a sensitivity of 100%; [50]) to 99% (at a sensitivity of 99%; [42]) (Fig. 4).

Forest plot differentiating between dementia and CH.
Overall, the findings across studies demonstrated cognitive screening tests to have higher sensitivity and specificity in differentiating dementia from CH than in differentiating MCI from CH.
Heterogeneity
Some studies rated at ‘high risk’ of bias in patient selection had higher levels of accuracy reported than those of low risk and unclear ratings. For example, in the study conducted by Statucka et al. (2021), the DRS-2 showed high sensitivity (82%) but low specificity (40%) when differentiating PD-MCI from PD-CH [35]. This may be due to the difference in disease severity within and between groups. For instance, both groups comprised participants diagnosed with PD, with MCI being a common feature, and could have been associated with older age at assessment and depression amongst other factors [51], thus potentially resulting in higher false positive rates within this population.
DISCUSSION
The performance of dementia cognitive screening tests in older culturally diverse populations varied widely across studies, especially in terms of cut-offs (with optimal cut-offs based on authors’ own data without external validation) and sensitivity and specificity scores. Findings were multi-faceted, which may provide insight into different factors when considering the effect that both translation and cultural adaptations of tests have on diagnostic accuracy and test performance in diverse populations.
The studies included within this review varied in sample size, demographics, settings, education level, as well as variations in the population (e.g., severity of disease). This could have included inconsistency in the severity of symptoms, education level, linguistic ability, and the level of anxiety experienced in those completing tests in clinical settings compared to in participants’ own homes within the community. Additionally, the specified cut-offs in studies were adjusted dependent on different factors within the study populations, such as years of education and/or severity of disease. This could have impacted on the diagnostic variability across studies and subsequently affected the validity and reliability, sensitivity, and specificity of the included cognitive screening tests in accurately identifying MCI and dementia.
General findings indicated that sensitivity and specificity were higher when differentiating between dementia and CH in comparison to MCI versus CH. Furthermore, depending on the cognitive tests, the differentiation between dementia and CH groups had the lowest sensitivity and specificity when translated-only cognitive tests were used [43], followed by back translation [36, 40], with the highest accuracy demonstrated on those tests that were based on both back translation and cultural adaptation [38, 50], or cultural adaptation only [45]. On the other hand, all the tests used to differentiate MCI from CH were based only on translation [25, 35] or back translation [39], with only two studies addressing the cultural adaptation of the used cognitive tests, with variable success [42, 45]. One of the reasons why cognitive tests may have different accuracy in detecting dementia and MCI may lie not only in the quality of the translated and adapted cognitive tools, but also their ability to aid the MCI diagnosis in culturally diverse clinical settings. Moreover, MCI itself is a heterogeneous and unstable construct, which can lead to variable patterns and diagnosis rates.
Care should be taken when interpreting our findings, as many of the included studies compared groups with confirmed diagnoses against participants who were deemed “cognitively healthy”. This could have overestimated diagnostic accuracy, as some studies excluded participants who may have been considered “difficult to diagnose” or those with unconfirmed diagnoses, suggesting that in clinical settings the sensitivity and specificity of these cognitive tests may be lower. Thus, all these studies, except for the study by Tsolaki et al. (2020) [37] that used the CAMCOG, did not reflect a real clinical setting with the whole spectrum of memory problems, including functional cognitive impairment (i.e., memory problems but no dementia). Indeed, when such participants were included, the sensitivity and specificity of a cognitive test (RUDAS) were substantially lower (0.717 and 0.583, respectively), with an accuracy of <70% [52]. This raises the question of whether comparing those with a confirmed diagnosis against those deemed “cognitively healthy” overestimates diagnostic accuracy relative to typical clinical scenarios and, therefore, introduces an element of bias by not testing an overall sample that includes those with “unconfirmed” diagnoses. Additionally, healthcare professionals need to be mindful of high false positive rates, as these could not only lead to further testing in those who may not require it but also affect patients’ daily lives in terms of driving, work, insurance, anxiety, etc.
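To illustrate why a specificity of this magnitude matters in practice, the following minimal sketch applies the standard definitions of overall accuracy and positive predictive value to the sensitivity (0.717) and specificity (0.583) quoted above [52]. The prevalence values are hypothetical and chosen purely for illustration; they are not taken from any included study, and the function names are our own.

```python
def accuracy(sensitivity, specificity, prevalence):
    """Overall accuracy: prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Proportion of positive screens that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Sensitivity and specificity as reported in the text for the RUDAS [52].
SENSITIVITY = 0.717
SPECIFICITY = 0.583

# Hypothetical prevalence values (assumptions, for illustration only).
for prevalence in (0.1, 0.3, 0.5):
    acc = accuracy(SENSITIVITY, SPECIFICITY, prevalence)
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.1f}  accuracy={acc:.3f}  PPV={ppv:.3f}")
```

Because accuracy is a weighted mean of the two values, it necessarily lies between the specificity and the sensitivity, and the positive predictive value falls as prevalence decreases. Under these assumptions, a larger share of positive screens would be false positives in lower-prevalence settings.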
The cognitive screening tests detailed above can be biased against those from non-English speaking and/or non-Western countries. This is particularly apparent in studies that include tests that have not been translated or culturally adapted [35, 47]. Whilst cognitive processes are thought to be generally universal, a person’s culture can influence them, and this influence can extend to cognitive test performance [53]. Factors concerning socioenvironmental context were thought to have contributed to poorer performance in those from culturally diverse populations when compared with English speaking or Western populations. Characteristics including country of origin, level of education, and the economic status of a country are thought to have led to discrepancies in test performance. It is reported that those from LEDCs have less opportunity to receive higher quality education [54] and, with cognitive tests being predominantly Eurocentric in their approach, culturally diverse populations may be at an increased disadvantage.
There were mixed findings relating to the effect of education across studies. Some studies reported that adjusting cut-off scores to account for education helped to improve diagnostic accuracy [40], whereas others reported no difference when scores were adjusted for the same test [47]. However, it appeared that higher performance on some tests (such as the MMSE) was significantly associated with years of education, and this seems to be mediated by the Historical Index of Human Development of the participants’ country of birth (which reflects the economic, health, and educational potential of a country at the time of birth) [8]. In addition, the quality of the cultural adaptation of the cognitive tests plays a role. Namely, the test should not only account for the language and culture of the person being assessed but should also maintain the integrity of the concepts being assessed [55]. It is necessary to bear in mind that some tests will be more influenced by education than others (e.g., MMSE versus RUDAS) owing to the construct of the tests and to certain items that may be more problematic for diverse populations, such as verbal fluency tasks in those with lower levels of literacy.
Cognitive tests which involved minimal language requirements were more acceptable in those from a non-English speaking and/or non-Western country and demonstrated good levels of sensitivity and specificity. Tests including the CDT [44] and the mVAT [10] demonstrated promising results in differentiating between those with either dementia or MCI and control groups. Higher scores on the mVAT were thought to be associated with the added information that the colored photograph provided, as opposed to decoding black and white line drawings which is thought to be a skill acquired through education [56]. This is important to note, as many cognitive screening tests use visuo-constructional tasks (e.g., the copying of geometric shapes as seen in the ACE) which are reportedly the least culturally adapted items in cognitive screening tests as it is thought that visuospatial ability is not strongly associated with culture [21]. However, the results from Franzen et al. (2019) did not seem to be consistent with this theory [10].
One of the limitations of this study is the lack of more detailed information about the acculturation variables we have analyzed. Although the educational level (i.e., formal level of education) was noted in all studies, there was no explicit information about either the language ability or the social values of participants in the analyzed studies. Furthermore, the reviewed literature does not specify the model of acculturation used, or participants’ acculturation groups/stages. Studies largely concentrated on the separation stage/approach (high origin-culture affiliation, with low new-culture affiliation), on the grounds that language proficiency is necessary for the completion of cognitive testing. A previous study conducted in the USA [57] specifically investigated the relationship between language and acculturation and identified specific relevant variables. However, according to these authors, these acculturation variables were only applicable to foreign-born individuals and/or those who spoke English as a second language. In their empirical study, Brauer Boone et al. (2007) [57] extracted the following language and acculturation data from the patient files: 1) whether subjects learned English as a first language (or concurrent with another language) versus English learned as a second language (ESL); 2) age at which English was first learned; 3) number of years resided in the United States (subtracted from total age); and finally, 4) number of years educated in the United States (subtracted from total years of education completed). Unfortunately, these data were not available for the migrant populations in the studies included in this review, which precluded comparison.
Despite some evidence implying that higher degrees of acculturation could positively impact test performance [58], the findings from this review did not support the same conclusion. On the contrary, the level of acculturation was not related to total scores in one study [47]. However, this was the only study within the review that formally assessed the impact of acculturation, using the Frankfurt Acculturation Scale (FRAKK) [59], and the question therefore warrants further investigation. Furthermore, acculturation is said to comprise several variables (including language), and bilingualism was reported to have had a positive effect on test performance in some studies. Therefore, it would be helpful to understand which aspects of acculturation aid test performance.
The back-translation process aims to provide a means of quality control, ensuring that the translation retains the same meaning when translated back into the original source language [60]. A good translation process can help increase the accuracy of data derived from a translated measure. Of the six studies that adopted a back-translation process, two involved medical professionals, including neurologists, psychologists, and psychiatrists [33, 41], alongside linguistic experts. Involving qualified professionals is ideal from a clinical and psychometric perspective, as a native speaker may not necessarily have knowledge of cognitive tests, and this could result in an inaccurate translation or misinterpretation, leading to construct bias. The other four studies used native linguists and/or bilinguals to translate the tests. Three studies involved patients and controls or caregivers to aid with the adaptation and check for cultural appropriateness, which helps to increase face validity. Face validity is defined as whether test items are appropriate and relevant to a specific target population [61]. Therefore, the target population themselves are said to be best placed to determine the applicability of an adapted test, which could help to increase face validity [62]. A recent study conducted in Maori (M
To conclude, this review highlights the importance of translating and culturally adapting tests that have been developed in the Western context and standardized against Western normative data, as this may improve diagnostic accuracy within diverse populations. Our review highlights variables that clinicians should be mindful of when interpreting individual test scores, including education, linguistic ability, and aspects of acculturation. However, caution should be taken when interpreting the findings of this review, as the data varied owing to the broad selection of cognitive tests included. This makes it difficult to compare sensitivity and specificity data across different studies and populations. Pooled sensitivity and specificity data could provide a more comprehensive assessment of diagnostic accuracy across studies.
AUTHOR CONTRIBUTIONS
Natasha Czerwinski-Alley (Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Software; Writing – original draft); Tamara Chithiramohan (Data curation; Investigation; Writing – review & editing); Hari Subramaniam (Data curation; Writing – review & editing); Lucy Beishon (Writing – review & editing); Elizabeta Mukaetova-Ladinska (Conceptualization; Data curation; Investigation; Supervision; Writing – review & editing).
ACKNOWLEDGMENTS
We would like to thank Miss Mel N. Ellul Miraval (School of Psychology and Vision Sciences, University of Leicester, Leicester, UK) for her help with Figs. 3 and
.
FUNDING
This review received no specific grant from any funding agency. LB is an Academic Clinical Lecturer funded by the National Institute for Health Research. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR, (Partner Name), NHS or the UK Department of Health and Social Care.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
DATA AVAILABILITY
The data supporting the findings of this study are available within the article and/or its supplementary material.
