Methodology for the development of normative data for ten Spanish-language neuropsychological tests in eleven Latin American countries

Abstract

BACKGROUND:

Within the field of neuropsychology, there is a significant lack of normative data for individuals in Latin America.

OBJECTIVE:

To describe the methodology used to obtain the data and create norms for 10 Spanish-language neuropsychological tests administered in 11 Latin American countries to a sample of 3,977 healthy individuals between the ages of 18 and 90.

METHOD:

The same data analysis process was applied to all the data collected (regardless of the scale or country), using a regression-based procedure that takes into account the influence of sex, age, and education on neuropsychological test scores.

CONCLUSIONS:

Following this procedure, we were able to generate norms based on age, education, and sex (where relevant) for each test in each of the 11 countries studied. These norms are presented in the 10 articles that comprise this special issue.

Keywords

Psychometric norms, Latin America, neuropsychological tests

1 Introduction

Most psychological and neuropsychological tests are built using Classical Test Theory (CTT). From this perspective, norms are created using simple procedures such as transformation to percentile scales or to T and D scores (Abad, Olea, Ponsoda, & García, 2011; Crawford, 2003). Within the field of neuropsychology, there is a significant lack of norms for Spanish speakers in Latin America. This article presents the methodology employed to standardize various neuropsychological tests administered in 11 Latin American countries.

There is extensive literature describing which demographic variables (e.g., sex, age, socio-economic status, and years of education) must be taken into account to generate appropriate norms for neuropsychological tests. It is widely accepted that the systematic error generated by both chronological age and education must be controlled for (Salthouse, 2000, 2001, 2009; Salthouse, Atkinson, & Berish, 2003). There are several ways to control for age and education effects: 1) correcting for this error by generating separate normative data for each reference group, which requires a stratification system to define the normative groups, as in classic standardization. Though this option is simple, it generates such a large number of normative tables that the final product becomes unwieldy and of little use for applied clinical research (Brooks, Strauss, Sherman, Iverson, & Slick, 2009; Kim et al., 2014; La Cour & Andersen, 2006; La Paglia et al., 2014; Mungas, Reed, Crane, Haan, & González, 2004; Peintinger & Klünemann, 2013; Salvadori, Poggesi, Pracucci, Inzitari, & Pantoni, 2015); 2) correcting scores using the partial regression coefficients from a linear model (Blesa et al., 2001; Villaseñor, Guàrdia, Jiménez, Rizo, & Peró, 2010); and 3) using regression coefficients combined with the standard deviation of the residual to obtain normative data based on a percentile scale (Van der Elst, Dekker, Hurks, & Jolles, 2012; Van der Elst, Hurks, Wassenberg, Meijs, & Jolles, 2011; Van der Elst, Van Boxtel, Van Breukelen, & Jolles, 2006).

We had to choose a statistical approach to standardize the neuropsychological tests that suited the final sample and took a number of issues into account. For instance, the sample was obtained from a large number of countries, and each country had a slightly different distribution of age and educational level. Furthermore, the tests administered to these samples have different response ranges, there are individual differences across countries, and well-known floor and ceiling effects made it more difficult to obtain distributions with acceptable symmetry for our sample (Hartshorne & Germine, 2015; Schroeder & Salthouse, 2004). Therefore, we used the standardization procedure described by Van der Elst et al. (2006, 2011, 2012), in which regression coefficients are combined with the standard deviation of the residual.

Given the issues pertaining to this very heterogeneous sample of participants from 11 countries across Latin America, using regression coefficients combined with the standard deviation of the residual to obtain normative data based on a percentile scale seems to be the best procedure. It yields a manageable number of conventional norms tables, which gives them greater clinical utility. Additionally, this procedure proposes correction systems that take the real observed distributions into account instead of a series of arbitrary strata, and in this way addresses the ceiling and floor effect problems. Finally, the procedure allows for a specific adjustment depending on the cohort studied (Van der Elst et al., 2006, 2011, 2012). Though this procedure does not radically solve the issue of asymmetric distributions, it does make it possible to take them into account, as the scale predictions are based on confidence intervals close to the average values (Russell, Russell, & Hill, 2005).

In light of all of the above, in this paper we explain the methodology, based on regression coefficients and statistical models, used to generate normative data and standardize different neuropsychological tests in 11 Spanish-speaking countries from Latin America.

2 Method

2.1 Participants

The present study was conducted with a sample of 5,402 healthy individuals from 20 cities in 12 Spanish-speaking countries in Latin America (Argentina, Bolivia, Chile, Colombia, Cuba, El Salvador, Guatemala, Honduras, Mexico, Paraguay, Peru, and Puerto Rico). The results obtained from 1,425 Colombians have been published elsewhere (Arango-Lasprilla & Rivera, 2015). Therefore, this article presents the procedures used to generate normative data from 3,977 healthy individuals from 14 cities in the remaining 11 countries.

The sample was designed taking into account the literacy level, the percentage of people with primary, secondary, and tertiary studies, and the age distribution of each country. Thus, empirical quota sampling was used in order to achieve a statistical precision between 0.049 and 0.063 at a 95% confidence level, estimated under the situation of maximum uncertainty (π = 1 − π = 0.5) (see Table 1). Due to the low number of participants across the three education levels, years of education was recoded into two groups (1 to 12 years and >12 years). These two groups differentiate between people with a low or medium level of education and those with a high level of education. A description of the sample by this coding of years of education and by country of origin is shown in Table 2.
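The relation between sample size and precision under maximum uncertainty can be illustrated with a minimal sketch. It assumes the usual margin-of-error formula for a proportion under simple random sampling, which only approximates a quota design; the function name and the two sample sizes are hypothetical and were chosen because they give values matching the two precision bounds quoted above, not because they are the country sample sizes.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval for a proportion p with n cases."""
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # p = 0.5 maximizes p * (1 - p), i.e. the "maximum uncertainty" case.
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes, for illustration only.
for n in (242, 400):
    print(n, round(margin_of_error(n), 3))
```

Under these assumptions, larger samples give the smaller (better) precision value, so the lower bound of the range corresponds to the larger samples.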

For all the countries, the inclusion and exclusion criteria were the same. The inclusion criteria were that participants: a) were between 18 and 95 years of age, b) were born and currently lived in the country where the protocol was conducted, c) spoke Spanish as their native language, d) had completed at least one year of formal education, e) were able to read and write at the time of evaluation, f) scored ≥23 on the Mini-Mental State Examination (MMSE, Folstein, Folstein, & McHugh, 1975; Ostrosky-Solís, López-Arango, & Ardila, 2000; Villaseñor-Cabrera, Guàrdia-Olmos, Jiménez-Maldonado, Rizo-Curiel, & Peró-Cebollero, 2010), g) scored ≤4 on the Patient Health Questionnaire–9 (PHQ-9, Kroenke, Spitzer, & Williams, 2001), and h) scored ≥90 on the Barthel Index (Mahoney & Barthel, 1965).
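As an illustration only, the inclusion criteria listed above can be expressed as a simple screening function. The `Candidate` record and its field names are hypothetical and are not part of the study's materials; the thresholds are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative assumptions.
    age: int
    born_and_resident: bool   # born and currently living in the study country
    native_spanish: bool
    years_education: int
    literate: bool            # able to read and write at evaluation
    mmse: int                 # Mini-Mental State Examination score
    phq9: int                 # Patient Health Questionnaire-9 score
    barthel: int              # Barthel Index score


def meets_inclusion_criteria(c: Candidate) -> bool:
    """Return True only if every inclusion criterion (a to h) is satisfied."""
    return (
        18 <= c.age <= 95
        and c.born_and_resident
        and c.native_spanish
        and c.years_education >= 1
        and c.literate
        and c.mmse >= 23
        and c.phq9 <= 4
        and c.barthel >= 90
    )


example = Candidate(age=42, born_and_resident=True, native_spanish=True,
                    years_education=9, literate=True, mmse=28, phq9=2, barthel=100)
print(meets_inclusion_criteria(example))
```

The exclusion criteria depend on clinical history and are not encoded in this sketch.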

The exclusion criteria were: a) having a personal history of central nervous system disease (stroke, epilepsy, multiple sclerosis, brain tumour, severe head trauma, etc.); b) having a history of abuse of alcohol or other psychotropic substances; c) having an active systemic disease or uncontrolled disease associated with cognitive impairment (diabetes mellitus, hypothyroidism, vitamin B12 deficiency); d) having a history of psychiatric illness (major depression, bipolar disorder, psychosis, etc.); e) having severe sensory deficits (vision and/or hearing loss) that could affect the administration of the tests or the participant's performance on them; f) using psychiatric or other drugs that could affect cognitive performance; and g) taking medications for chronic pain (e.g. Monoamine Oxidase Inhibitors – MAOI).

2.2 Instruments

A sociodemographic questionnaire was created for this project in order to collect data on age, years of education, sex, laterality (right-handed, left-handed, ambidextrous), residence zone (rural, urban), race (white, black, mestizo, indigenous), employment status (employed, unemployed, student, retired, housekeeping), and marital status (single, married, cohabiting, separated, divorced, widowed). Then, all of the individuals were administered a comprehensive neuropsychological evaluation composed of the following 10 neuropsychological tests:

Rey-Osterrieth Complex Figure test (ROCF; Rey, 2009).

Stroop colour and words test (Golden, 2010).

Modified Wisconsin card sorting test (M-WCST; Nelson, 1976; Schretlen, 2010).

Trail making test (TMT A-B; Reitan & Wolfson, 1985).

Brief test of attention (BTA; Schretlen, 1997).

Phonological and semantic verbal fluency tests.

Boston naming test (Goodglass, Kaplan, & Barresi, 2005).

Symbol digit modalities test (SDMT; Smith, 2002).

Hopkins verbal learning test - Revised (HVLT-R; Benedict, Schretlen, Groninger, & Brandt, 1998).

Test of memory malingering (TOMM; Tombaugh, 2011).

2.3 Procedure

2.3.1 Training and administration procedure of the battery

The present study began on November 1, 2012, with the participation of 15 institutions. Subsequently, the proposal, which included the methodology and ethics, was drafted and later delivered to the ethics committee of the University of Deusto (Bilbao, Spain). After approval, the published manuals, answer sheets, and materials (booklets and stimulus cards) for each of the neuropsychological tests were purchased. Spanish-language versions of the instruction booklets were available for all instruments except the M-WCST and the BTA. Spanish administration and instruction manuals for these instruments were created in collaboration with the publishers.

Likewise, tools and visual aids were created to standardize the administration of the battery. These tools consisted of: a) a randomized list to determine the order of administration of the tests for each participant in order to avoid order bias and cognitive conditioning. To do so, the function =RAND() was used in Microsoft Excel© and was set to take into account the interaction of the language tests and the verbal memory test; b) a framework for decision-making in the evaluation process; c) a template in Microsoft Excel© for entering information, designed to limit data entry errors. The template was built using the following configuration options: data validation = Custom (numeric variables), dropdown lists (categorical variables), and format settings; d) examples showing the most frequent errors in the administration and scoring of each test; and e) a virtual folder with a security key for each city, administered by the study coordinator to track data entry.
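A randomized administration order of the kind described in a) can be sketched as follows. This is not the Excel procedure used in the study: the test labels are abbreviations taken from the instrument list, and the constraint shown (keeping the verbal fluency test from being administered immediately before or after the HVLT-R) is a hypothetical stand-in, since the text does not specify how the interaction between language and verbal memory tests was handled.

```python
import random

TESTS = ["ROCF", "Stroop", "M-WCST", "TMT", "BTA",
         "Verbal fluency", "Boston naming", "SDMT", "HVLT-R", "TOMM"]


def not_adjacent(order, a="Verbal fluency", b="HVLT-R"):
    """Hypothetical constraint: tests a and b are not administered back to back."""
    return abs(order.index(a) - order.index(b)) > 1


def administration_order(tests, rng, is_allowed=lambda order: True, max_tries=1000):
    """Draw random permutations until one satisfies the constraint."""
    for _ in range(max_tries):
        order = rng.sample(tests, k=len(tests))  # random permutation, input unchanged
        if is_allowed(order):
            return order
    raise RuntimeError("no admissible order found")


rng = random.Random(2012)  # fixed seed only so the example is reproducible
print(administration_order(TESTS, rng, is_allowed=not_adjacent))
```

Rejection sampling keeps every admissible order equally likely, which is the property a counterbalancing list needs.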

Moreover, each coordinator selected a group of 6–12 undergraduate and/or graduate students, with whom the coordinator reviewed the instructions, administration, and scoring of the tests. Once the group was ready, online training was conducted through the telemedicine platform VSee©. During the two-hour training, the administration and scoring of the tests were reviewed and any doubts were resolved.

A pilot test with 40 protocols (excluded from the analysis of normative data) was conducted by analyzing data from the first two cases collected in each of the 20 centers, in order to verify that data collection worked adequately under the proposed design.

Data collection began in March 2013 and ended in August 2014. The protocol was administered in a single day and lasted about 70 minutes. Before the battery was administered, participants had to sign the informed consent form.

Once the database was consolidated, we reviewed the frequency distributions of the data, comparing the values of various statistics and graphs, in order to check that the data had been processed correctly and to examine the distribution of each of the variables analyzed. This confirmed that the database had been generated correctly and that the properties of the observed variables were known.

2.3.2 Normative procedure

An independent-samples t-test was run by country for each neuropsychological measure to first determine whether there were any significant differences in test scores based on participant sex within each country. Since the sample size is very large, differences between men and women in neuropsychological scores were considered relevant only if they were statistically significant and had a medium or large effect size (r > 0.3; Cohen, 1988). The effect size was obtained using the following expression: $r = \sqrt{\frac{t^{2}}{t^{2} + v}}$, where $t$ is the t statistic and $v$ its degrees of freedom. When significant differences with medium or large effect sizes were found, country samples were stratified by sex.
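The effect size expression can be checked with a short sketch. It assumes the classic pooled-variance (Student) t-test with v = n1 + n2 − 2 degrees of freedom; the text does not state which variant of the t-test was used, and the scores below are made-up illustrative values, not study data. The significance test itself (the p value) is omitted.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def effect_size_r(t, df):
    """r = sqrt(t^2 / (t^2 + v)), with v the degrees of freedom."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


# Made-up raw scores for illustration.
men = [10, 12, 14, 16, 18]
women = [11, 13, 15, 17, 19]

t, df = pooled_t(men, women)
r = effect_size_r(t, df)
print(t, df, round(r, 3))   # t = -0.5, df = 8, r ≈ 0.174
print(r > 0.3)              # False: below the medium effect size threshold
```

In this toy example the difference would not be treated as relevant, because r falls below 0.3 regardless of significance.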

The linear regression model was built using the raw (direct/unconverted) score on the scale being normed as the criterion variable. Participant age and years of education (dichotomized into two groups: 1 to 12 years coded as 0 and >12 years coded as 1) were included as predictor variables, and sex was also considered as a predictor variable in those cases where the differences in raw scores between men and women met the above-mentioned criteria. All the regressions were run separately by country. If any predictor variable was not statistically significant, the linear regression analysis was repeated without that variable. For all linear regression models, the following were assessed: collinearity between the predictor variables (variance inflation factor, VIF, values near 1), normality of the residuals (using the Q-Q plot and the histogram of the residuals), homoscedasticity (using the scatter plot of the standardized residuals against the predicted values), and, finally, the presence of influential values (using Cook's distance).
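As a small illustration of the collinearity check, the sketch below computes the VIF for the simplest case of two predictors, where VIF = 1 / (1 − r²) and r is the Pearson correlation between them. With more predictors, r² is replaced by the R² obtained from regressing each predictor on all the others. The function names and data are illustrative assumptions, not the software or values used in the study.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up ages and dichotomized education (0: 1 to 12 years, 1: >12 years).
age = [22, 35, 41, 58, 63, 77]
education = [1, 0, 1, 0, 1, 0]
print(vif_two_predictors(age, education))
```

A value near 1 indicates that the predictors are nearly uncorrelated, which is the situation the text describes as desirable.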

In order to obtain the exact percentile for a given score, the procedure used by Van der Elst et al. (2006, 2011, 2012) was followed. The steps to obtain the exact percentile were:

Obtain the predicted value using the regression equation fitted for the person's country of origin, where $y_{i}$ is the original score on each neuropsychological test: ${\hat{y}}_{i} = b_{0} + b_{1} \cdot {Age}_{i} + b_{2} \cdot {Educational\ level}_{i} + b_{3} \cdot {Sex}_{i}$

Obtain the residual value: $e_{i} = y_{i} - {\hat{y}}_{i}$

Standardize the residual obtained (transform it to a z score). To do this, divide the residual value by the standard deviation of the residuals obtained in the fitted regression model: $z_{i} = \frac{e_{i}}{{SD}_{e}}$

Look up the exact probability associated with the z value using the cumulative probability of the standard normal distribution; expressed as a percentage, this is the exact percentile.
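The four steps above can be sketched in a few lines. The regression coefficients and residual standard deviation below are hypothetical placeholders chosen for easy arithmetic; the actual values are test- and country-specific and are reported in the individual articles. The coding of sex (0/1) is also an assumption.

```python
from statistics import NormalDist

# Hypothetical coefficients for one test in one country (illustration only).
COEF = {"b0": 30.0, "age": -0.1, "education": 4.0, "sex": 0.0}
SD_RESIDUAL = 5.0


def exact_percentile(raw_score, age, education, sex, coef=COEF, sd_residual=SD_RESIDUAL):
    """Regression-based percentile following the four steps in the text."""
    # Step 1: predicted score from the fitted regression equation.
    predicted = (coef["b0"] + coef["age"] * age
                 + coef["education"] * education + coef["sex"] * sex)
    # Step 2: residual = observed - predicted.
    residual = raw_score - predicted
    # Step 3: standardize the residual.
    z = residual / sd_residual
    # Step 4: cumulative probability of the standard normal, as a percentage.
    return 100 * NormalDist().cdf(z)


# A 50-year-old with 1 to 12 years of education (coded 0) and a raw score of 25:
# predicted = 30 - 5 + 0 = 25, residual = 0, z = 0, percentile = 50.
print(exact_percentile(25, age=50, education=0, sex=0))
```

With the same hypothetical coefficients, a raw score one residual standard deviation above the prediction (30 instead of 25) gives z = 1, that is, roughly the 84th percentile.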

Some clinicians may consider this procedure slow and time-consuming. For this reason, the papers presenting the normative data for each scale also include tables with the approximate percentile. To build them, following the procedure used by Van der Elst et al. (2006, 2011, 2012), we used age class marks starting at 20 years (± 2 years) and ending at a class mark of 80 years. For each age class mark and educational level (1: more than 12 years; 0: 1 to 12 years), we applied the procedure explained above to obtain the percentile. In this case, the tables do not give the percentile for each age in years; they give the approximation for 5-year intervals instead.
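A possible way to generate such approximate tables is sketched below. It assumes that each class mark covers the ages within ± 2 years of it (18–22 for 20, 23–27 for 25, and so on up to 78–82 for 80); how older ages are handled is not stated in the text, so the sketch simply rejects them. The coefficients are hypothetical and sex is left out of the model for brevity.

```python
from statistics import NormalDist

# Hypothetical coefficients (illustration only); sex is omitted from this model.
COEF = {"b0": 30.0, "age": -0.1, "education": 4.0}
SD_RESIDUAL = 5.0


def age_class_mark(age: int) -> int:
    """Map an age in years to its class mark (20, 25, ..., 80)."""
    if not 18 <= age <= 82:
        raise ValueError("age outside the range covered by the class marks")
    return 20 + 5 * ((age - 18) // 5)


def approximate_table(raw_scores, coef=COEF, sd_residual=SD_RESIDUAL):
    """Percentile for every (education, class mark, raw score) combination."""
    table = {}
    for education in (0, 1):                 # 0: 1 to 12 years, 1: >12 years
        for mark in range(20, 81, 5):        # class marks 20, 25, ..., 80
            predicted = coef["b0"] + coef["age"] * mark + coef["education"] * education
            for raw in raw_scores:
                z = (raw - predicted) / sd_residual
                table[(education, mark, raw)] = round(100 * NormalDist().cdf(z))
    return table


table = approximate_table(raw_scores=range(0, 41))
# Look up a 52-year-old with 1 to 12 years of education and a raw score of 25.
print(table[(0, age_class_mark(52), 25)])
```

The lookup replaces the person's exact age with the class mark of the interval, which is why the resulting percentile is an approximation.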

3 Limitations and future directions

Among the limitations encountered in the present study was an imbalance in sample sizes between males and females in certain countries (e.g., Argentina). Additionally, the discrepancy between the designed and final samples was greater in some countries than in others due to issues beyond the researchers' control. For instance, in the case of Cuba, where education is compulsory and accessible to the population, it was extremely difficult to find participants with fewer than six years of education. Furthermore, the study used healthy participants, which resulted in floor and ceiling effects on some of the neuropsychological measures used. Additional limitations include the lack of clinical samples in the normative data, as well as the lack of adequate representation of indigenous populations from Latin America. One of the inclusion criteria for this study was that participants be able to read; therefore, illiterate individuals were not part of the final sample. Directions for future work in this line of research may include creating normative data for illiterate individuals, as well as for individuals with various neurological disorders (e.g., dementias, TBI, multiple sclerosis) from Latin America.

4 Conclusions

The procedure used by Van der Elst et al. (2006, 2011, 2012) allows us to propose a convenient way to obtain sex, age, and education norms for 10 neuropsychological tests without the need to work with large samples. It is, in fact, a simple procedure that is useful in clinical practice. Clinicians have two ways to obtain the percentile. The first is to obtain the exact percentile using the procedure described, together with the regression coefficients and the residual standard deviation provided in the article for each test. To complete the process and obtain the exact percentile, they can use tables of the standard normal distribution or software that provides the cumulative probability for a z score. The second, for those clinicians who do not need the exact percentile, is to use the tables of approximate percentiles provided for each test. These tables are very simple to use and are similar to the tables that psychologists use in their regular practice. Clinicians only need to know the sex, age, and educational level of the person assessed and the raw score on the test administered. With these values, they look up, in the table for their country, the sex (if it is included in the model), the educational level (1 to 12 or >12 years), the age group, and the raw score, and then read off the approximate percentile. These tables are included in each of the 10 manuscripts of the special issue.

Conflict of interest

None.

Acknowledgments

The Grup de Recerca en Tècniques Estadístiques Avançades Aplicades a la Psicologia (GTEAAP) members of the Generalitat de Catalunya’s 2014 SGR 326 Consolidated Research Group (GRC) provided methodological and statistical support for this study. They are funded by the PSI2013-41400-P project of Ministerio de Economia y Competitividad of the Spanish Government.

References

Abad, F. J., Olea, Ponsoda, & García. (2011). Medición en ciencias sociales y de la salud. Madrid: Editorial Síntesis, S.A.

Arango-Lasprilla, J. C., & Rivera. (2015). Neuropsicología en Colombia: Datos normativos, estado actual y retos a futuro. Manizales, Colombia: Editorial Universidad Autónoma de Manizales.

Benedict, R. B., Schretlen, Groninger, & Brandt. (1998). Hopkins Verbal Learning Test-Revised: Normative data and analysis of inter-form and test–retest reliability. The Clinical Neuropsychologist, 12(1), 43–55. 10.1076/clin.12.1.43.1726

Blesa, Pujol, Aguilar, Santacruz, Bertrán, Hernández, et al. (2001). Clinical validity of the Mini Mental State for Spanish speaking communities. Neuropsychologia, 39, 1150–1157.

Brooks, B. L., Strauss, Sherman, Iverson, G. L., & Slick, D. J. (2009). Developments in neuropsychological assessment: Refining psychometric and clinical interpretive methods. Canadian Psychology/Psychologie canadienne, 50(3), 196.

Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Crawford, J. R. (2003). Psychometric foundations of neuropsychological assessment. In Goldstein, L. H., & McNeil (Eds.), Clinical Neuropsychology: A Practical Guide to Assessment and Management for Clinicians (pp. 220–235). Chichester: Wiley.

Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198.

Golden, C. J. (2010). Manual de test de colores y palabras. Madrid: Publicaciones de psicología aplicada, TEA Ediciones.

Goodglass, Kaplan, & Barresi. (2005). Evaluación de la Afasia y de Trastornos Relacionados. Madrid: Editorial Médica Panamericana.

Hartshorne, J. K., & Germine, L. T. (2015). When does cognitive functioning peak? The asynchronous rise and fall of different cognitive abilities across the life span. Psychological Science, 1, 1–11. 10.1177/0956797614567339

Kim, Parker, Whyte, Hart, Pluta, Ingalhalikar, et al. (2014). Disrupted structural connectome is associated with both psychometric and real-world neuropsychological impairment in diffuse traumatic brain injury. Journal of the International Neuropsychological Society, 20(9), 887–896.

Kroenke, Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9. Journal of General Internal Medicine, 16(9), 606–613. 10.1046/j.1525-1497.2001.016009606.x

La Cour, & Andersen. (2006). Neuropsychological assessment with the Visual Gestalt Test: Psychometric properties and differential diagnostic probabilities. Scandinavian Journal of Psychology, 47(1), 1–8.

La Paglia, La Cascia, Cipresso, Rizzo, Francomano, Riva, & La Barbera. (2014). Psychometric assessment using classic neuropsychological and virtual reality based test: A study in obsessive-compulsive disorder (OCD) and schizophrenic patients. Pervasive Computing Paradigms for Mental Health (pp. 23–32). Springer International Publishing.

Mahoney, F. I., & Barthel. (1965). Functional evaluation: The Barthel Index. Maryland State Medical Journal, 14, 56–61.

Mungas, Reed, B. R., Crane, P. K., Haan, M. N., & González. (2004). Spanish and English Neuropsychological Assessment Scales (SENAS): Further development and psychometric characteristics. Psychological Assessment, 16(4), 347.

Nelson, H. E. (1976). A modified card sorting test sensitive to frontal lobe defects. Cortex, 12, 313–324.

Ostrosky-Solís, López-Arango, & Ardila. (2000). Sensitivity and specificity of the Mini-Mental State Examination in a Spanish-speaking population. Applied Neuropsychology, 7(1), 25–31.

Peintinger, & Klünemann, H. H. (2013). Psychometric diagnostic of cognitive functions and motor skills: Findings from a neuropsychological test battery on Niemann–Pick type C-patients. Journal of the Neurological Sciences, 333, e634.

Reitan, R. M., & Wolfson. (1985). The Halstead-Reitan neuropsychological test battery: Theory and clinical interpretation. Tucson, AZ: Neuropsychology Press.

Rey. (2009). REY: Test de copia y de reproducción de memoria de figuras geométricas complejas. Madrid: TEA Ediciones.

Russell, E. W., Russell, S. L. K., & Hill, B. D. (2005). The fundamental psychometric status of neuropsychological batteries. Archives of Clinical Neuropsychology, 20, 785–794.

Salvadori, Poggesi, Pracucci, Inzitari, & Pantoni. (2015). Development and psychometric properties of a neuropsychological battery for mild cognitive impairment with small vessel disease: The VMCI-Tuscany Study. Journal of Alzheimer’s Disease, 43(4), 1313–1323. 10.3233/JAD-141449

Salthouse. (2000). Aging and measures of processing speed. Biological Psychology, 54, 35–54.

Salthouse. (2001). Structural models of the relations between age and measures of cognitive functioning. Intelligence, 29, 93–115.

Salthouse. (2009). When does age-related cognitive decline begin? Neurobiology of Aging, 30, 507–514.

Salthouse, Atkinson, T. M., & Berish, D. E. (2003). Executive functioning as a potential mediator of age-related cognitive decline in normal adults. Journal of Experimental Psychology: General, 132(4), 566–594.

Schretlen. (1997). Brief Test of Attention Professional Manual. Odessa, FL: Psychological Assessment Resources, Inc.

Schretlen, D. J. (2010). Modified Wisconsin Card Sorting Test: M-WCST; Professional Manual. Lutz: PAR.

Schroeder, D. H., & Salthouse. (2004). Age-related effects on cognition between 20 and 50 years of age. Personality and Individual Differences, 36, 393–404.

Smith. (2002). Manual de test de símbolos y dígitos SDMT. Madrid: Publicaciones de psicología aplicada, TEA Ediciones.

Tombaugh, T. N. (2011). Test de Simulación de Problemas de Memoria. Madrid: TEA Ediciones.

Van der Elst, Dekker, Hurks, & Jolles. (2012). The letter digit substitution test: Demographic influences and regression-based normative data for school-aged children. Archives of Clinical Neuropsychology, 27, 433–439. 10.1093/arclin/acs045

Van der Elst, Hurks, Wassenberg, Meijs, & Jolles. (2011). Animal verbal fluency and design fluency in school-aged children: Effects of age, sex, and mean level of parental education, and regression-based normative data. Journal of Clinical and Experimental Neuropsychology, 33(9), 1005–1015. 10.1080/13803395.2011.589509

Van der Elst, Van Boxtel, M. P. J., Van Breukelen, G. J. P., & Jolles. (2006). Normative data for the animal, profession and letter M naming verbal fluency tests for Dutch speaking participants and the effects of age, education, and sex. Journal of the International Neuropsychological Society, 12, 80–89. 10.1017/S1355617706060115

Villaseñor, Guàrdia, Jiménez, Rizo, & Peró. (2010). Sensitivity and specificity of the Mini-Mental State Examination in the Mexican population. Quality & Quantity, 44, 1105–1112.