Abstract
The psychological assessment process involves administration, scoring, interpretation, and report writing. Beyond the human resources, time, and effort needed for training, the assessment process itself demands significant time and effort from trained professionals. This translates directly into cost, which can sometimes be significant. Given the very limited number of trained psychologists in clinical practice in India relative to the size of the country’s population, assessment work can take precious time away from intervention services. On the other hand, a significant number of the psychological tests used in India are old, have outdated norms, are poorly standardized, and have not kept pace with global developments. Given these issues, the mental health profession as a whole, as well as clients/patients, can question the validity of psychological assessments in general or of specific assessments. This article discusses some of the issues related to the validity of psychological assessments in general and in specific domains, such as ability, achievement, and psychopathology. It also suggests possible measures to overcome these limitations.
Introduction
Psychological assessments are routinely carried out with children and adolescents in varied contexts and for varied needs: to assess abilities (developmental, intellectual, neuropsychological); achievement (academic learning difficulties); temperament and personality characteristics, along with constructs such as motivation, affect, resilience, and self-esteem; interpersonal phenomena and relationships (attachment, adjustment); and for clinical (screening, psychodiagnostics) and research purposes. Often a combination of these is used, for example in forensic cases, and to assess prognosis and plan treatment. Assessments are also carried out to check the type and extent of disability (developmental, activities of daily living, self-help skills); as a mandatory submission requirement of an academic course; and to convey a clinical diagnosis objectively to the adolescent and/or parent, because the diagnosis may be denied if it is not supported by test results. Examples of such diagnoses are certain personality traits/disorders (narcissistic or emotionally unstable personality traits/disorder), intellectual sub-normality, and learning disability. In addition, a few professionals carry out some of these assessments as a matter of routine without adequate rationale, mainly for economic reasons.
Psychologists use different techniques or methods for psychological assessment. These vary in modality (verbal vs performance), level of awareness of the examinee (self-report vs projective), materials used (paper-pencil vs objects), and extent of structure (structured vs semi-structured). There is no unanimously accepted method or technique for assessing most psychological constructs, and each method and technique has its own merits and limitations. Often, the psychological construct being assessed is valid, but the test or the way it is assessed is not, for example, assessing the construct of intelligence through the Bender Gestalt Test (BGT) 1 in an adolescent. On the other hand, the test per se might be valid, but not for the psychological construct it is said to assess, for example, using the BGT to assess the personality of an adult.
Other issues are the time taken for the whole process (administration, scoring, and interpretation); the training time, cost, and effort required to master these processes; the cost to the client/patient; and, finally, the cost-benefit ratio, which becomes relevant for the treating team. Psychologists who do assessments may not mind the whole process, as it is a major part of their professional role and responsibilities. However, other mental health professionals, the treating team, and patients/clients might question whether psychological assessment tests are required at all and, if so, whether they are valid.
Validity of Psychological Assessment Tests
The validity of a test refers to the question “does it measure what it intends to measure?” In psychological assessment, validity does not stop there. The working of the brain is so complex that the validity of a test cannot rest only on standard administration, scoring, and interpretation; depending on the situation and the client, one has to modify the administration, scoring, and/or interpretation. However, when the basic premise of validity is itself questionable, going beyond it seems far-fetched at present. At the same time, one has to remember the proverb that “a chain is only as strong as its weakest link.” In this context, it might be apt to say that “the validity of a particular psychological test lies with the expertise of the administrator, scorer, and/or interpreter.” This implies that even when a test is valid, in that it measures what it intends to measure, the application of the test and the validity of its results rest on the psychologist’s expertise. These points are discussed in detail below.
Standardization and Adaptation
The methods used to assess most psychological constructs vary across places and professionals. Though some tests and their methods have changed over time, the understanding and the basic structure have remained more or less the same for several decades. This applies especially to developing economies and socioeconomically diverse countries like India.
For a test to have good validity, it should undergo standardization and/or appropriate adaptation. The issues with current practice can be categorized as follows.
Using Western Tests and Their Norms as It Is Without Any Changes
Examples include the Vineland Adaptive Behavior Scales (VABS), 2 the Rorschach Ink Blot Test, 3 the Wechsler Objective Reading Dimensions, 4 and the Bayley Scales of Infant and Toddler Development Screening Test. 5
Using Western tests and their norms as they are violates the core psychological principles of testing and standardization. Norms describe individual differences and are meant for comparing children with their socioeconomic and geographical peers of the same age. A readily available example is the Standard Progressive Matrices, 6 for which the UK and Indian norms differ. To obtain a percentile rank of 50, a 12-year-old needs a score of 33 out of 60 in Delhi but 41 in the UK, a difference of 8 points.
Similarly, in the Rorschach, a response is scored as a “contamination” response when 2 (or more) incompatible things are fused into a single thing. For example, fusing a lion’s face with a human body to form a “Narasimha” (lion-man) response on card 4 is considered a pathognomonic sign suggesting schizophrenia. 7, 8 However, if the psychologist is not aware of the sociocultural and religious practices prevalent in the population, a response of “Narasimha” or any similar fusion can lead to false-positive errors.
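The consequence of borrowing norms can be made concrete with a small sketch. It uses only the two medians quoted above (33 in Delhi, 41 in the UK); the child and the raw score of 37 are hypothetical, and the function name is an illustrative choice rather than part of any published scoring procedure.

```python
# Raw scores at the 50th percentile for 12-year-olds, as quoted in the text.
MEDIAN_RAW_SCORE = {"Delhi": 33, "UK": 41}


def position_vs_median(raw_score: int, norm_group: str) -> str:
    """Describe where a raw score falls relative to a norm group's median."""
    median = MEDIAN_RAW_SCORE[norm_group]
    if raw_score > median:
        return "above the 50th percentile"
    if raw_score < median:
        return "below the 50th percentile"
    return "at the 50th percentile"


# Hypothetical 12-year-old with a raw score of 37 out of 60.
for group in MEDIAN_RAW_SCORE:
    print(f"{group} norms: {position_vs_median(37, group)}")
```

The same performance is read as above average against one set of norms and below average against the other, which is why the choice of norm group is not a neutral decision.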
Using Tests or Scoring-Interpreting Systems That Are No Longer Accepted as Valid Elsewhere
Examples are the object assembly and picture arrangement subtests in the Wechsler Adult Performance Scale of Intelligence (WAPIS); 9 the object assembly and mazes subtests in Malin’s Intelligence Scale for Indian Children (MISIC); 10 and Beck’s 11 and Klopfer’s 12 scoring and interpretive systems for the Rorschach.
The intelligence subtests mentioned above have been discontinued in developed/Western countries because they do not represent intelligence and/or do not have adequate psychometric properties. However, they are still used in India, even though the score for an omitted subtest can be estimated by proration.
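Proration can be sketched as follows. This is a minimal illustration of the general idea (scaling the sum of the administered subtests up to the full number of subtests), not the procedure of any specific test manual; the scaled scores and the count of five subtests are hypothetical.

```python
def prorated_sum(scaled_scores: list[float], n_required: int) -> float:
    """Estimate the full sum of scaled scores from the subtests actually given.

    Assumption: simple proportional proration, i.e. the mean of the
    administered subtests stands in for each omitted subtest.
    """
    if not scaled_scores:
        raise ValueError("at least one subtest score is needed")
    if len(scaled_scores) > n_required:
        raise ValueError("more scores supplied than subtests required")
    return sum(scaled_scores) * n_required / len(scaled_scores)


# Hypothetical case: 4 of 5 subtests administered, one outdated subtest omitted.
administered = [9, 11, 10, 8]
print(prorated_sum(administered, n_required=5))  # 47.5
```

A test manual may restrict how many subtests can be prorated and may supply its own tables, so the manual takes precedence over this sketch.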
Norms not Being Revised or Updated
India has extreme polarities on factors such as socioeconomic conditions, educational opportunities, nutrition levels, and parental involvement. The effect of these can be observed in performances of several tests, including projective tests, where a child with better socioeconomic and educational opportunities might narrate better stories for Children’s Apperception Test (CAT), 13 compared to a child from adverse socioeconomic-educational conditions.
In intelligence testing, an important related phenomenon is the “Flynn effect.” 14 The Flynn effect refers to the “increase in intelligence scores over time in general population,” which has been observed for different types of intelligence and across countries, for example, about 3 points per decade in the US, about 1.65 points in Estonia, and about 7.7 points in Japan. 15 Further, the rise is observed more at the lower end of the normal probability curve (NPC) than at the higher end. This necessitates revision of norms. However, barring a few small tests, such as the Seguin Form Board Test (SFBT), 16, 17 there have been hardly any attempts at revising norms or developing new ones.
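A rough sense of what outdated norms can mean in practice comes from simple arithmetic. The sketch below assumes that the rates quoted above are all per decade and that the gain accumulates linearly; both are simplifying assumptions, and the 30-year norm age is hypothetical. No rate for India is given in the text, so none is assumed here.

```python
# Reported gains in IQ points per decade, as quoted in the text.
GAIN_PER_DECADE = {"US": 3.0, "Estonia": 1.65, "Japan": 7.7}


def expected_norm_drift(points_per_decade: float, years_since_norming: float) -> float:
    """Approximate score inflation when a test is scored against old norms.

    Assumption: a constant, linear gain over the whole period.
    """
    return points_per_decade * years_since_norming / 10


# Hypothetical test whose norms are 30 years old, using the US rate.
print(expected_norm_drift(GAIN_PER_DECADE["US"], 30))  # 9.0
```

Under these assumptions, scores read against 30-year-old norms would be inflated by roughly 9 points, enough to move a borderline score across a diagnostic cut-off.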
Adaptation of Original Content to Local Languages Without Changing Norms
India has about 30 languages that are each spoken by over a million native speakers. 18 Surprisingly, the majority of standardized psychological tests are available only in English, except for a few tests that were standardized in 3 to 4 languages. Hence, when no test is available in the required language, English tests are translated into local languages. This process has several limitations; for example, when English items are translated into local languages, the number of words and the difficulty level decrease. This has implications for working memory and vocabulary scores, respectively.
Developing Local Norms without Changing the Original Content
Examples include the Wechsler Intelligence Scale for Children-IV (WISC-IV) 19 and the Dyslexia Screening Test. 20 This procedure comes close to what is desirable in a psychological assessment. However, it too has limitations, which the test developers themselves acknowledge: the tests were adapted on an English-speaking, urban-dwelling population only and hence cannot be used with a significant portion of the population.
Changing the Content of the Already Existing Test to Suit the Population as Well as Developing Local Norms
Several tests have used this process, such as the Binet Kamat Test (BKT), 21, 22 MISIC, WAPIS, the Vineland Social Maturity Scale (VSMS), 23–25 the Seguin Form Board Test (SFBT), the CAT, and the Sentence Completion Test (SCT). 26 This is one step short of the most desirable solution, which is developing and standardizing a completely new test. One can ask to what extent the psychological principles used in tests are common across countries. There is consensus that the general principles (eg, intelligence) might be the same, but the specific aspects used in tests, such as the pictures in the Thematic Apperception Test (TAT) or the picture completion subtest and the sentence stems in the SCT, might differ across countries. Therefore, most of those who adapted the above tests retained the general principles but altered the specific test items. The important issue, however, is that the local adaptations have not kept pace with the updates made to the original tests elsewhere.
Developing New Tests
Developing new tests is the most desirable option. There have been attempts in this regard, for example, Bhatia’s Battery of Performance Test of Intelligence 27 and the Post Graduate Institute Memory Scale. 28 These efforts are noteworthy. However, a major limitation is that many of these tests were developed several decades ago, and some do not have sound psychometric properties. For example, Bhatia’s battery can be used only with adolescent boys between 11 and 16 years, and it has separate norms for illiterates and literates. 29
Developing new tests, especially in a country like India with its varied socioeconomic conditions, cultures, and multitude of languages, is probably the most difficult process. It might take several years and requires huge financial and human resources.
Few Important and Fundamental Psychometric Properties
One of the most important requirements of a sound psychological test is that it should be valid and reliable. For a test to be valid, it should meet a few requirements. These requirements vary with what the test intends to measure.
If the goal is to measure an “ability” as part of “individual differences,” such as memory, problem solving, reasoning, or intelligence, the norms should have been derived on the basis of the normal probability curve, and the corresponding norms should approximately resemble an inverted “s” shaped curve (for a detailed discussion, refer to Roopesh 17 ). If the test items are too easy, the scores will show a “negative skew” distribution, and if the items are too difficult, the scores will show a “positive skew” distribution. Neither of these distributions is appropriate.
To achieve this, it is desirable to have an adequate number and representation of people across age, gender, education, socioeconomic conditions, parental occupation, geographical regions, and languages, as well as from the target population (eg, children with specific learning disability [SLD]).
Reliability and validity are 2 important aspects of a test. It is commonly known that a test can be reliable and yet may or may not be valid. On the other hand, a test cannot be valid if it is not reliable. This aspect is ignored in some psychological tests, for example, the TAT/CAT, where the interpretation of the stories can vary significantly.
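The point about skew can be checked numerically on pilot data before norms are derived. The sketch below computes the standard moment-based skewness coefficient; the two score lists are hypothetical and were made up only to mimic a test that is too easy and one that is too difficult.

```python
from statistics import mean, pstdev


def skewness(scores: list[float]) -> float:
    """Moment-based skewness: the mean of cubed z-scores.

    Values near 0 suggest a roughly symmetric distribution; negative values
    indicate a tail toward low scores, positive values a tail toward high scores.
    """
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        raise ValueError("skewness is undefined when all scores are identical")
    return sum(((x - m) / s) ** 3 for x in scores) / len(scores)


# Hypothetical raw scores out of 10.
too_easy = [10, 9, 10, 8, 10, 9, 7, 10, 9, 4]   # most children near the ceiling
too_hard = [0, 1, 0, 2, 0, 1, 3, 0, 1, 6]       # most children near the floor

print(f"too easy: skewness = {skewness(too_easy):.2f}")  # negative
print(f"too hard: skewness = {skewness(too_hard):.2f}")  # positive
```

A markedly negative or positive value on a representative sample would suggest that item difficulty needs adjusting before the norms are finalized.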
Issues Related to Assessment and Interpretation
Validity does not rest with the test alone. It extends beyond the test. For example, one rarely asks “whether the examiner variable is valid?” This applies to several aspects, such as the training one has received and the extent of one’s knowledge about the particular test.
Assessment by Students/Trainees
The majority of assessments with children and adolescents are carried out because of intellectual and academic difficulties, and this often involves providing reports and certification. Other tests, such as those for temperament/personality traits, interpersonal issues, and psychodiagnostics, are used sparingly. Given this, most testing is done by students in mental health institutions or centers, where they are trained as part of a professional academic course or an internship. Testing requires substantial time and effort to administer, score, interpret, and write up. Hence, in many settings, established professionals focus on counselling/therapy and delegate assessment to trainees/interns.
Differences Across Institutes and their Training
There are significant differences in the training for psychological assessment across and even within institutes. For example, to assess intelligence alone, different institutes use different tests, such as the BKT or MISIC. Even within an institute, different professionals use different tests and/or different approaches to interpretation. For example, there are institutes where more than 2 or 3 scoring systems for the Rorschach 3, 7, 12 are in use. All of this adversely affects the overall validity of psychological assessments in the country.
Though the Gazette of India has laid down some basic criteria about the tests to be used, these apply only to diagnosing/certifying intellectual disability and SLD. Even these criteria are incomplete and do not address important questions, for example, which test to use to screen for intelligence if the child is older than 16 years.
To Corroborate with Symptoms/Diagnosis
It is common knowledge among professionals that psychological test findings, especially those from projective techniques used for psychodiagnostics, do not always corroborate the psychiatric diagnosis. However, trainees, interns, and young professionals might cherry-pick information from the test results to suit the psychiatric diagnosis. The reasons vary and include insecurity about psychological testing, about their profession, or about themselves. Trainees and young professionals need to understand that it is all right if the psychological test results do not corroborate the psychiatric diagnosis. There can be several reasons for this: the test might not be sensitive to small changes in the patient’s symptoms; symptoms vary from patient to patient even when the disorder is the same; and/or the age, education, gender, and experiences of patients vary significantly even when they have the same disorder or the same symptoms, which can affect how the symptoms present.
Spotlight Mainly on the “Problems”
Another observed trend is focusing on, or reporting only, the “problem” and ignoring the “neutral,” “positive,” and/or “scatter” findings in the test. This can be observed across tests and methods. For example, in the Rorschach, one might consider only the pathognomonic signs and ignore popular responses; in SLD assessment, one might highlight spelling errors on a few words without considering the correct spellings given for difficult words.
Often, scatter in the results is also not considered. In ability testing, scatter in a child with average or above-average intelligence may not significantly affect the reporting of the total IQ score. However, if scatter is present in a child with sub-normal intelligence, a more careful analysis is required than merely reporting the full-scale IQ.
Extraneous Variables Affecting the Validity of the Test
Psychological tests cannot be administered, scored, and interpreted blindly. This is true even for well-standardized tests. As one is dealing with the complex working of the brain, several factors influence a person’s performance on a test. One significant factor is the “motivation” of the subject, which can affect the result considerably. Further, recent life events/experiences and context matter, especially in projective tests.
Few Important Concepts and Fallacies Observed While Interpreting the Test Results
There are a few things a psychologist needs to be aware of when interpreting any test findings. They are discussed below.
Occam’s Razor
Otherwise referred to as the “law of simplicity” or the “principle of parsimony,” it states that “if two explanations are equally plausible, then the simpler of the two should be considered as true.” The simpler explanation is the one that requires fewer assumptions and exceptions. In a clinical setting, for example, if an adolescent rejects the “intimate/sex card” in the TAT, there can be 2 or more explanations, such as (a) an unconscious preoccupation with intimacy/sex, (b) the adolescent has some issues in the area of sexuality, (c) the adolescent avoids intimacy/sexual relationships, or (d) the adolescent is shy. Here the last explanation, “the adolescent is shy,” is the simplest of the 4, and it should be preferred for the interpretation unless additional information is available, for example, from the adolescent’s case history.
Cherry-Picking and Confirmation Bias
Cherry-picking refers to choosing the data or data sets that we want and ignoring those that we do not want, so that a study gives the desired results. It is also referred to as sampling bias. Confirmation bias occurs when we favor information that suits our beliefs and ignore information that contradicts them. These 2 concepts are related and are sometimes used interchangeably. In psychological testing, cherry-picking can happen in research as well as in clinical settings. In a clinical setting, for example, if a child completes the SCT sentence stem “my girlfriend…” as “my girlfriend is nasty and has attitude,” the examiner cannot cherry-pick this one item and generalize that the child has an “unfavorable attitude towards women.”
Barnum/Forer Effect
The Barnum/Forer effect is a phenomenon in which individuals believe that particular personality descriptions apply specifically to themselves (more so than to other people), even though the descriptions contain information that is true of the majority of people. Examples of such statements are “you have a great desire that people should accept you,” “sometimes you are critical and harsh on yourself,” and “though you have some weakness, you strive hard to overcome them.” An important point is that this phenomenon works mainly when the statements are positive. There is a criticism that some psychological assessment reports contain these sorts of statements.
Projective Techniques and Tests
Of all psychological tests, projective techniques and tests have received substantial criticism from outside the profession and little from within it. Further, given the subject matter, there is no consensus about their relevance, administration, scoring, or interpretation. One of the major criticisms of projective techniques as a whole is that the tests are subjective and have relatively poor inter-rater reliability. There are several other criticisms, such as:
It works like a double-edged sword (the subject projects their needs/emotions/beliefs while responding, and the examiner can project the same while interpreting those responses).
The influence of the subject’s creativity is generally ignored.
Experience and context are rarely taken into consideration in interpreting the results.
The extent of human resources, time, and cost required for training is substantial.
The assessments, scoring, and interpretation are time consuming for the psychologist, and this might be reflected in increased charges.
The latter 2 aspects (training cost and time taken) usually result in assessments being delegated to trainees/interns, who usually have less expertise.
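The criticism about inter-rater reliability can be examined empirically rather than argued. One widely used index of agreement between 2 raters on categorical codes is Cohen’s kappa, which corrects the observed agreement for the agreement expected by chance. It is not an index named in this article, and the ratings below are hypothetical: 2 examiners independently code the same 10 responses as pathological (“P”) or non-pathological (“N”).

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters on categorical codes."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same non-empty set of responses")
    n = len(rater_a)
    # Proportion of responses on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two examiners for the same 10 responses.
examiner_1 = ["P", "P", "N", "N", "P", "N", "P", "N", "N", "P"]
examiner_2 = ["P", "N", "N", "N", "P", "P", "P", "N", "P", "P"]

print(round(cohens_kappa(examiner_1, examiner_2), 2))
```

In this made-up example the examiners agree on 7 of 10 responses, yet kappa comes out at about 0.4 because half of that agreement would be expected by chance alone.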
Further, each projective technique might have its own established standard procedures, and these might even have some validity. Given this, the integrity of each test might hold to some extent when the tests are administered individually. The question, however, is whether all these tests (eg, the Rorschach Ink Blot Test [RIBT], CAT, SCT, and Object Sorting Test [OST] 30 ) can be administered sequentially to the same patient in a short span of time, because the instructions of one test might adversely affect performance on the others. For example, one of the instructions of the CAT involves (directly or indirectly) telling the subject “to be as elaborative/imaginative as possible.” If the OST 30 is administered after the CAT, there is a high chance that the subject will assume that s/he needs to be as elaborate as possible and will provide elaborate explanations for the OST sorts, which might be interpreted as “over-inclusion,” one of the diagnostic indicators of psychosis.
Similarly, during Rorschach administration, subjects get used to perceiving (mainly) meaningful pictures/figures/people/animals in otherwise abstract inkblots. If the OST is administered after this, it might be easier for them to perceive the OST objects symbolically, for example, the “doll” as a child, the “male photograph” as a male, and the “thali/mangalsutra” as representing marriage; all such symbolic perceptions in the OST are considered indicative of “psychosis/schizophrenia.”
In Defense
As mentioned earlier, the functioning of the brain is extremely complex to assess. Apart from structural and functional imaging techniques, which can reveal some very basic details, there are only a few sources a mental health professional can rely on: the case history, the mental status examination, the few available materials such as written samples (diary entries, letters written to others), school/college reports, and/or psychological assessment reports (both self-report and projective techniques). These sources of information can themselves be considered limited for understanding the person and/or their psychiatric symptoms. Projective techniques, as a method of assessment/investigation, can provide additional information about the subject’s internal world, which might be crucial for understanding and treatment planning. However, one should keep in mind the various limitations and ethical considerations while administering, scoring, and interpreting these tests.
Ability Tests
With respect to ability tests, this article focuses mainly on intelligence tests and on those used with children and adolescents. As discussed above, one of the most common issues is that the norms of almost all tests standardized in India are several decades old, except for the WISC-IV and the Colored Progressive Matrices, which have relatively recent norms. Apart from the core intelligence tests, there are tests that try to assess intelligence indirectly, such as the Draw-a-Man test, 31 SFBT, BGT, and so on. When there are so many issues with the core intelligence tests themselves (such as deciding what intelligence is, how it needs to be assessed, which subtest is better, and how to interpret the scatter among subtests), trying to measure intelligence indirectly is a step too far. Hence, it might be better to use the core intelligence tests, which have fewer limitations.
Adaptive Functions
There are relatively few tests that assess social and adaptive functions. One that is widely used, and mandatory for certifying intellectual disability, is the VSMS. Like the BKT, this test uses a ratio quotient scoring system and was developed several decades ago. VSMS scores correlate highly with IQ in children (more than in adolescents), after which the relationship of social and adaptive functions with intelligence starts to weaken. Though this test appears simple to use, great caution needs to be exercised (for a detailed discussion, please refer to Roopesh). 25 The VABS is the updated version of the VSMS, and its third edition has been released in the US. The VABS is far superior to the VSMS in terms of the number of domains and the age range. Some professionals in India do use the VABS; however, as it is not standardized on the Indian population, its applicability is not guaranteed.
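A ratio quotient is the attained age (mental or social age) divided by chronological age and multiplied by 100. The short sketch below shows the arithmetic and one known weakness of the approach; the ages are hypothetical and the function name is illustrative.

```python
def ratio_quotient(attained_age_months: float, chronological_age_months: float) -> float:
    """Ratio quotient = (attained age / chronological age) x 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * attained_age_months / chronological_age_months


# Hypothetical child aged 6 years (72 months) functioning at a 4-year (48-month) level.
print(round(ratio_quotient(48, 72), 1))    # 66.7

# The same 24-month lag at 12 years (144 months) yields a much higher quotient.
print(round(ratio_quotient(120, 144), 1))  # 83.3
```

Because an identical lag produces different quotients at different ages, ratio quotients are not directly comparable across age groups, which is one reason for the caution urged above.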
SLD Tests
There are few SLD test batteries in India, and only one of them, the NIMHANS Index of Learning Disability, 32 is mandated by the Gazette of India 33 for diagnosing SLD. This battery was developed with English-medium students at the difficulty level of the state board syllabus. The test can be used from first grade to about tenth grade; however, diagnosis becomes difficult at the secondary level. The battery has several validity issues, such as improper selection and ordering of test items; difficulty in diagnosing dyscalculia; and difficulty in diagnosing when the child has superior intelligence, when the child is studying under the CBSE/ICSE board, and when the child’s performance is in the borderline range (between difficulty and disability). However, for moderate to severe cases of SLD, this test does justice and is simple to interpret, as the criterion followed for diagnosis is the “3 grades lower” discrepancy criterion (for a detailed discussion, please refer to Roopesh). 34
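The discrepancy criterion mentioned above can be written as a one-line rule. The sketch assumes that the criterion means performance at least 3 grades below the grade in which the child is studying; that reading, the function name, and the example grades are illustrative assumptions, and the test manual remains the authority on how the criterion is actually applied.

```python
def meets_grade_discrepancy(current_grade: int, performance_grade: int,
                            required_gap: int = 3) -> bool:
    """True if performance lags the current grade by at least `required_gap` grades.

    Assumption: "3 grades lower" is read as a gap of 3 or more grades.
    """
    return current_grade - performance_grade >= required_gap


# Hypothetical child in grade 6 reading at a grade 3 level: criterion met.
print(meets_grade_discrepancy(current_grade=6, performance_grade=3))  # True

# Hypothetical child in grade 6 reading at a grade 4 level: borderline, not met.
print(meets_grade_discrepancy(current_grade=6, performance_grade=4))  # False
```

The second case illustrates the borderline range described above: a 2-grade lag is a real difficulty but falls short of the cut-off, and a fixed rule of this kind cannot apply at all to a child in the first 2 grades.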
Conclusion
The issues discussed in this article might give the impression that everything is wrong with the psychological assessments carried out in India. However, that is not so. Even developed countries face several issues with the psychological tests they use. For example, developed countries face difficulty in applying tests that are standardized on their own population to their minority and/or migrant populations. Almost everywhere, factors that affect psychological test results, such as geographic movement and academic education, are changing, and given the difficulty of obtaining funding, it is not easy to keep updating the norms. This is even more true in a country like India. For any scientific field to develop and improve, the culture and its people should have a questioning attitude and scientific rigor. On the other hand, when people rely completely on their profession to identify with, to define who they are, and/or to determine their self-esteem, even a remote issue with the profession (or any aspect of it) can lead to insecurity (about themselves, their profession, or both). Given this, it might be difficult for them when anybody questions the validity of the profession or any part of it (such as psychological assessment in general and/or projective techniques in particular), and this might lead to denial and rejection of the question/issue itself. However, questioning something and accepting its mistakes, in this case those of psychological tests, actually helps to improve it by plugging the loopholes. There are thousands of students studying psychological tests, and if proper awareness, direction, and guidance are provided, they can take up work in this area.
In conclusion, despite the shortcomings, there is clear consensus about the validity of certain domains/areas of assessment. That is, testing of ability, achievement, and neuropsychological functions, and disability assessments, still enjoy good support among professionals and the public. On the other hand, there is no consensus about the use of some techniques, such as projective techniques, as diagnostic tools for mental disorders. However, the issues lie more with individual tests than with the domain itself. For example, the issues can be with the BKT or MISIC but not with the domain of intelligence itself. Further, sometimes the issues are not even with the test per se but with a subtest and/or particular test material. Given this, one can say that, “in general, though there are serious issues with the psychological assessment scenario in India, in most cases the problem is not with the test per se, but with the way it is used and/or abused by the concerned professionals.”
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
