Rating Patients in Different Languages: Reliability and Validity

Abstract

Research outcomes in mental health disciplines are usually assessed using rating instruments that were developed as English language versions. However, in countries such as India, English is not the native language, and patients at even a single research center may speak in different regional tongues. It is permissible to assess such patients using rater-administered English language instruments designed to be scored after an unstructured interview conducted in the patient’s preferred language. For many reasons, related to reliability and validity, it is not permissible to assess such patients in their preferred language by translating, impromptu, from English language versions of instruments that were designed to be self-administered or administered as a structured interview. In such situations, standardized, local language versions of the instruments should be used; that is, local language versions with established reliability and validity.

Keywords

Rating instruments, translation, reliability, validity, standardization

In mental health research, study outcomes are commonly measured using rating instruments. For example, outcomes in depression may be measured using the Hamilton Rating Scale for Depression (HAM-D) and the Beck Depression Inventory (BDI). These instruments are widely used in English language versions. In many countries, however, English is not the native language. In India, in particular, a single research center may serve patients who speak a wide range of regional languages. In this context, the following questions arise:

The HAM-D is a rater-administered instrument. Is it permissible for bilingual raters to use it in the patient’s own language?

The BDI is a self-administered instrument. If a local language version is unavailable, is it permissible for bilingual raters to administer it in the patient’s own language?

If a local language version of the BDI is available but the patient is illiterate, is it permissible for raters to read out the items and score items based on the responses received?

What should investigators do when different patients speak different languages, and some are literate and others illiterate?

These questions are answered in turn. Of note, what is explained for the HAM-D and BDI applies to other rater- and self-administered instruments as well.

Rater-administered Instruments

Can the HAM-D be translated, impromptu, and administered in a local language? This is a non-question because the HAM-D was designed to be scored by the rater after an unstructured clinical interview. So, raters can certainly interview patients in their preferred language and then score the interview using the English version of the HAM-D.

However, the structured interview version of the HAM-D¹ should not be translated impromptu and administered because: (a) there is no assurance that the rater’s impromptu translation will be a good fit; and (b) there is no assurance that the rater will translate the instrument in the same manner for every patient. The former concern addresses the validity of the impromptu translation, and the latter addresses its reliability. Whereas such an informal translate-and-rate procedure may not be a serious issue for the structured HAM-D (because it was originally meant to be scored after an unstructured interview and because the structured version is only a guide), it could be problematic for rater-administered instruments that were designed to be administered as structured interviews. An example is the Clinician-Administered PTSD Scale.²

Self-administered Instruments

Can the BDI be translated, impromptu, and administered in a local language? No, for the same reasons stated for the structured interview version of the HAM-D.

If a local language, validated version of the BDI is available, can this be read out verbatim to an illiterate patient and the responses recorded? Again no, for several reasons. First, patients may ponder and respond differently when they can read and re-read questions and answer at leisure, on paper, than when they listen to a question and answer immediately, face-to-face. In consequence, interview-based administration may introduce random (nonsystematic) errors: the direction of error is unpredictable and could differ from patient to patient. Next, patients may not answer personal questions (such as questions related to sexuality) honestly when they respond to face-to-face enquiries. To other questions, they may provide answers that conform to social expectations, or they may even answer in a way that they believe will please the rater. Thus, interview-based administration may also introduce systematic errors: the expected direction of error is similar in all patients. Finally, responses to face-to-face enquiries may be unpredictably biased by rater-related verbal and nonverbal cues. So, the validity of ratings can be compromised in many different ways.
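The distinction between nonsystematic and systematic error can be illustrated with a small simulation. The sketch below is illustrative only: the score distribution (mean 20, SD 5), the size of the random error (SD 4), and the uniform under-reporting of 3 points are assumptions chosen for the example, not values drawn from the BDI or from any study.

```python
import random
import statistics


def simulate_scores(n=2000, seed=0):
    """Return (true, random_error, systematic_error) score lists.

    All numeric values here are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    # Hypothetical "true" scores that a valid written administration would yield.
    true_scores = [rng.gauss(20, 5) for _ in range(n)]
    # Nonsystematic error: zero-mean noise, direction differs between patients.
    with_random_error = [t + rng.gauss(0, 4) for t in true_scores]
    # Systematic error: every patient under-reports by the same amount.
    with_systematic_error = [t - 3 for t in true_scores]
    return true_scores, with_random_error, with_systematic_error


def summarize(scores):
    """Return (mean, standard deviation) of a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    true_scores, rand_err, sys_err = simulate_scores()
    for label, scores in [("true", true_scores),
                          ("random error", rand_err),
                          ("systematic error", sys_err)]:
        mean, sd = summarize(scores)
        print(f"{label:>16}: mean={mean:.1f}, sd={sd:.1f}")
```

Under these assumptions, random error leaves the group mean roughly unchanged but widens the spread of scores, whereas systematic error shifts the mean and leaves the spread unchanged.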

Possible Solutions

One solution is to use only rater-administered instruments, but this strategy ignores patient perspectives. Another solution is to obtain patient-reported outcomes using a separate visual analog scale (VAS) for each outcome; however, although the VAS is a widely accepted method of rating, this strategy requires the patient to consolidate into a single number the complexity of experience of whatever outcome is being assessed. Alternatively, using an appropriately standardized translation of each item, the VAS can be used to rate each item of the instrument (e.g., the BDI), and a total score can then be computed. However, the reliability and validity of such a strategy have never been examined. Use of this strategy would also trigger the earlier-mentioned limitations of rater administration of instruments intended for self-administration.

So, as the best solution, it becomes necessary to develop standardized, local language versions of self-administered instruments. This would require forward and back translation exercises and reliability and validity exercises.³ If the instrument is meant to be used in illiterate patients, it must be standardized not as a self-administered instrument but as a rater-administered semi-structured interview with structured responses; because this would change the very nature of the instrument, it might be best to either not use self-rated instruments or not recruit illiterate patients. The latter could make it harder to achieve an adequate sample size for a properly powered study.

Parting Notes

If an unstandardized version (reliability and validity unconfirmed) of a self-administered instrument must unavoidably be administered in a local language, it should at the very least be a version that has been satisfactorily forward and back translated so that an identical version can be read out each time. The options for response should be chosen by the patient in a structured fashion and not subjectively interpreted by the rater and recorded. Whatever is done should be uniformly applied to all patients. It is bad research to administer an instrument in a written form to some patients and as an interview to other patients. This is because, in addition to compromised conclusions arising from differences in reliability and validity of the ratings, there could be an introduction of systematic as well as nonsystematic measurement error, resulting in increased statistical noise and lower statistical power.⁴
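The link between added measurement noise and lower statistical power can be sketched numerically. The example below is a rough illustration under stated assumptions: it models only the nonsystematic component of error, uses the standard normal approximation for a two-sided, two-group comparison of means, and the numbers (true difference of 3 points, true SD of 6, error SD of 4, 64 patients per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def observed_effect_size(true_difference, sd_true, sd_error):
    """Standardized mean difference after independent random error is added.

    Random error inflates the score variance, so the same raw difference
    corresponds to a smaller standardized effect.
    """
    return true_difference / sqrt(sd_true ** 2 + sd_error ** 2)


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-group comparison."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    d_clean = observed_effect_size(3, sd_true=6, sd_error=0)
    d_noisy = observed_effect_size(3, sd_true=6, sd_error=4)
    print(f"no added error:   d={d_clean:.2f}, power={approx_power(d_clean, 64):.2f}")
    print(f"with added error: d={d_noisy:.2f}, power={approx_power(d_noisy, 64):.2f}")
```

With the sample size held constant, the noisier ratings give a smaller standardized effect and therefore a lower probability of detecting the same true difference.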

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

References

1. Williams. A structured interview guide for the Hamilton Depression Rating Scale. Arch Gen Psychiatry, 1988; 45(8): 742–747.

2. Blake, Weathers, Nagy, et al. The development of a Clinician-Administered PTSD Scale. J Trauma Stress, 1995; 8(1): 75–90.

3. Menon and Praharaj. Translation or development of a rating scale: plenty of science, a bit of art. Indian J Psychol Med, 2019; 41(6): 503–506.

4. Andrade. Understanding statistical noise in research: 1. Basic concepts. Indian J Psychol Med, 2023 Jan; 45(1): 89–90.