Antipsychotic medication side effect assessment tools: A systematic review

Abstract

Objective:

The aim of this systematic review was to critically appraise the psychometric properties of antipsychotic medication side effect assessment tools.

Methods:

Systematic searches were undertaken in PubMed, CINAHL and CENTRAL from inception to October 2014. Studies were included if they detailed the evaluation of psychometric properties of antipsychotic medication side effect assessment tools in mental health populations. Studies were excluded if they examined the use of antipsychotic medication side effect assessment tools in non-mental health populations, including people suffering from dementia, Parkinsonism and Alzheimer’s. Narrative reviews and studies published in any language other than English were also excluded.

Results:

Twenty-three studies detailing 16 antipsychotic medication side effect assessment tools were included. Content validity was appropriately established for only one of the tools, reliability was appropriately evaluated for only one tool, and the assessment of responsiveness was not acceptable for any tool.

Conclusion:

Further psychometric studies are warranted to consolidate the psychometric properties of the included antipsychotic medication side effect assessment tools before any of these tools can be confidently recommended for either research or clinical purposes.

Keywords

Systematic review, antipsychotic medication, side effect, assessment tool

Background

Antipsychotic medication remains one of the principal approaches to managing psychotic disorders. These medications reduce psychotic symptoms but at the same time produce side effects that are often unbearable (Morrison et al., 2015a, 2015b, 2015c). Such side effects include severe sedation, substantial weight gain, sleep disorders, sexual dysfunction and difficulties in social activities (Leucht et al., 2009; Lieberman et al., 2005). The intolerability of these side effects has been identified as a key determinant of non-adherence to antipsychotic medication (Haddad et al., 2014a; Lieberman et al., 2005).

Discontinuing antipsychotic medication often leads to relapse in psychotic disorders, which in turn may result in loss of employment, loss of housing, relationship and social problems and increased risk of suicide (Ascher-Svanum et al., 2008; Chapman and Horne, 2013; Yen et al., 2009). Consequently, mental health consumers may find themselves in a recurring cycle: illness onset, prescription of medication that produces intolerable side effects, discontinuation of that medication and eventual relapse (Llorca, 2008; Naber and Karow, 2001; Salomon and Hamilton, 2013).

To intervene in this cycle, clinicians need to discuss the side effects of antipsychotic medication openly with mental health consumers, while also considering the potential benefits of taking medication (Gerlach and Larsen, 1999; Happell et al., 2004; Naber and Karow, 2001; Salomon and Hamilton, 2013). However, mental health consumers often have difficulty communicating openly with clinicians, possibly because they are reluctant to fully disclose their personal circumstances, or because the medication or the illness has impaired their ability to communicate (Naber, 2008; Roe and Goldblatt, 2009; Seale et al., 2007). This communication breakdown may be exacerbated by clinicians’ poor communication or diagnostic skills, which can leave them with an incomplete understanding of mental health consumers’ situation (Hungerford and Fox, 2014; Morrison et al., 2000; Roe and Goldblatt, 2009). Studies have found that clinicians consistently underestimate the rate of side effects resulting from antipsychotic medication, which suggests that under-recognition of side effects is a common clinical problem (Dassori et al., 2003; Hellewell, 1999).

These difficulties in identifying antipsychotic medication side effects, and in managing them constructively, could be addressed through methods that facilitate effective communication between mental health consumers and clinicians. This underlines the need for assessment tools that either enable clinicians to readily ascertain the nature, frequency and impact of side effects or allow mental health consumers to detail the side effects they experience (Cabeza et al., 2000; Dott et al., 2001; Goff et al., 2010; Hellewell, 1999). Information from such tools can give clinicians a better understanding of the real and potential issues mental health consumers encounter when taking antipsychotic medication, and thereby facilitate discussions with consumers about strategies to ameliorate side effects and improve medication adherence. Managing side effects in this way is central to the quality of life and recovery journey of many mental health consumers (Dassori et al., 2003; De Leeuw et al., 2012; Dott et al., 2001).

The objectives of this systematic review were twofold: (1) to identify all available antipsychotic medication side effect assessment tools and (2) to critically appraise the psychometric properties of these assessment tools.

Methods

Search strategy

Figure 1 displays the implementation of the search strategies and subsequent selection of studies. We developed electronic search strategies to identify English language studies of tools that assessed antipsychotic medication side effects. The PubMed, CINAHL and CENTRAL databases were searched from inception to October 2014. Supplementary Appendix 1 contains the specific search strategies used in PubMed, CINAHL and CENTRAL. The titles and abstracts for all studies retrieved by the initial searches were reviewed to identify potentially relevant studies detailing antipsychotic medication side effect assessment tools.

Figure 1.

Implementation of search strategies and selection of studies.

For each of the identified potentially relevant antipsychotic medication side effect assessment tools, individual PubMed and CINAHL searches were undertaken using the tool’s full title and acronym. Additional studies were identified by manually searching the reference lists of the identified studies detailing potentially relevant antipsychotic medication side effect assessment tools. Finally, full-text copies of studies that described either the validation or the use of any of the potentially relevant measures were retrieved and considered for inclusion in this review.

Selection criteria

We included studies published between 1950 and October 2014 that detailed the evaluation of psychometric properties of antipsychotic medication side effect assessment tools in mental health populations. Studies were excluded if they examined the use of antipsychotic medication side effect assessment tools in non-mental health populations, including people suffering from dementia, Parkinsonism and Alzheimer’s. We also excluded narrative reviews and studies published in any language other than English.

Explicit review criteria

To evaluate the quality of the studies included in the review, we used a previously developed critical appraisal tool (Stomski et al., 2010) that was primarily based on the Medical Outcomes Trust (Aaronson et al., 2002) and Terwee et al.’s (2007) recommendations for the evaluation of outcome measures. This tool contained criteria that appraised the following psychometric properties: content validity, construct validity, internal consistency, test–retest/inter-rater reliability, responsiveness and respondent/administrative burden. Table 1 displays these criteria, and the following section explains these criteria in more detail.

Table 1.

Review criteria ratings.

Attribute	Positive rating	Negative rating	Indeterminate rating
Content validity	Both target population and experts involved in item selection and tool’s intended purpose stated	Target population not involved	Only target population involved or intended purpose unclearly stated
Construct validity	A priori hypotheses presented and ⩾75% of results congruent with specified hypotheses	Adequate hypotheses but <75% of hypotheses confirmed	No a priori hypotheses presented
Internal consistency	Scale(s) established by factor analysis and Cronbach’s alpha for each scale between 0.70 and 0.95	Cronbach’s alpha < 0.70 or > 0.95	Factor analysis not used or Cronbach’s alpha not derived for each scale
Test–retest or inter-rater reliability	Acceptable rationale for repeated measure administration and sample ⩾ 50 participants and ICC ⩾ 0.70 or weighted Cohen’s kappa ⩾ 0.70 or kappa statistic ⩾ 0.70	Sample ⩾ 50 participants and rationale adhered to but ICC < 0.70 or weighted Cohen’s kappa < 0.70 or kappa statistic < 0.70	No acceptable rationale or non-compliance with rationale or unacceptable psychometric method or sample < 50 participants
Responsiveness	Smallest detectable change < minimal important change or minimal important change outside limits of agreement or area under the curve ⩾ 0.7 and sample ⩾ 50 participants	Smallest detectable change > minimal important change or minimal important change equal or inside limits of agreement or area under the curve < 0.7 and sample ⩾ 50 participants	Sample < 50 participants
Respondent burden	<15 minutes to complete and reading or comprehension level not beyond that of a 12-year-old	⩾15 minutes to complete or reading or comprehension level beyond that of a 12-year-old	Time required to complete or reading or comprehension level reported in general terms
Administrative burden	<15 minutes to score and scoring involves only summation	⩾15 minutes to score or scoring involves complex mathematical procedures	Scoring time described in general terms

ICC = Intra-class correlation.

Content validity

Content validity involves the extent to which the measure’s items encompass all relevant concepts (Aaronson et al., 2002). The assessment of content validity was considered to be adequately undertaken if the measure’s intended purpose was clearly stated, and both the target population (mental health consumers in this case) and relevant experts were involved in developing the measure’s items.

Construct validity

Construct validity reflects the degree to which a measure’s scores are consistent with predefined hypotheses such as relationships to scores of other instruments, or differences between pertinent groups (Aaronson et al., 2002). The assessment of construct validity was considered to be adequately undertaken when specific predefined hypotheses regarding anticipated correlations or differences were reported, and these hypotheses were then confirmed.
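As an illustration only, the construct validity criterion can be expressed as a small decision rule. The sketch below assumes the 75% threshold from Table 1 and treats the absence of a priori hypotheses as indeterminate; the function name and its arguments are hypothetical and are not part of the appraisal tool used in this review.

```python
def rate_construct_validity(a_priori_hypotheses: bool,
                            n_hypotheses: int,
                            n_confirmed: int) -> str:
    """Hypothetical helper: rate construct validity with a 75% threshold.

    a_priori_hypotheses: whether specific hypotheses were stated in advance.
    n_hypotheses, n_confirmed: how many were tested and how many were confirmed.
    """
    # Without prespecified hypotheses the results cannot be judged.
    if not a_priori_hypotheses or n_hypotheses == 0:
        return "indeterminate"
    if not 0 <= n_confirmed <= n_hypotheses:
        raise ValueError("n_confirmed must lie between 0 and n_hypotheses")
    # Positive when at least 75% of the prespecified hypotheses are confirmed,
    # negative when fewer are confirmed despite adequate hypotheses.
    if n_confirmed / n_hypotheses >= 0.75:
        return "positive"
    return "negative"


# Example: 3 of 4 prespecified correlations with other measures confirmed.
print(rate_construct_validity(True, 4, 3))  # positive
```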

Internal consistency

Internal consistency establishes the degree to which scale items measure a particular concept (Aaronson et al., 2002). The assessment of internal consistency was considered to be adequately undertaken if the dimensions of the measure’s scales were established using factor analysis, principal components analysis or Rasch analysis, and Cronbach’s alpha was then calculated for each identified scale and fell between 0.70 and 0.95 (Terwee et al., 2007).
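A minimal sketch of the internal consistency criterion follows, assuming item responses are stored as one row per respondent and one column per item. It computes Cronbach’s alpha with the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items, and then applies the 0.70 to 0.95 range. The dimensionality step (factor, principal components or Rasch analysis) is not performed here; whether it was done is passed in as a flag. The helper names and the data are hypothetical.

```python
from statistics import variance
from typing import Optional, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one scale (rows = respondents, columns = items)."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def rate_internal_consistency(dimensionality_established: bool,
                              alphas_per_scale: Sequence[Optional[float]]) -> str:
    """Hypothetical helper applying the 0.70-0.95 rule to every scale."""
    # Scales not derived by a dimensionality analysis, or any scale without
    # its own alpha, cannot be judged.
    if (not dimensionality_established or not alphas_per_scale
            or None in alphas_per_scale):
        return "indeterminate"
    if all(0.70 <= a <= 0.95 for a in alphas_per_scale):
        return "positive"
    return "negative"


# Hypothetical data: four respondents answering a three-item scale.
data = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]]
alpha = cronbach_alpha(data)
print(round(alpha, 3))                            # 0.923
print(rate_internal_consistency(False, [alpha]))  # indeterminate
print(rate_internal_consistency(True, [alpha]))   # positive
```

With these hypothetical responses alpha is 12/13 (about 0.92), which would earn a positive rating only if the scale had first been established by a dimensionality analysis.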

Test–retest/inter-rater reliability

Test–retest reliability establishes the extent to which a measure’s results remain consistent between repeated administrations over time, and inter-rater reliability the extent to which they remain consistent between different raters (Aaronson et al., 2002). The assessment of test–retest or inter-rater reliability was considered to be adequately undertaken if it was evaluated in a sample of at least 50 participants, and the intra-class correlation coefficient (ICC), weighted Cohen’s kappa or kappa coefficient was at least 0.70 (Terwee et al., 2007).
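For categorical ratings from two raters, the kappa statistic can be computed as kappa = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each rater’s marginal frequencies. The sketch below is a hedged illustration with hypothetical ratings: it computes unweighted kappa only (weighted kappa and the ICC are not shown) and encodes the Table 1 sample-size, method and threshold rules in a hypothetical helper, `rate_reliability`.

```python
from collections import Counter
from typing import Optional, Sequence


def cohens_kappa(rater_a: Sequence, rater_b: Sequence) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same participants."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    # Observed agreement: proportion of participants given the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


def rate_reliability(coefficient: Optional[float], n_participants: int,
                     acceptable_rationale: bool, acceptable_method: bool) -> str:
    """Hypothetical helper; acceptable_method is False for e.g. Pearson's r."""
    if (not acceptable_rationale or not acceptable_method
            or n_participants < 50 or coefficient is None):
        return "indeterminate"
    return "positive" if coefficient >= 0.70 else "negative"


# Hypothetical presence (1) / absence (0) ratings of akathisia by two raters.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))                                    # 0.6
print(rate_reliability(kappa, len(rater_a), True, True))  # indeterminate (n < 50)
```

Under this sketch a study reporting Pearson’s r would be passed `acceptable_method=False` and rated indeterminate, whatever the size of the coefficient.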

Responsiveness

Responsiveness involves the ability of a specific measure to identify change over time (Aaronson et al., 2002). The assessment of responsiveness was considered to be adequately undertaken if it was evaluated in a sample of at least 50 participants and at least one of the following held: the smallest detectable change was less than the minimal important change; the minimal important change lay outside the limits of agreement; or the area under the receiver operating characteristic curve was at least 0.70 (Terwee et al., 2007).
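The responsiveness criteria can be sketched as follows. This is a hedged illustration rather than the procedure used in any included study. It assumes the standard error of measurement (SEM) is already known from a test–retest study, uses the commonly cited formula SDC = 1.96 × √2 × SEM for individual-level change, and estimates the area under the ROC curve from the change scores of participants judged improved versus unchanged by some external anchor, with larger change scores assumed to indicate more change. The limits-of-agreement criterion is omitted, and all names and values are hypothetical.

```python
import math
from typing import Optional, Sequence


def smallest_detectable_change(sem: float) -> float:
    """Individual-level SDC from the standard error of measurement (SEM)."""
    return 1.96 * math.sqrt(2) * sem


def auc_change_scores(improved: Sequence[float],
                      unchanged: Sequence[float]) -> float:
    """ROC area for separating improved from unchanged participants.

    Mann-Whitney formulation: the proportion of (improved, unchanged) pairs in
    which the improved participant has the larger change score, ties = 0.5.
    """
    if not improved or not unchanged:
        raise ValueError("both groups need at least one participant")
    wins = 0.0
    for x in improved:
        for y in unchanged:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(improved) * len(unchanged))


def rate_responsiveness(n_participants: int,
                        sdc: Optional[float] = None,
                        mic: Optional[float] = None,
                        auc: Optional[float] = None) -> str:
    """Hypothetical helper applying the Table 1 responsiveness rules."""
    if n_participants < 50:
        return "indeterminate"
    # Any one satisfied criterion is enough for a positive rating.
    if sdc is not None and mic is not None and sdc < mic:
        return "positive"
    if auc is not None and auc >= 0.70:
        return "positive"
    # Assumption: if neither SDC/MIC nor AUC was reported there is nothing
    # to judge, so the rating is indeterminate rather than negative.
    if (sdc is None or mic is None) and auc is None:
        return "indeterminate"
    return "negative"


sdc = smallest_detectable_change(sem=2.0)  # about 5.54 scale points
auc = auc_change_scores([5, 7, 3, 8], [1, 2, 4, 0])
print(round(auc, 4))                              # 0.9375
print(rate_responsiveness(60, sdc=sdc, mic=6.0))  # positive
print(rate_responsiveness(30, auc=auc))           # indeterminate (n < 50)
```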

Administrative/respondent burden

Respondent burden involves the time and comprehension skills required to complete the measure (Aaronson et al., 2002). Respondent burden was considered acceptable if the mean time required to complete the measure was less than 15 minutes, and the required comprehension level was not beyond that of a 12-year-old child. Administrative burden refers to the demands placed on those administering the measure (Aaronson et al., 2002). Administrative burden was considered acceptable if the mean time required to administer the measure was less than 15 minutes, and only straightforward arithmetical tasks were required to calculate the measure’s scores.
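The two burden criteria described above reduce to simple threshold checks, sketched below under the assumption that a value reported only in general terms (for example, ‘brief and easy to administer’) is recorded as None and that reading level is expressed as an age in years. When one reported value fails its threshold the sketch rates the tool negative even if the other value is missing; that precedence is an assumption of this sketch, since Table 1 does not specify it. The function names are hypothetical.

```python
from typing import Optional


def rate_respondent_burden(mean_minutes: Optional[float],
                           reading_age: Optional[float]) -> str:
    """Hypothetical helper. None means 'reported only in general terms'."""
    # A reported value that fails its threshold makes the rating negative.
    if ((mean_minutes is not None and mean_minutes >= 15)
            or (reading_age is not None and reading_age > 12)):
        return "negative"
    if mean_minutes is None or reading_age is None:
        return "indeterminate"
    return "positive"


def rate_administrative_burden(mean_minutes: Optional[float],
                               summation_only: Optional[bool]) -> str:
    """Hypothetical helper. summation_only is False for complex scoring."""
    if ((mean_minutes is not None and mean_minutes >= 15)
            or summation_only is False):
        return "negative"
    if mean_minutes is None or summation_only is None:
        return "indeterminate"
    return "positive"


print(rate_respondent_burden(5, 11))         # positive
print(rate_respondent_burden(None, 11))      # indeterminate: time not quantified
print(rate_administrative_burden(20, True))  # negative
```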

Results

Identified studies

The search strategy yielded 545 potentially relevant studies. After screening titles and abstracts, 87 full-text studies were retrieved and considered for inclusion in this review. We included 23 studies, which detailed 16 antipsychotic medication side effect assessment tools. The included tools were the Abnormal Involuntary Movement Scale (AIMS) (Guy, 1976), Akathisia Rating Scale (ARS) (Barnes, 1989), Antipsychotic Non-Neurological Side-Effects Rating Scale (ANNSERS) (Ohlsen et al., 2008), Approaches to Schizophrenia Communication–Self-Report (ASC-SR) (Dott et al., 2001), Approaches to Schizophrenia Communication–Clinic (ASC-C) (Dott et al., 2001), Arizona Sexual Experience Scale (ASES) (Byerly et al., 2006), Extrapyramidal Symptom Rating Scale (ESRS) (Chouinard et al., 1980), Extrapyramidal Side Effects Scale (ESES) (Simpson and Angus, 1970), Glasgow Antipsychotic Side-effect Scale (GASS) (Waddell and Taylor, 2008), Hillside Akathisia Scale (HAS) (Fleischhacker et al., 1989), Liverpool University Neuroleptic Side Effect Rating Scale (LUNSERS) (Day et al., 1995), Maryland Psychiatric Research Center Scale (MPRC) (Cassady et al., 1997), Nursing Extra Pyramidal Symptoms Assessment Scale (NEPSAS) (Fagan-Pryor and May, 2000), Prince Henry Hospital Akathisia Rating Scale (PHHARS) (Sachdev, 1994), Systematic Monitoring of Adverse events Related to TreatmentS (SMARTS) (Haddad et al., 2014b) and Yale Extrapyramidal Symptom Scale (YESS) (Mazure et al., 1995).

We identified a further 17 antipsychotic medication side effect assessment tools that were excluded for the following reasons: validated in a language other than English (De Haan et al., 2002; Gerlach et al., 1993; Kaneda, 2009; Kikuchi et al., 2011; Kim et al., 2002; Lako et al., 2013; Lindstrom et al., 2009; Lingjaerde et al., 1987; Loonen et al., 2000; Naber, 1995; Prieto et al., 2004; Wolters et al., 2006), validated in a population of people with intellectual disabilities (Bodfish et al., 1997; Kalachnik and Sprague, 1993; Matson et al., 1998) and not specifically developed to assess antipsychotic medication side effects (Gaebel et al., 2010; Hogan et al., 1983; Mojtabai et al., 2012; Nielsen et al., 2012).

Types of antipsychotic medication side effect tools

Nine of the 16 antipsychotic medication side effect assessment tools included in this review assessed drug-induced movement disorders. Of these tools, one (AIMS) assessed only dyskinesia, one (ESES) evaluated only Parkinsonism and three (ARS, HAS, PHHARS) evaluated only akathisia. One tool (MPRC) assessed dyskinesia and Parkinsonism. One tool (YESS) evaluated dyskinesia, Parkinsonism and dystonia. One tool (NEPSAS) assessed akathisia, Parkinsonism and dystonia. Only one tool (ESRS) assessed the entire range of drug-induced movement disorders. All drug-induced movement disorder assessment tools were observer-rated.

Six of the remaining seven tools (ANNSERS, ASC-SR, ASC-C, LUNSERS, GASS, SMARTS) assessed both neurological (movement disorders) and non-neurological side effects, which commonly include sedation, weight gain, prolactin-related problems, gastrointestinal problems and genitourinary problems. All of these six tools except ASC-C and ANNSERS were self-rated. The final tool (ASES) specifically assessed sexual dysfunction and was self-rated.

Evaluation of the psychometric properties of antipsychotic medication side effect assessment tools

Tables 2 and 3 display summarised details of the included tools’ purpose, scale content, intended user, content validity, construct validity, internal consistency, test–retest reliability, inter-rater reliability, responsiveness, respondent burden and administrative burden. Supplementary Tables S1–S4 display the full details that were extracted to rate each of these psychometric attributes, and Supplementary Table S5 displays the summarised ratings for the included tools. Supplementary figures and tables are available online with this article at http://anp.sagepub.com/.

Table 2.

Assessment tools’ purpose, scale content, content validity and construct validity.

Assessment tool	Purpose and scale content	Intended user	Content validity	Construct validity
AIMS (Guy, 1976)	Assesses dyskinetic movements. Scales were unreported.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
ARS (Barnes, 1989)	Assesses characteristic motor phenomena and subjective aspects of akathisia. Contains one objective scale and one subjective scale.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Predefined hypotheses presented and correlations with other relevant measures were confirmed
ANNSERS (Ohlsen et al., 2008)	Assesses non-neurological side effects. Contains one scale.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
ASC-SR (Dott et al., 2001)	Assesses communication between consumers and clinicians about antipsychotic medication side effects. Contains 17 items, which are considered individually and not grouped into a scale.	Self-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
ASC-C (Dott et al., 2001)	As above	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
ASES (Byerly et al., 2006)	Assesses sexual function in both heterosexual and homosexual populations. Contains one scale.	Self-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
ESRS (Chouinard et al., 1980)	Assesses extrapyramidal symptoms. Contains four scales, which examine Parkinsonism, akathisia, dystonia and dyskinesia.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Predefined hypotheses presented and correlations with other relevant measures were confirmed
ESES (Simpson and Angus, 1970)	Assesses extrapyramidal symptoms. Contains one scale that examines Parkinsonism.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
GASS (Waddell and Taylor, 2008)	Assesses side effects resulting from second generation antipsychotic medication.	Self-rated	Adequately undertaken	Predefined hypotheses presented and correlations with other relevant measures were confirmed
HAS (Fleischhacker et al., 1989)	Assesses akathisia. Contains two scales, which examine subjective and objective phenomena.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
LUNSERS (Day et al., 1995)	Assesses diverse side effects resulting from first generation antipsychotic medication.	Self-rated	Inadequately undertaken because target population was not involved in item selection	Predefined hypotheses presented and correlations with other relevant measures were confirmed
MPRC (Cassady et al., 1997)	Assesses extrapyramidal symptoms. Contains two scales, which examine Parkinsonism and dyskinesia.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Predefined hypotheses presented and correlations with other relevant measures were confirmed
NEPSAS (Fagan-Pryor and May, 2000)	Assesses extrapyramidal symptoms. Contains two scales, which examine dystonia and akathisia.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
PHHARS (Sachdev, 1994)	Assesses akathisia. Contains two scales, which examine subjective and objective phenomena.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Predefined hypotheses presented and correlations with other relevant measures were confirmed
SMARTS (Haddad et al., 2014b)	Assesses side effects resulting from second generation antipsychotic medication. Contains 12 items that are assessed individually and not grouped into a single scale.	Self-rated	Inadequately undertaken because target population was not involved in item selection	Not reported
YESS (Mazure et al., 1995)	Assesses acute extrapyramidal symptoms. Contains three scales, which examine Parkinsonism, akathisia and dystonia.	Observer-rated	Inadequately undertaken because target population was not involved in item selection	Predefined hypotheses presented and correlations with other relevant measures were confirmed

AIMS: Abnormal Involuntary Movement Scale; ARS: (Barnes) Akathisia Rating Scale; ANNSERS: Antipsychotic Non-Neurological Side-Effects Rating Scale; ASC-SR: Approaches to Schizophrenia Communication–Self-Report; ASC-C: Approaches to Schizophrenia Communication–Clinic; ASES: Arizona Sexual Experience Scale; ESRS: Extrapyramidal Symptom Rating Scale; ESES: Extrapyramidal Side Effects Scale; GASS: Glasgow Antipsychotic Side-effect Scale; HAS: Hillside Akathisia Scale; LUNSERS: Liverpool University Neuroleptic Side Effect Rating Scale; MPRC: Maryland Psychiatric Research Center Scale; NEPSAS: Nursing Extra Pyramidal Symptoms Assessment Scale; PHHARS: Prince Henry Hospital Akathisia Rating Scale; SMARTS: Systematic Monitoring of Adverse events Related to TreatmentS; YESS: Yale Extrapyramidal Symptom Scale.

Table 3.

Included tools’ internal consistency, reliability, responsiveness and burden.

Assessment tool	Internal consistency	Test–retest/inter-rater reliability	Responsiveness	Respondent/administrative burden
AIMS (Guy, 1976)	Not reported	Assessed in inadequate sample size	Not reported	Not reported.
ARS (Barnes, 1989)	Not reported	Assessed in inadequate sample size	Not reported	Not reported.
ANNSERS (Ohlsen et al., 2008)	Not reported	Assessed in inadequate sample size	Not reported	Not reported.
ASC-SR (Dott et al., 2001)	Not reported	Not reported	Not reported	Inadequately detailed. Described as brief and easy to administer.
ASC-C (Dott et al., 2001)	Not reported	Not reported	Not reported	Inadequately detailed. Described as brief and easy to administer.
ASES (Byerly et al., 2006)	Inadequately undertaken as factor analysis was not used to confirm scale structure before assessing internal consistency	Not reported	Not reported	Inadequately detailed. Described as requiring about 5 minutes to administer.
ESRS (Chouinard et al., 1980)	Not reported	Sufficient sample size but Pearson’s r, rather than ICC, used to establish reliability	Inadequately undertaken as neither minimal important change nor smallest detectable change reported	Inadequately detailed. Unclear how stated completion time was derived. Described as requiring no specialised knowledge to administer.
ESES (Simpson and Angus, 1970)	Not reported	Assessed in inadequate sample size	Not reported	Inadequately detailed. Described as simple and rapid to administer.
GASS (Waddell and Taylor, 2008)	Not reported	Assessed in sufficient sample size and Cohen’s kappa was adequate (κ = 0.72)	Not reported	Not reported.
HAS (Fleischhacker et al., 1989)	Inadequately undertaken as factor analysis was not used to confirm scale structure before assessing internal consistency	Not reported	Not reported	Inadequately detailed. Unclear how stated completion time was derived. Described as self-explanatory.
LUNSERS (Day et al., 1995)	Inadequately undertaken as factor analysis was not used to confirm scale structure before assessing internal consistency	Inadequately undertaken because Pearson’s r was used to establish reliability	Not reported	Inadequately detailed. Mean time required to administer was not reported, but range was 5–20 minutes.
MPRC (Cassady et al., 1997)	Inadequately undertaken as factor analysis was not used to confirm scale structure before assessing internal consistency	Assessed in inadequate sample size	Inadequately undertaken as neither minimal important change nor smallest detectable change reported	Inadequately detailed. Reported as possibly requiring specialised training to administer.
NEPSAS (Fagan-Pryor and May, 2000)	Not reported	Assessed in inadequate sample size	Not reported	Not reported.
PHHARS (Sachdev, 1994)	Inadequately undertaken as Cronbach’s alpha was assessed for total scale score rather than subscales identified by factor analysis	Sufficient sample size but Pearson’s r, rather than ICC, used to establish reliability	Not reported	Inadequately detailed. Reported as requiring at least 12 minutes to administer.
SMARTS (Haddad et al., 2014b)	Not reported	Not reported	Not reported	Inadequately detailed. Described as brief and understandable.
YESS (Mazure et al., 1995)	Not reported	Assessed in inadequate sample size	Not reported	Not reported.

AIMS: Abnormal Involuntary Movement Scale; ARS: (Barnes) Akathisia Rating Scale; ANNSERS: Antipsychotic Non-Neurological Side-Effects Rating Scale; ASC-SR: Approaches to Schizophrenia Communication–Self-Report; ASC-C: Approaches to Schizophrenia Communication–Clinic; ASES: Arizona Sexual Experience Scale; ESRS: Extrapyramidal Symptom Rating Scale; ESES: Extrapyramidal Side Effects Scale; GASS: Glasgow Antipsychotic Side-effect Scale; HAS: Hillside Akathisia Scale; LUNSERS: Liverpool University Neuroleptic Side Effect Rating Scale; MPRC: Maryland Psychiatric Research Center Scale; NEPSAS: Nursing Extra Pyramidal Symptoms Assessment Scale; PHHARS: Prince Henry Hospital Akathisia Rating Scale; SMARTS: Systematic Monitoring of Adverse events Related to TreatmentS; YESS: Yale Extrapyramidal Symptom Scale; ICC: intra-class correlation.

Content validity

Mental health consumers and experts were involved in item generation for only one (GASS) of the tools, and hence only this tool received a positive rating for content validity.

Construct validity

Construct validity was evaluated for eight tools (ANNSERS, ARS, ESRS, GASS, LUNSERS, MPRC, PHHARS, YESS), and in all cases the correlation coefficients were adequate and confirmed the predefined hypotheses. Hence, these tools received a positive rating for construct validity.

Internal consistency

Of the 12 tools that contained scales, internal consistency was assessed for only five (ASES, HAS, LUNSERS, MPRC, PHHARS), and factor analysis was not used to establish the scales’ dimensions for four of these (ASES, HAS, LUNSERS, MPRC). For the one tool (PHHARS) whose scales were established by factor analysis, Cronbach’s alpha was calculated for the total scale score rather than for the identified subscales. No tool received a positive rating for internal consistency.

Test–retest/inter-rater reliability

Among the studies that assessed test–retest or inter-rater reliability, an adequate sample size was used for only three tools (ESRS, GASS, PHHARS). Of these, an appropriate statistical approach to establishing reliability was used for only one tool (GASS), which was therefore the only tool to receive a positive reliability rating.

Responsiveness

The assessment of responsiveness was not adequately undertaken for any tool included in this review.

Respondent/administrative burden

The reporting of administrative or respondent burden was not adequately detailed for any tool included in this review.

Discussion

The overall quality of the psychometric properties of the antipsychotic medication side effect assessment tools included in this systematic review was very modest. Foremost among the issues that need to be addressed in subsequent validation studies is a re-evaluation of the tools’ content validity. Only one of the tools reviewed here (GASS) incorporated the views of mental health consumers in generating its items, yet the views of both the target population and clinicians need to be elicited in deriving items to ensure that the content reflects all pertinent aspects of the constructs captured by the tool (Reeve et al., 2013; Terwee et al., 2007). The reassessment of the tools’ content validity should be prioritised because evaluations of construct validity, reliability and responsiveness are immaterial until content validity has been acceptably established (Evans et al., 2004; Terwee et al., 2007).

Once content validity has been clearly established, a statistical approach should be used to determine whether the items form a single scale or several scales, and whether any items are redundant (Aaronson et al., 2002). Acceptable statistical approaches to examine dimensionality include factor analysis, principal components analysis and Rasch analysis (Terwee et al., 2007). After such approaches have confirmed the tool’s dimensions, Cronbach’s alpha can be derived for each scale to establish whether the correlation between items is high enough to justify summation of the item scores (Terwee et al., 2007). For the tools included in this review, a statistical approach examining dimensionality was used in only five cases (ASES, ESRS, ESES, MPRC, PHHARS), and even then the items were appropriately grouped into the identified scales for only one tool (MPRC). This finding indicates that additional studies are required to establish the structure of most of the tools included in this review.

Reliability was assessed for most tools included in this review, but only in one case (GASS) was the evaluation of reliability consistent with recommended guidelines (Barten et al., 2012; Reeve et al., 2013; Terwee et al., 2007). The most common issue, found in 11 studies examining the tools’ reliability, was the use of an inadequately sized sample. When reliability studies comprise an insufficient number of participants, the resulting reliability estimates are unacceptably imprecise (Terwee et al., 2007). Another common issue was the use of Pearson correlation coefficients to establish reliability; Pearson’s r does not account for systematic differences between raters or administrations and can therefore overstate agreement (Mokkink et al., 2010; Reeve et al., 2013). Finally, in several studies, reliability coefficients were calculated for individual items (in some cases for only a subset of items) rather than for overall scale scores. In summary, our findings indicate that reliability should be reassessed for all tools included in this review apart from the GASS.
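The limitation of Pearson’s r can be shown with a small hypothetical example in which one rater scores every participant exactly two points higher than the other. Pearson’s r is 1.0 because the two sets of scores are perfectly linearly related, whereas the two-way random-effects, absolute-agreement, single-rater ICC (often labelled ICC(2,1)) penalises the systematic difference and falls to about 0.56. The sketch is illustrative only: the data are invented, and both functions are written from the standard formulas (ANOVA mean squares for the ICC) rather than taken from any statistics package.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` has one row per participant and one column per rater.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # The k * (ms_cols - ms_error) / n term is what penalises rater bias.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical ratings: rater B scores every participant 2 points higher.
rater_a = [1, 2, 3, 4, 5]
rater_b = [a + 2 for a in rater_a]
print(round(pearson_r(rater_a, rater_b), 3))           # 1.0
print(round(icc_2_1(list(zip(rater_a, rater_b))), 3))  # 0.556
```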

Responsiveness was only assessed for one of the tools included in this review, despite the developers of many of the tools claiming that they would be useful in monitoring changes in antipsychotic medication side effects. Reliability was commonly evaluated, and the tools’ developers may have assumed that establishing reliability also supports a tool’s ability to track change over time. However, reliability relates to the extent to which a tool’s scores remain consistent when the tool is administered over a period in which the construct under evaluation (e.g. akathisia or dyskinesia) has not changed (Barten et al., 2012; Terwee et al., 2007). In contrast, responsiveness establishes a tool’s ability to identify change in the construct under evaluation over time (Barten et al., 2012; Terwee et al., 2007). Reliability and responsiveness are therefore distinct psychometric properties that require different statistical approaches (Barten et al., 2012).

Limitations

Our search strategy identified a substantial number of antipsychotic medication side effect assessment tools. However, the sensitivity and specificity of search strategies for identifying side effect assessment tools have not been formally evaluated, and hence it seems likely that not all relevant tools were located. Also, titles and abstracts were screened by only one author, which increases the likelihood of overlooking relevant studies. Finally, the critical appraisal criteria used in this study were based on the recommendations of the Medical Outcomes Trust (Aaronson et al., 2002) and Terwee et al. (2007), but, as these authors note, the criteria are generally derived from ‘useful rules of thumb’ and no empirical evidence exists to support their use.

Conclusion

In general, the psychometric properties of the antipsychotic medication side effect assessment tools included in this review were deficient in almost every regard. None of the tools received a positive rating for all of the appraised psychometric properties. Indeed, only one tool received more than one positive rating, indicating that further studies are required to consolidate the psychometric quality of the tools included in this review. Given this context, and the need to develop evidence-based health care for mental health consumers, the findings of this systematic review may be of interest to policy makers, practitioners and the consumers they serve.

Footnotes

Declaration of interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

References

Aaronson, Alonso, Burnam, et al. (2002) Assessing health status and quality-of-life instruments: Attributes and review criteria. Quality of Life Research 11: 193–205.

Ascher-Svanum, Zhu, Faries, et al. (2008) Adherence and persistence to typical and atypical antipsychotics in the naturalistic treatment of patients with schizophrenia. Patient Preference and Adherence 2: 67–77.

Barnes (1989) A rating scale for drug-induced akathisia. The British Journal of Psychiatry 154: 672–676.

Barten, Pisters, Huisman, et al. (2012) Measurement properties of patient-specific instruments measuring physical function. Journal of Clinical Epidemiology 65: 590–601.

Bodfish, Newell, Sprague, et al. (1997) Akathisia in adults with mental retardation: Development of the Akathisia Ratings of Movement Scale (ARMS). American Journal of Mental Retardation 101: 413–423.

Byerly, Nakonezny, Fisher, et al. (2006) An empirical evaluation of the Arizona sexual experience scale and a simple one-item screening test for assessing antipsychotic-related sexual dysfunction in outpatients with schizophrenia and schizoaffective disorder. Schizophrenia Research 81: 311–316.

Cabeza, Amador, Lopez, et al. (2000) Subjective response to antipsychotics in schizophrenic patients: Clinical implications and related factors. Schizophrenia Research 41: 349–355.

Cassady, Thaker, Summerfelt, et al. (1997) The Maryland Psychiatric Research Center scale and the characterization of involuntary movements. Psychiatry Research 70: 21–37.

Chapman, Horne (2013) Medication nonadherence and psychiatry. Current Opinion in Psychiatry 26: 446–452.

Chouinard, Ross-Chouinard, Annable, et al. (1980) Extrapyramidal symptom rating scale. Canadian Journal of Neurological Sciences 7: 233–239.

Dassori, Miller, Weiden (2003) The approaches to schizophrenia communication (ASC) tool: Including the patient perspective in treatment. Disease Management and Health Outcomes 11: 699–708.

Day, Wood, Dewey, et al. (1995) A self-rating scale for measuring neuroleptic side-effects. Validation in a group of schizophrenic patients. The British Journal of Psychiatry 166: 650–653.

De Haan, Weisfelt, Dingemans, et al. (2002) Psychometric properties of the subjective well-being under neuroleptics scale and the subjective deficit syndrome scale. Psychopharmacology 162: 24–28.

De Leeuw, Van Meijel, Grypdonck, et al. (2012) The quality of the working alliance between chronic psychiatric patients and their case managers: Process and outcomes. Journal of Psychiatric and Mental Health Nursing 19: 1–7.

Dott, Weiden, Hopwood, et al. (2001) An innovative approach to clinical communication in schizophrenia: The approaches to schizophrenia communication checklists. CNS Spectrums 6: 333–338.

Evans, Elwyn, Edwards (2004) Review of instruments for peer assessment of physicians. British Medical Journal 328: 1240.

Fagan-Pryor, May (2000) Establishment of interrater reliability for a nursing extrapyramidal side effects (EPS) assessment scale. Journal of Nursing Care Quality 14: 54.

Fleischhacker, Bergmann, Perovich, et al. (1989) The Hillside akathisia scale: A new rating instrument for neuroleptic-induced akathisia. Psychopharmacology Bulletin 25: 222–226.

Gaebel, Riesbeck, Von Wilmsdorff, et al. (2010) Drug attitude as predictor for effectiveness in first-episode schizophrenia: Results of an open randomized trial (EUFEST). European Neuropsychopharmacology 20: 310–316.

Gerlach, Larsen (1999) Subjective experience and mental side-effects of antipsychotic treatment. Acta Psychiatrica Scandinavica – Supplementum 395: 113–117.

Gerlach, Korsgaard, Clemmesen, et al. (1993) The St. Hans Rating Scale for extrapyramidal syndromes: Reliability and validity. Acta Psychiatrica Scandinavica 87: 244–252.

Goff, Hill, Freudenreich (2010) Strategies for improving treatment adherence in schizophrenia and schizoaffective disorder. The Journal of Clinical Psychiatry 71(Suppl. 2): 20–26.

Guy (1976) Abnormal Involuntary Movement Scale. Rockville, MD: National Institute of Mental Health, U.S. Department of Health and Human Services.

Haddad, Brain, Scott (2014a) Nonadherence with antipsychotic medication in schizophrenia: Challenges and management strategies. Patient Related Outcome Measures 5: 43–62.

Haddad, Fleischhacker, Peuskens, et al. (2014b) SMARTS (Systematic Monitoring of Adverse events Related to TreatmentS): The development of a pragmatic patient-completed checklist to assess antipsychotic drug side effects. Therapeutic Advances in Psychopharmacology 4: 15–21.

Happell, Manias, Roper (2004) Wanting to be heard: Mental health consumers’ experiences of information about medication. International Journal of Mental Health Nursing 13: 242–248.

Hellewell (1999) Do we know what matters to our patients? Clear perspectives: Manage issues Schizophrenia Bulletin 2: 1–4.

Hogan, Awad, Eastwood (1983) A self-report scale predictive of drug compliance in schizophrenics: Reliability and discriminative validity. Psychological Medicine 13: 177–183.

Hungerford, Fox (2014) Consumer’s perceptions of recovery-oriented mental health services: An Australian case-study analysis. Nursing and Health Sciences 16: 209–215.

Kalachnik, Sprague (1993) The Dyskinesia Identification System Condensed User Scale (DISCUS): Reliability, validity, and a total score cut-off for mentally ill and mentally retarded populations. Journal of Clinical Psychology 49: 177–189.

Kaneda (2009) Assessing weight-related quality of life in persons with schizophrenia. International Medical Journal 16: 107–111.

Kikuchi, Iwamoto, Sasada, et al. (2011) Reliability and validity of a new sexual function questionnaire (Nagoya Sexual Function Questionnaire) for schizophrenic patients taking antipsychotics. Human Psychopharmacology 26: 300–306.

Kim, Jung, Kang, et al. (2002) Metric characteristics of the drug-induced extrapyramidal symptoms scale (DIEPSS): A practical combined rating scale for drug-induced movement disorders. Movement Disorders 17: 1354–1359.

Lako, Bruggeman, Liemburg, et al. (2013) A brief version of the subjects’ response to antipsychotics questionnaire to evaluate treatment effects. Schizophrenia Research 147: 175–180.

Leucht, Komossa, Rummel-Kluge, et al. (2009) A meta-analysis of head-to-head comparisons of second-generation antipsychotics in the treatment of schizophrenia. American Journal of Psychiatry 166: 152–163.

Lieberman, Stroup, McEvoy, et al. (2005) Effectiveness of antipsychotic drugs in patients with chronic schizophrenia. New England Journal of Medicine 353: 1209–1223.

Lindstrom, Jedenius, Levander (2009) A symptom self-rating scale for schizophrenia (4S): Psychometric properties, reliability and validity. Nordic Journal of Psychiatry 63(Suppl. 1–4): 368–374.

Lingjaerde, Ahlfors, Bech, et al. (1987) The UKU side effect rating scale. A new comprehensive rating scale for psychotropic drugs and a cross-sectional study of side effects in neuroleptic-treated patients. Acta Psychiatrica Scandinavica – Supplementum 334: 1–100.

Llorca (2008) Partial compliance in schizophrenia and the impact on patient outcomes. Psychiatry Research 161: 235–247.

Loonen, Doorschot, Van Hemert, et al. (2000) The schedule for the assessment of drug-induced movement disorders (SADIMoD): Test-retest reliability and concurrent validity. International Journal of Neuropsychopharmacology 3: 285–296.

Matson, Mayville, Bielecki, et al. (1998) Reliability of the Matson Evaluation of Drug Side Effects Scale (MEDS). Research in Developmental Disabilities 19: 501–506.

Mazure, Cellar, Bowers Jr, et al. (1995) Assessment of extrapyramidal symptoms during acute neuroleptic treatment. Journal of Clinical Psychiatry 56: 94–100.

Mojtabai, Corey-Lisle, et al. (2012) The patient assessment questionnaire: Initial validation of a measure of treatment effectiveness for patients with schizophrenia and schizoaffective disorder. Psychiatry Research 200: 857–866.

Mokkink, Terwee, Patrick, et al. (2010) The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology 63: 737–745.

Morrison, Meehan, Stomski (2015a) Australian case managers’ perceptions of mental health consumers’ use of antipsychotic medications and associated side-effects. International Journal of Mental Health Nursing 24: 104–111.

Morrison, Meehan, Stomski (2015b) Australian case managers’ views about the impact of antipsychotic medication on mental health consumers. International Journal of Mental Health Nursing. Epub ahead of print 28 January. DOI: 10.1111/inm.12154.

Morrison, Meehan, Stomski (2015c) Living with antipsychotic medication side-effects: The experience of Australian mental health consumers. International Journal of Mental Health Nursing 24: 253–261.

Morrison, Meehan, Gaskill, et al. (2000) Enhancing case managers’ skills in the assessment and management of antipsychotic medication side-effects. Australian and New Zealand Journal of Psychiatry 34: 814–821.

Naber (1995) A self-rating to measure subjective effects of neuroleptic drugs, relationships to objective psychopathology, quality of life, compliance and other clinical variables. International Clinical Psychopharmacology 10(Suppl. 3): 133–138.

Naber (2008) Subjective effects of antipsychotic drugs and their relevance for compliance and remission. Epidemiologia e Psichiatria Sociale 17: 174–176.

Naber, Karow (2001) Good tolerability equals good results: The patient’s perspective. European Neuropsychopharmacology 11(Suppl. 4): S391–S396.

Nielsen, Lindstrom, Nielsen, et al. (2012) DAI-10 is as good as DAI-30 in schizophrenia. European Neuropsychopharmacology 22: 747–750.

Ohlsen, Williamson, Yusufi, et al. (2008) Interrater reliability of the antipsychotic non-neurological side-effects rating scale measured in patients treated with clozapine. Journal of Psychopharmacology 22: 323–329.

Prieto, Sacristan, Gomez (2004) The validity and reliability of the global index of safety (GIS). Current Medical Research and Opinion 20: 1825–1832.

Reeve, Wyrwich, et al. (2013) ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Quality of Life Research 22: 1889–1905.

Roe, Goldblatt (2009) Why and how people decide to stop taking prescribed psychiatric medication: Exploring the subjective process of choice. Psychiatric Rehabilitation Journal 33: 38–46.

Sachdev (1994) A rating scale for acute drug-induced akathisia: Development, reliability, and validity. Biological Psychiatry 35: 263–271.

Salomon, Hamilton (2013) ‘All roads lead to medication?’ Qualitative responses from an Australian first-person survey of antipsychotic discontinuation. Psychiatric Rehabilitation Journal 36: 160–165.

Seale, Chaplin, Lelliott, et al. (2007) Antipsychotic medication, sedation and mental clouding: An observational study of psychiatric consultations. Social Science & Medicine 65: 698–711.

Simpson, Angus (1970) A rating scale for extrapyramidal side effects. Acta Psychiatrica Scandinavica – Supplementum 212: 11–19.

Stomski, Mackintosh, Stanley (2010) Patient self-report measures of chronic pain consultation measures: A systematic review. The Clinical Journal of Pain 26: 235–243.

Terwee, Bot, De Boer, et al. (2007) Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology 60: 34–42.

Waddell, Taylor (2008) A new self-rating scale for detecting atypical or second-generation antipsychotic side effects. Journal of Psychopharmacology 22: 238–243.

Wolters, Knegtering, Wiersma, et al. (2006) Evaluation of the subjects’ response to antipsychotics questionnaire. International Clinical Psychopharmacology 21: 63–69.

Yen, Lee, Tang, et al. (2009) Predictive value of self-stigma, insight, and perceived adverse effects of medication for the clinical outcomes in patients with depressive disorders. Journal of Nervous and Mental Disease 197: 172–177.