
Abstract

Gastrointestinal illnesses have a physical, emotional and social impact on patients. Patient reported outcome measures (PROMs) are increasingly used in clinical decision-making, clinical research and approval of new therapies. In the last decade, there has been a rapid increase in the number of PROMs in gastroenterology and, therefore, choosing which of these PROMs to use can be difficult. Not all PROM instruments currently used in research and clinical practice in gastroenterology have gone through a rigorous development methodology. New drugs and therapies will not have access to the market if the PROMs used in their clinical trials are not validated according to the guidelines of the international agencies. Therefore, it is important to know the required properties of PROMs when choosing or evaluating a drug or a clinical intervention. This paper reviews the current literature on how to assess the validity and reliability of PROMs. It summarises the required properties into a practical guide for gastroenterologists to use in assessing an instrument for use in clinical practice or research.

Keywords

Patient reported outcome measures, quality of life

Introduction

More than a century ago, Florence Nightingale proposed the first health outcome measure by classifying patients as relieved, unrelieved or dead.¹ Other indicators such as mortality rates have historically been used to measure health outcomes in the population.² However, the definition of health has broadened over the past century to include freedom from disease, ability to perform daily activities, happiness, social and emotional well-being, and quality of life. The World Health Organisation (WHO) has defined health as ‘physical, mental, and social well-being, and not merely the absence of disease and infirmity’ (p. 100).³ As a result, numerous measures have been developed in an attempt to quantify health. Health outcome measures are tools used to evaluate an individual’s health using different health-related parameters. Patient reported outcome measures (PROMs) are instruments that are completed by patients and capture one or more aspects of health.^4,5 The formal use of PROMs to monitor surgical outcomes in England has been an important development.⁶ Since 2007, the Department of Health in England has required the routine measurement of patient-reported health outcomes for all National Health Service (NHS) patients via its PROMs programme.^5,6 PROMs are increasingly used in decision-making to encourage a patient-centred approach.^5,7 For this reason, PROMs that are chosen and used in practice must be valid, reliable and clinically useful measures.

There are over 100 PROMs applicable to gastroenterological disorders, which are described in the gastrointestinal PROMs (GI-PRO) database.⁴ The classical physician-based health outcome measures are now increasingly being replaced by PROMs, which enable the assessment and monitoring of disease or treatment effects from the patients’ own perception rather than from the health professionals’ judgments (which do not always reflect patients’ views).^8,9

A doctor’s ability to interpret and apply PROMs in clinical practice has great potential to contribute to a better understanding of patients’ well-being. PROMs are increasingly used in both research and clinical work in gastroenterology. It is important to differentiate between PROMs for functional disorders and those for organic disorders. PROMs should be used in the identification of symptomatic profiles, diagnosis and treatment of functional disorders such as postprandial distress syndrome, epigastric pain syndrome, chronic idiopathic nausea, excessive belching, irritable bowel syndrome (IBS) and other functional diseases.¹⁰ Because these disorders lack objective measurable markers of symptom improvement, such as stool frequency and rectal bleeding, the evaluation of treatment response has to be based on the patients’ reporting of symptoms. The Rome III criteria^11,12 are very useful for assessing the outcomes of new treatments for functional gastrointestinal disorders.

PROMs are categorised as generic or disease-specific.^13,14 Generic PROMs are applicable to any disease and are useful for comparisons or economic studies across different conditions. Specific PROMs, on the other hand, are designed for one condition. Both types of measures, generic and specific, are now seen as complementary rather than conflicting when appraising patient outcomes. Some of the commonly used generic tools are the European Quality of Life - five dimensions (EQ-5D),¹⁵ the Short Form 36 (SF-36) questionnaire,¹⁶ the Cleveland Global Quality of Life (CGQL),^17,18 and the Short Form 12 (SF-12).¹⁹ Examples of disease-specific PROMs are the Inflammatory Bowel Disease Questionnaire (IBDQ)²⁰ and the rating form of IBD patient concerns.²¹ Ang et al. provided a good review of PROMs that have been used to evaluate the efficacy of therapeutic agents in functional dyspepsia trials.²² Mouzas and Pallis reported a good review of the PROMs used in inflammatory bowel disease.²³ The patient-reported outcome and quality of life instruments database (PROQOLID) provides a comprehensive list of the available PROM questionnaires.²⁴

The increasing number of PROMs in recent years requires gastroenterologists to decide which PROM to use and how to assess each measure. Several studies have suggested using standards to assess properties such as validity, reliability and responsiveness. Examples of these standards have been presented by Terwee et al.,²⁵ the Scientific Advisory Committee of the Medical Outcomes Trust,²⁶ the Evaluating the Measurement of Patient-Reported Outcomes (EMPRO) tool,²⁷ Bombardier and Tugwell,²⁸ Andresen,²⁹ Streiner,³⁰ DeVon et al.,³¹ McDowell and Jenkinson,³² the Food and Drug Administration (FDA)^33,34 and the European Medicines Agency³⁵ guidelines. The FDA guidance in 2006^33,34 describes how to evaluate PROMs used as effectiveness endpoints in clinical trials. These publications describe the required criteria for a successful PROM and are written mainly for health outcome specialists and methodologists who are involved in developing health outcome measures for clinical trials or in evaluating a new medical technology. None of these publications individually summarises these standards into one relatively brief yet fairly comprehensive practical checklist for doctors to use in their day-to-day clinical practice. Before using health outcome measures for research or in clinical practice, it is essential to ensure that they are appropriate to the context, perform well and possess the required characteristics. In this article we describe how to assess these requirements (item generation, validity, reliability, responsiveness, utility and cross-cultural adaptation) and how to evaluate these measures in a way that is easy to follow and applicable in clinical practice.

The quality properties of PROMs

There are five main components for good quality PROMs: item selection, validity, reliability, responsiveness and interpretability. With the increased number of multinational and multicultural clinical research studies, certain criteria regarding cultural, educational and social adaptation of the PROMs are needed to use the questionnaires in a different language or country.

Item selection

Items can be derived from three main sources:^13,36,37

Research: reviewing old PROMs is the most commonly used approach in finding items. There are several reasons why old measures can be used to derive the new PROMs items; it saves a lot of time and effort, there are possibly a limited number of questions to ask about a specific problem such as abdominal pain, vomiting, etc. and old measures have been repeatedly used and tested in many studies and trials.

Patients: by asking the patients to identify items and domains to be included in the scale. Patients can be excellent sources of health outcome measure items. Some techniques like focus groups and key informant interviews have been used to collect patients' viewpoints in a systematic manner.¹³ This method of item generation has been useful in constructing a quality of life measure for example the IBDQ,²⁰ the rating form of IBD patient concerns²¹ and functional dyspepsia.³⁸

Clinical observations: items are derived by clinicians based on their experience.

However, the FDA statement^33,34 considers the inclusion of patients in developing a PROM questionnaire to be the most important source of items. It stresses that item generation should include a wide range of patients to represent variations in severity and in population characteristics such as age, sex and educational level. It is important to assess the respondent and administrator burden when choosing items. Items that cause undue physical, emotional or cognitive strain on patients generally decrease the quality and completeness of PROM data. The language in a PROM should be clear and not technical. Items should not offend or discriminate against people, for example when assessing the emotional aspect of quality of life. Therefore, items should be tested on a small group of patients in a preliminary or pilot test to make sure that they are understandable and unambiguous.³⁹ This pilot testing can include any number of patients. The FDA guidance mentions that the number of patients in the pilot testing is not as critical as the quality of the cognitive interviews and the diversity of patients included in the sample. Pilot testing of items is commonly used in developing quality of life questionnaires such as the IBDQ and the UK-IBDQ.^20,40

Once the pool of items has been created, a number of statistical techniques can be used in order to select the most relevant items:

Frequency of endorsement: The frequency of endorsement (also called response rate) is the proportion of people who select the same item response. Only items with an endorsement rate between 0.2 and 0.8 (20%–80%) are retained.^13,41 Items with lower or higher rates are considered redundant because they add little value to the index. Items for which more than 80% of patients chose the same answer are considered for removal because they cannot be used to distinguish between patients. If an answer was chosen by less than 20% of patients, then it is possibly not related to the condition and can be removed.

Item-total correlation: The item-total correlation is the statistical correlation of each item with the total PROM score. The accepted range is 0.2–0.8.¹³ A value below 0.2 indicates that the item is not relevant. A value above 0.8 indicates that the item is redundant and adds little to the total scale.

Internal consistency: Internal consistency is the statistical correlation, or homogeneity, between the items in the measure.^20,25 It is commonly measured by calculating the Cronbach α statistic.^13,36,42 The acceptable value of Cronbach α is 0.7–0.9.^13,25 Values above 0.9 may indicate an overlap between items.
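The three item statistics above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated implementation: the function names and the layout of `data` (one list of item scores per patient) are assumptions made for the example, and the item-total correlation is computed against the uncorrected total, as described in the text.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def endorsement_rates(responses):
    """Proportion of patients choosing each response option of a single item."""
    n = len(responses)
    return {option: responses.count(option) / n for option in set(responses)}


def item_total_correlations(data):
    """Correlation of each item with the total score (total includes the item)."""
    totals = [sum(row) for row in data]
    n_items = len(data[0])
    return [pearson([row[j] for row in data], totals) for j in range(n_items)]


def cronbach_alpha(data):
    """Cronbach alpha from item variances and the variance of the total score."""
    n_items = len(data[0])
    item_vars = [variance([row[j] for row in data]) for j in range(n_items)]
    total_var = variance([sum(row) for row in data])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

With hypothetical pilot data, an item would be flagged when any value returned by `endorsement_rates` falls outside 0.2–0.8, when its item-total correlation falls outside 0.2–0.8, or when `cronbach_alpha` for the scale falls outside 0.7–0.9.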

Reliability

Reliability is the consistency of the scores of a health outcome measure applied in different circumstances. The principle of reliability is that applying the PROM on different occasions or by different observers produces similar results.^13,43 Statisticians suggest that a reliability of 0.75 should be the minimum requirement for a useful health outcome measure.¹³ Common reliability statistics are the intraclass correlation coefficient (ICC) and the Pearson correlation coefficient (r). They are expressed as numbers between –1 and 1: 0 indicates no reliability, 1 indicates perfect reliability between the two sets of tests, and a negative number indicates that the two sets of tests change inversely.

Common types of reliability testing are:

Inter-observer reliability is used to assess the degree of consistency between different observers assessing the same patients.

Test-retest reliability is used to assess reliability of the PROM when applied on two separate occasions. To estimate test-retest reliability, the measure should be administered to the same group of patients on two separate occasions between which there has been no overall change in the clinical condition of the patients. Typically a period of 14 days is acceptable.¹³
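As a rough sketch of how the two reliability statistics behave, the Python below computes the Pearson r and an ICC for test-retest data. The text does not say which form of the ICC is intended, so the one-way random-effects, single-measurement form, often written ICC(1,1), is assumed here; the function names are illustrative.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between test and retest scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_oneway(scores):
    """ICC(1,1), assumed form: one-way random effects, single measurement.

    scores: one list per patient holding that patient's k repeated scores.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    # Between-patient and within-patient mean squares from a one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(scores, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

One reason to report both: if every patient scores exactly 5 points higher at retest, `pearson_r` is still 1, whereas `icc_oneway` drops below 1 because it also penalises the systematic shift between occasions.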

Responsiveness

Responsiveness is the ability of the PROM tool to detect a change in patients’ clinical condition. This is estimated by applying the health outcome measure to a group of patients whose clinical condition has changed.^13,44 There are several statistics for responsiveness, but the commonest is the responsiveness ratio, which is calculated by dividing the mean change in scores for patients who reported a change by the standard deviation of the scores of stable patients.^13,44 Other responsiveness indicators mentioned in the literature are the effect size (ES)⁴⁵ and the standardised response mean (SRM).⁴⁶ The acceptable value for the responsiveness ratio is 0.5 or 50%.^44,47,48
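The three responsiveness statistics can be written as short Python functions. This is a sketch under stated assumptions: the denominator of the responsiveness ratio is taken to be the standard deviation of the change scores of stable patients (the text says only 'scores'), the effect size divides the mean change by the standard deviation of baseline scores, and the SRM divides it by the standard deviation of the change scores. Function and argument names are hypothetical.

```python
from statistics import mean, stdev


def responsiveness_ratio(changes_in_changed, changes_in_stable):
    """Mean change in patients who changed / SD of change in stable patients."""
    return mean(changes_in_changed) / stdev(changes_in_stable)


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)
```

All three are ratios of signal (average change) to noise (variability); they differ only in which variability is used as the yardstick, so values from different statistics should not be compared directly.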

Validity

Validity is the ability of the test to measure what it is intended to measure. Validity can be broadly divided into three types (referred to as the 3 Cs):¹³ content validity, construct validity and criterion validity.

Content validity: checks if the measure as a whole covers all the relevant and important aspects of the disease. Experts in the field usually judge content validity to ensure an item appropriately measures the desired health outcome.

Construct validity: is used when there is no ‘gold standard’ measure with which a new PROM can be compared.¹³ A combination of laboratory tests, other health outcome measures, or clinical observations might be necessary to provide the data that support the construct validation of the PROM.^25,49 Construct validity is commonly assessed by calculating a correlation coefficient. An appropriate correlation coefficient for construct validity should be somewhere between 0.4 and 0.8.^13,47

Criterion validity: measures the correlation of the new measure with a ‘gold standard’ measure that exhibits the same characteristics. When the correlation is explored at the same time, it is described as ‘concurrent validation’. This is often used when an existing measure is potentially to be replaced by a shorter, cheaper or less invasive measure.¹³ In this case, we would expect a very high correlation coefficient (≥0.8). When the new measure is compared with a criterion that is measured later, this type of validation is called ‘predictive validation’. This type of validation is often used with measures that predict future events such as response to treatment or mortality.
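The thresholds quoted for construct and criterion validity can be applied to a correlation coefficient as in the sketch below. The choice of the Pearson coefficient and the function names are assumptions for illustration; the text specifies only 'a correlation coefficient'.

```python
from statistics import mean


def correlation(new_scores, comparator_scores):
    """Pearson correlation between the new PROM and a comparator measure."""
    mx, my = mean(new_scores), mean(comparator_scores)
    sxy = sum((a - mx) * (b - my) for a, b in zip(new_scores, comparator_scores))
    sxx = sum((a - mx) ** 2 for a in new_scores)
    syy = sum((b - my) ** 2 for b in comparator_scores)
    return sxy / (sxx * syy) ** 0.5


def meets_validity_threshold(r, kind):
    """Check r against the ranges quoted in the text for each validity type."""
    if kind == "construct":
        return 0.4 <= r <= 0.8
    if kind == "criterion":
        return r >= 0.8
    raise ValueError("kind must be 'construct' or 'criterion'")
```

For example, a correlation of about 0.77 between a new PROM and a related measure would fall in the range expected for construct validity but would be too low if the comparator were a gold standard that the new PROM is meant to replace.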

Interpretability

Interpretability means assigning qualitative meaning to the health outcome measure scores.^25,50–52 To aid the use of PROMs in clinical practice, doctors should be able to translate the PROM score into clinical meaning by knowing the minimal important change (MIC). The MIC is defined as ‘the smallest difference in score in the domain of interest which patients perceive as beneficial and would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management’ (p. 407).⁵³ Additional useful information is derived from the ‘floor and ceiling effects’. The ceiling effect describes the situation in which the majority of patient scores are close to or at the top of the measure. The floor effect describes the situation in which the majority of patient scores are close to or at the bottom of the measure.²⁵ A measure is said to have a floor or ceiling effect when more than 15% of patients score the lowest or highest possible score, respectively.²⁵ If floor or ceiling effects are present, it is difficult to accurately assess the health outcomes of patients who score at the extreme ends of the PROM. Results from those groups of patients should be interpreted with caution.
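The 15% rule for floor and ceiling effects is simple to check once the score range of the instrument is known. The sketch below assumes a hypothetical PROM whose minimum and maximum possible scores are passed in by the caller; the function name and return layout are illustrative.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Proportion of patients at the lowest and highest possible scores.

    An effect is flagged when the proportion exceeds the threshold
    (15% by default, as quoted in the text).
    """
    n = len(scores)
    floor_rate = sum(1 for s in scores if s == min_score) / n
    ceiling_rate = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_rate": floor_rate,
        "ceiling_rate": ceiling_rate,
        "floor_effect": floor_rate > threshold,
        "ceiling_effect": ceiling_rate > threshold,
    }
```

In a sample of 20 patients on a 0–100 scale, 2 patients scoring 0 (10%) would not indicate a floor effect, while 4 patients scoring 100 (20%) would indicate a ceiling effect.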

Cross-cultural adaptation

Cross-cultural adaptation is the process that deals with language and cultural adaptation issues when preparing PROMs for use in another country.^54,55 Items should not only be translated linguistically, but must also be adapted culturally to the target country. For example, a question about difficulty in using a fork when eating may not be applicable in a country where forks are not used. Cross-cultural adaptation involves forward and backward translation of the questions or items, review of the results by linguists, methodologists, statisticians and health care professionals, and pretesting the PROM in a small group of patients to check the clarity of the PROM in the new setting and its consistency with the original PROM version.⁵⁴ The final step involves psychometric validation of the new PROM in the target population to check validity, reliability and responsiveness.^13,55 In some ethnic minorities or special groups of patients, there is a need for specific cultural and language educational materials such as pictograms, smileys or other picture-based representations.^56,57 A good example of a picture-based PROM is the gastro-oesophageal reflux disease analyser (GERDyser) questionnaire, which comprises 10 dimensions, each illustrated by pictogram drawings.⁵⁸ A recent study published by Tack et al. showed that the use of pictograms with verbal descriptors significantly improves the reliability of PROMs by around 30% by avoiding potential bias by patients.⁵⁹

Checklist to evaluate PROMs

The five important aspects when evaluating PROMs are item generation, reliability, validity, responsiveness and interpretability. Each aspect includes one or more psychometric tests. Cross-cultural adaptation is only needed when using the PROM in another language or country. We have produced a simple checklist (Table 1) to evaluate PROMs without using excessive statistical technical terms.

Table 1.

Checklist for evaluating the patient reported outcome measure (PROM) questionnaires

1	Item generation	Were items assessed for: clarity and lack of ambiguity; frequency of endorsement (desired value 20–80%); internal consistency (desired Cronbach α 0.7–0.9); item-total correlation (desired value 0.2–0.8)?
2	Reliability	Was the PROM checked for test-retest reliability? Was the PROM checked for inter-observer reliability? Was the reliability coefficient acceptable (more than 0.75)?
3	Responsiveness	Was the PROM assessed for responsiveness using an appropriate measure (such as responsiveness ratio, effect size or standardised response mean)? Was the value acceptable (e.g. responsiveness ratio more than 0.5)?
4	Validity	Was the validity of the PROM assessed using the appropriate method (construct validity and/or criterion validity)? Was the correlation coefficient at or above the required value (0.4 for construct validity, 0.8 for criterion validity)?
5	Interpretability	Can the PROM results be interpreted easily for clinical practice? Were the values for MIC, SEM, SDC and floor and ceiling effects reported properly?
6	Cross-cultural adaptation (only if the PROM is used in a different language/culture)	Did the PROM go through a proper cross-cultural validation process?

MIC: minimal important change; SEM: standard error of measurement; SDC: smallest detectable change.
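The numerical parts of Table 1 can be encoded as a small lookup so that the published statistics of a PROM are checked in one pass. This is only a sketch: the key names are hypothetical, the thresholds are copied from Table 1, and the qualitative questions (clarity, interpretability, cross-cultural validation) still need a human reader.

```python
# Numerical criteria from Table 1; key names are illustrative assumptions.
CRITERIA = {
    "frequency_of_endorsement": lambda v: 0.2 <= v <= 0.8,
    "cronbach_alpha": lambda v: 0.7 <= v <= 0.9,
    "item_total_correlation": lambda v: 0.2 <= v <= 0.8,
    "reliability_coefficient": lambda v: v > 0.75,
    "responsiveness_ratio": lambda v: v > 0.5,
    "construct_validity_r": lambda v: v >= 0.4,
    "criterion_validity_r": lambda v: v >= 0.8,
}


def evaluate_prom(reported):
    """Return True/False per criterion, or None when the value is not reported."""
    return {
        name: (rule(reported[name]) if name in reported else None)
        for name, rule in CRITERIA.items()
    }
```

Returning `None` for statistics that a validation paper does not report keeps 'not assessed' distinct from 'assessed and failed', which is the distinction the checklist questions draw.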

Conclusions

This paper intends to provide a practical overview of the main components of a good quality PROM; it does not intend to provide a detailed description of each component. Readers who require more detailed explanations are encouraged to refer to the references cited in the paper. The FDA³³ and the European Medicines Agency³⁵ guidelines provide further recommendations on the proper development and validation of PROMs, especially for clinical trials.

A good example of a well-validated PROM is the IBDQ. The IBDQ was developed in 1985 as a quantitative, disease-specific health-related quality of life (HRQoL) measure for patients with inflammatory bowel disease (IBD). A number of patients with IBD and health professionals were asked to list all problems they had observed or experienced as a direct result of IBD. This process resulted in a total of 150 items. All these items were then administered to another group of patients with IBD, who rated each problem on a five-point Likert scale ranging from a low score (score 1) indicating no problem to a high score (score 5) indicating a severe problem. A final list of 32 items was derived and reviewed by experienced clinicians, and the items were grouped into four groups: gastrointestinal symptoms, systemic symptoms, emotional dysfunction and social dysfunction.⁶⁰ The final version of the IBDQ had good reproducibility (ICC was 0.7) and responsiveness (assessed by calculating the responsiveness ratio in patients who reported change in a seven-point assessment of their condition). The IBDQ had good construct validity when correlated with disease-specific and generic PROMs.^61–65 The clinically important change in score was observed to be a decrease of between 16 and 30 points.⁶⁶ The IBDQ has been validated in different language versions and has further proved its validity, internal consistency and reliability in several validation studies worldwide.^64,65,67–77

As new therapies in gastroenterology are rapidly emerging, PROMs are increasingly used in clinical decision-making. There is a need to support and educate gastroenterologists on how to assess these tools, to encourage their use in clinical practice.

Every PROM tool should have five important properties: items should be selected from a reliable source and should be clear to patients; the PROM must reliably yield consistent measurements; the PROM must measure what is intended; the PROM should change with the change in patients’ condition (i.e. be ‘responsive’); and the PROM should be easily translated into clinically meaningful values, showing ‘interpretability’.

Footnotes

Author contribution

LA contributed to writing the manuscript and reviewing the literature. HAH and JGW contributed to reviewing the literature and examining the content of the manuscript. All authors approved the final version of the manuscript.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflict of interest

None declared.

References

1. Nightingale. Notes on hospitals, 1st edn. London, West Strand: John W Parker and son, 1859, 1986.
2. McDowell. Measuring health: A guide to rating scales and questionnaires, 3rd ed. New York: Oxford University Press, 2006.
3. World Health Organisation. Preamble to the constitution of the World Health Organisation. New York, USA, 1948.
4. Khanna, Agarwal, Khanna. Development of an online library of patient-reported outcome measures in gastroenterology: The GI-PRO Database. Am J Gastroenterol 2014; 109: 234–248.
5. Devlin, Appleby. Getting the most out of PROMS: Putting health outcomes at the heart of NHS decision-making. London: The Kings Fund, 2010.
6. Department of Health. Guidance on the routine collection of patient reported outcome measures (PROMs). London: Department of Health, 2009.
7. Lipscomb, Gotay, Snyder. Patient-reported outcomes in cancer: A review of recent research and policy initiatives. CA Cancer J Clin 2007; 57: 278–300.
8. Soreide. Using patient-reported outcome measures for improved decision-making in patients with gastrointestinal cancer - the last clinical frontier in surgical oncology? Front Oncol 2013; 3: 157–157.
9. Montazeri. Quality of life data as prognostic indicators of survival in cancer patients: An overview of the literature from 1982 to 2008. Health Qual Life Outcomes 2009; 7: 102–102.
10. Vandenberghe, Mion, Allescher. Functional dyspepsia: Still a serious challenge for medical practitioners and new drug investigators? A Belgian, French, German and Hungarian opinion. Acta Gastroenterol Belg 2010; 73: 360–365.
11. Irvine, Whitehead, Chey. Design of treatment trials for functional gastrointestinal disorders. Gastroenterology 2006; 130: 1538–1551.
12. Drossman, Dumitrascu. Rome III: New standard for functional gastrointestinal disorders. J Gastrointestin Liver Dis 2006; 15: 237–241.
13. Streiner, Norman. Health measurement scales: A practical guide to their development and use, 4th ed. New York: Oxford University Press, 2008.
14. Pallis, Mouzas. Instruments for quality of life assessment in patients with inflammatory bowel disease. Dig Liver Dis 2000; 32: 682–688.
15. Brooks R. EuroQol: the current state of play. Health Policy 1996; 37(1): 53–72.
16. Brazier, Harper, Jones. Validating the SF-36 health survey questionnaire: New outcome measure for primary care. Br Med J 1992; 305: 160–164.
17. Fazio, O'Riordain, Lavery. Long-term functional outcome and quality of life after stapled restorative proctocolectomy. Ann Surg 1999; 230: 575–584; discussion 84–86.
18. Kiran, Delaney, Senagore. Prospective assessment of Cleveland Global Quality of Life (CGQL) as a novel marker of quality of life and disease activity in Crohn's disease. Am J Gastroenterol 2003; 98: 1783–1789.
19. Ware Jr, Kosinski, Keller. A 12-Item Short-Form Health Survey: Construction of scales and preliminary tests of reliability and validity. Med Care 1996; 34: 220–233.
20. Guyatt, Mitchell, Irvine. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology 1989; 96: 804–810.
21. Drossman, Leserman. The rating form of IBD patient concerns: A new measure of health status. Psychosom Med 1991; 53: 701–712.
22. Ang, Talley, Simren. Review article: Endpoints used in functional dyspepsia drug therapy trials. Aliment Pharmacol Ther 2011; 33: 634–649.
23. Mouzas, Pallis. Assessing quality of life in medical trials on patients with inflammatory bowel disease. Ann Gastroenterol 2000; 13: 261–263.
24. Mapi Research Trust team. Patient-reported outcome and quality of life instruments database (PROQOLID), http://www.proqolid.org (2001, accessed 1 September 2013).
25. Terwee, Bot, de Boer. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007; 60: 34–42.
26. Aaronson, Alonso, Burnam. Assessing health status and quality-of-life instruments: Attributes and review criteria. Qual Life Res 2002; 11: 193–205.
27. Valderas, Ferrer, Mendivil. Development of EMPRO: A tool for the standardized assessment of patient-reported outcome measures. Value Health 2008; 11: 700–708.
28. Bombardier, Tugwell. Methodological considerations in functional assessment. J Rheumatol Suppl 1987; 14: S6–S10.
29. Andresen. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil 2000; 81: S15–S20.
30. Streiner. A checklist for evaluating the usefulness of rating scales. Can J Psychiatry 1993; 38: 140–148.
31. DeVon, Block, Moyle-Wright. A psychometric toolbox for testing validity and reliability. J Nurs Scholarsh 2007; 39: 155–164.
32. McDowell, Jenkinson. Development standards for health measures. J Health Serv Res Policy 1996; 1: 238–246.
33. US Food and Drug Administration (FDA). Guidance for industry. Patient reported outcome measures: Use in medical product development to support labeling claims. US Food and Drug Administration (FDA), Maryland, USA, 2009.
34. US Department of Health and Human Services. Guidance for industry: Patient-reported outcome measures: Use in medical product development to support labeling claims: Draft guidance. Health Qual Life Outcomes 2006; 4: 79.
35. Szende, Leidy, Revicki. Health-related quality of life and other patient-reported outcomes in the European centralized drug regulatory process: A review of guidance documents and performed authorizations of medicinal products 1995 to 2003. Value Health 2005; 8: 534–548.
36. Keszei, Novak, Streiner. Introduction to health measurement scales. J Psychosom Res 2010; 68: 319–323.
37. Marx, Bombardier, Hogg-Johnson. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol 1999; 52: 105–111.
38. Carbone, Holvoet, Vandenberghe. Functional dyspepsia: Outcome of focus groups for the development of a questionnaire for symptom assessment in patients suffering from postprandial distress syndrome (PDS). Neurogastroenterol Motil 2014; 26: 1266–1274.
39. Paulsen, Pedersen, Overgaard. Feasibility of 4 patient-reported outcome measures in a registry setting. Acta Orthop 2012; 83: 321–327.
40. Cheung, Garratt, Russell. The UK IBDQ – a British version of the inflammatory bowel disease questionnaire. Development and validation. J Clin Epidemiol 2000; 53: 297–306.
41. Kovacs, Abraira, Royuela. Minimum detectable and minimal clinically important changes for pain in patients with nonspecific neck pain. BMC Musculoskelet Disord 2008; 9: 43–43.
42. Kuder, Richardson. The theory of the estimation of test reliability. Psychometrika 1937; 2: 151–160.
43. Kottner, Audige, Brorson. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011; 64: 96–106.
44. Sackett, Chambers, MacPherson. The development and application of indices of health: General methods and a summary of results. Am J Public Health 1977; 67: 423–428.
45. Kazis, Anderson, Meenan. Effect sizes for interpreting changes in health status. Med Care 1989; 27: S178–S189.
46. Revicki, Hays, Cella. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008; 61: 102–109.
47. Murphy, KaCD. Psychological testing principles and applications, 4th ed. New Jersey: Prentice-Hall Inc, 1998.
48. Kirshner, Guyatt. A methodological framework for assessing health indices. J Chronic Dis 1985; 38: 27–36.
49. Domino. Psychological testing: An introduction. New Jersey: Prentice-Hall Inc, 2000.
50. Mokkink, Terwee, Patrick. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Qual Life Res 2010; 19: 539–549.
51. Deyo, Inui. Toward clinical applications of health status measures: Sensitivity of scales to clinically important changes. Health Serv Res 1984; 19: 275–289.
52. Lohr, Aaronson, Alonso. Evaluating quality-of-life and health status instruments: Development of scientific review criteria. Clin Ther 1996; 18: 979–992.
53. Jaeschke, Singer, Guyatt. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 1989; 10: 407–415.
54. Beaton, Bombardier, Guillemin. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 2000; 25: 3186–3191.
55. Wild, Grove, Martin. Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: Report of the ISPOR Task Force for Translation and Cultural Adaptation. Value Health 2005; 8: 94–104.
56. Spaccavento, Craca, Del Prete. Quality of life measurement and outcome in aphasia. Neuropsychiatr Dis Treat 2014; 10: 27–37.
57. Jones, Mawani, King. Tackling health literacy: Adaptation of public hypertension educational materials for an Indo-Asian population in Canada. BMC Public Health 2011; 11: 24–24.
58. Holtmann, Chassany, Devault. International validation of a health-related quality of life questionnaire in patients with erosive gastro-oesophageal reflux disease. Aliment Pharmacol Ther 2009; 29: 615–625.
59. Tack, Carbone, Holvoet. The use of pictograms improves symptom evaluation by patients with functional dyspepsia. Aliment Pharmacol Ther 2014; 40: 523–530.
60. Irvine. Development and subsequent refinement of the inflammatory bowel disease questionnaire: A quality-of-life instrument for adult patients with inflammatory bowel disease. J Pediatr Gastroenterol Nutr 1999; 28: S23–S27.
61. Russel, Pastoor, Brandon. Validation of the Dutch translation of the Inflammatory Bowel Disease Questionnaire (IBDQ): A health-related quality of life questionnaire in inflammatory bowel disease. Digestion 1997; 58: 282–288.
62. Kim, Cho, Yoo. Quality of life in Korean patients with inflammatory bowel diseases: Ulcerative colitis, Crohn's disease and intestinal Behcet's disease. Int J Colorectal Dis 1999; 14: 52–57.
63. Lopez-Vivancos, Casellas, Badia. Validation of the Spanish version of the inflammatory bowel disease questionnaire on ulcerative colitis and Crohn's disease. Digestion 1999; 60: 274–280.
64. Pallis, Vlachonikolis, Mouzas. Quality of life of Greek patients with inflammatory bowel disease. Validation of the Greek translation of the inflammatory bowel disease questionnaire. Digestion 2001; 63: 240–246.
65. Pallis, Mouzas, Vlachonikolis. The inflammatory bowel disease questionnaire: A review of its national validation studies. Inflamm Bowel Dis 2004; 10: 261–269.
66. Irvine, Feagan, Rochon. Quality of life: A valid and reliable measure of therapeutic efficacy in the treatment of inflammatory bowel disease. Gastroenterology 1994; 106: 287–296.
67. Muller, Jan Irvine, Gathany. PGI18 linguistic validation of the inflammatory bowel disease questionnaire (IBDQ) in 35 languages. Value Health 2008; 11: A89–A89.
68. Verissimo. Quality of life in inflammatory bowel disease: Psychometric evaluation of an IBDQ cross-culturally adapted version. J Gastrointestin Liver Dis 2008; 17: 439–444.
69. Hjortswang, Jarnerot, Curman. Validation of the inflammatory bowel disease questionnaire in Swedish patients with ulcerative colitis. Scand J Gastroenterol 2001; 36: 77–85.
70. Bernklev, Moum. Quality of life in patients with inflammatory bowel disease: Translation, data quality, scaling assumptions, validity, reliability and sensitivity to change of the Norwegian version of IBDQ. Scand J Gastroenterol 2002; 37: 1164–1174.
71. Hashimoto, Green, Iwao. Reliability, validity, and responsiveness of the Japanese version of the Inflammatory Bowel Disease Questionnaire. J Gastroenterol 2003; 38: 1138–1143.
72. Pontes, Miszputen, Ferreira-Filho. [Quality of life in patients with inflammatory bowel diseases: Translation to Portuguese language and validation of the ‘Inflammatory Bowel Disease Questionnaire’ (IBDQ)]. Arq Gastroenterol 2004; 41: 137–143.
73. Janke, Klump, Steder-Neukamm. [Validation of the German version of the Inflammatory Bowel Disease Questionnaire (Competence Network IBD, IBDQ-D)]. Psychother Psychosom Med Psychol 2006; 56: 291–298.
74. Masachs, Casellas, Malagelada. [Spanish translation, adaptation, and validation of the 32-item questionnaire on quality of life for inflammatory bowel disease (IBDQ-32)]. Rev Esp Enferm Dig 2007; 99: 511–519.
75. Ren, Lai, Chen. Validation of the mainland Chinese version of the Inflammatory Bowel Disease Questionnaire (IBDQ) for ulcerative colitis and Crohn's disease. Inflamm Bowel Dis 2007; 13: 903–910.
76. Vidal, Gomez-Gil, Sans. Psychometric properties of the original Inflammatory Bowel Disease Questionnaire, a Spanish version. Gastroenterol Hepatol 2007; 30: 212–218.
77. Ciccocioppo, Klersy, Russo. Validation of the Italian translation of the Inflammatory Bowel Disease Questionnaire. Dig Liver Dis 2011; 43: 535–541.

Assessing patient reported outcome measures: A practical guide for gastroenterologists
