Abstract
Aims: Self-reported information from questionnaires is frequently used in clinical epidemiological studies, but few studies provide information on the reproducibility of instruments applied in secondary coronary prevention. This study aims to assess the test–retest reproducibility of the questionnaire applied in the cross-sectional NORwegian CORonary (NOR-COR) Prevention Study. Methods: In the NOR-COR study 1127 coronary heart disease (CHD) patients completed a self-report questionnaire consisting of 249 questions, comprising both validated instruments and de novo questions. Test–retest reliability of the instrument was estimated after four weeks in 99 consecutive coronary patients. Intraclass Correlation Coefficient (ICC) and Kappa (κ) were calculated. Results: The mean interval between test and retest was 33 (±6.4) days. Reproducibility values for questions in the first part of the questionnaire did not differ from those in the latter part. A good to very good reproducibility was found for lifestyle factors (smoking: κ = 1.0; exercise: ICC = 0.90), medical factors (drug adherence: ICC = 0.74; sleep apnoea: ICC = 0.87), and psychosocial factors (anxiety and depression: ICC = 0.95; quality of life 12-Item Short-Form Health Survey (SF12): ICC = 0.89), as well as for the majority of de-novo-created variables covering the patient’s perceptions, motivation, needs, and preferences.
Introduction
As a result of the contemporary management of coronary heart disease (CHD), an increasing proportion of patients survive and require optimal secondary prevention [1]. A high prevalence of unhealthy lifestyle and poor risk factor control in CHD patients was demonstrated in a large European multicentre study [2]. The reasons for these findings are complex and somewhat poorly understood, and the identification of optimal patient management and healthcare factors of importance for an improved coronary risk profile remains a public health priority [3]. The aims of an ongoing cross-sectional study, the NORwegian CORonary (NOR-COR) Prevention Study [4], are to identify medical and psychosocial factors associated with unfavourable risk factor control after a coronary event. Most of the data to be explored has been collected through a comprehensive self-report questionnaire.
Self-report questionnaires are frequently used in health research because they are easy to utilize, feasible, and cheap to apply. In order to ensure reproducibility and reliability, a test–retest study is of great importance. The reliability of such a test is assessed by measuring the responses of the same study sample to an identical questionnaire at two or more points in time. A reproducibility test will assess random measurement errors as well as the stability of the construct measured, but cannot in itself distinguish between the two [5]. Thus, one must take into consideration that any real change in the phenomenon of interest that may have occurred during the intervening period between tests will result in seemingly low levels of reliability.
There are no standards for the ideal time span between the initial test and the retest in reproducibility studies. The interval should be long enough to prevent memory effects and short enough to ensure that no real clinical change has occurred among participants [6]. Intervals of one to two weeks [7] and one month [8] have been suggested.
Self-reported information from questionnaires is frequently used in clinical epidemiological studies, but few provide information on the reproducibility of instruments applied in secondary coronary prevention studies. Those available are limited by only addressing single questionnaires with a moderate range of items. So far, few studies have explored whether reproducibility remains satisfactory in a comprehensive questionnaire applied in clinical patient studies. The purpose of this study was to evaluate the test–retest reliability of an extensive self-report questionnaire assembled and created to be used in the NOR-COR study [4]. Given acceptable reproducibility results, such a questionnaire could be valuable in future studies on risk factor control and lifestyle measures in long-term secondary coronary prevention.
Materials and methods
A complete description of the design and methodology applied in the NOR-COR study is published elsewhere [4]. In the present study, the self-report questionnaire used in NOR-COR was completed twice by 99 stable patients with an interval of four weeks.
Design of the NOR-COR questionnaire
The NOR-COR questionnaire contains 249 questions derived from a number of medical and psychosocial instruments that have previously, to some extent, been demonstrated to be associated with coronary risk factors, adherence to medication, and prognosis in cardiac patients [9–15]. As there were no validated instruments for revealing the patient’s needs and preferences, a number of questions/items were created de novo following an extensive process [16], described in detail previously [4]. The NOR-COR questionnaire was pilot-tested in two CHD patients in order to incorporate the patients’ perspective, and subsequently tested in 20 randomly selected eligible CHD patients in order to establish relevance, acceptance, and feasibility.
The following descriptive variables have been obtained from the questionnaire:
Socio-demographic factors:
Marital status; Level of education.
Behaviour/lifestyle risk factors:
Smoking status (never, previous, or current smoking); Physical activity (frequency, duration, intensity, and a sum-score) [17]; Diet (the frequency of intake of fish, vegetables, and fruits); Alcohol consumption (the past four weeks).
Medical factors:
Psychosocial factors
Quality of life (12-Item Short-Form Health Survey (SF12)): a 12-item measure of generic quality of life with a physical health sub-scale (Physical Component Summary, PCS12) and a mental health sub-scale (Mental Component Summary, MCS12) [20]; Anxiety and depression (Hospital Anxiety and Depression Scale, HADS) [21]; Rumination (Ruminative Response Scale, RRS): a 22-item self-report inventory designed to assess the tendency to ruminate in response to a depressed mood [15]; Worry (Penn State Worry Questionnaire, PSWQ): a 16-item measure of pathological worry [22]; Type D personality (distressed personality type, Type D Scale, DS-14): a 14-item instrument with seven items each on the sub-scales of negative affectivity (NA) and social inhibition (SI) [23]; Illness perception (Brief Illness Perception Questionnaire, BIPQ): an 8-item measure of illness identity, personal and treatment ability to control the illness, consequences, understanding and concern about the illness rated on a Likert scale from 0 to 10, and one item about what caused the patient’s illness [24]; Perceived risk perception (PRP): a 3-item measure rated on a Likert scale from 0 to 10, covering the probability of a new event within 12 months, the patient’s own ability to reduce coronary risk, and the degree to which the disease will limit the patient’s activities [13]; Insomnia (Bergen Insomnia Scale): a 7-item measure on an 11-point Likert scale from 0 to 10 [25].
Treatment desires, perceived needs, beliefs about causes, motivation (de-novo-created questions)
Beliefs regarding what caused the patient’s CHD, ranking known CHD risk factors from 0 to 10 on a Likert scale indicating to what extent the patient believed that each risk factor had caused the disease to develop; Motivation for further lifestyle changes and changes already achieved in these lifestyle factors; Perceived needs of sufficient health information about CHD and the risk factors; Participation in healthcare follow-up (cardiac rehabilitation, follow-up visits in primary healthcare); Perception of the information provided by healthcare workers [16] with four assertions: I am cured, but have to change my lifestyle; I am cured and do not need to change my lifestyle; I still have heart disease and need to change my lifestyle; and I still have heart disease, but do not need to change my lifestyle; Perceived needs for further secondary preventive follow-up today in order to meet the goal of prevention (email/telephone, nurse, cardiac rehabilitation, physiotherapist, nutritionist, psychiatrist/psychologist, Internet, and/or mobile app).
Study population
A total of 1127 patients (83% participation rate) aged 31–80 years (mean 62) with a first or recurrent diagnosis of, or treatment for, CHD (acute myocardial infarction, coronary artery bypass graft operation, and/or elective or emergency percutaneous coronary intervention (PCI)) between eight weeks and three years previously participated in the NOR-COR study and completed the questionnaire. The study was conducted at two Norwegian hospitals, Drammen and Vestfold. Initially, 28 of the participants recruited from Vestfold Hospital completed the NOR-COR questionnaire a second time after four weeks. It was decided to increase the number of participants to approximately 100 in order to obtain sufficient statistical power in this reproducibility study. Accordingly, 71 consecutive patients referred to cardiac rehabilitation in Vestfold Hospital performed an identical retest, with inclusion criteria identical to those in the NOR-COR study [4]. The participants in this reproducibility study were considered to have been stable with respect to their CHD, and none had been re-hospitalized during the interval between test and retest. The same observer conducted all tests and retests, and remained alert to possible changes in the patients’ physical or psychological condition that might affect retest results. In order to evaluate possible group differences, patient characteristics in the reproducibility sample and the entire NOR-COR population were compared.
Statistics
Descriptive data are presented as means ± standard deviations (SDs), while reproducibility results are presented with 95% confidence intervals (CIs). Differences between the reproducibility sample and the NOR-COR population regarding age, sex, education, and type of event were assessed with independent two-sample t-tests and chi-square tests. Test–retest reliability was calculated by comparing the data obtained at test sessions 1 and 2 using Intraclass Correlation Coefficients (ICCs) for continuous data and for ordinal variables with at least five response categories [5], and Kappa (κ) for nominal and ordinal variables [26]. These coefficients were calculated for each individual question in the NOR-COR questionnaire, as well as for summarized scores when available, such as for exercise, drug adherence, sleep apnoea, and the psychosocial questionnaires. ICC was calculated based on a two-way mixed-effect analysis of variance with 95% CIs. Acceptable reproducibility was set at the often-recommended level of ICC ⩾ 0.70, and κ values were defined as acceptable if above 0.5. The guidelines for interpreting κ with strength of agreement based on Landis and Koch [26] suggest that values are fair between 0.21 and 0.40, moderate between 0.41 and 0.60, good between 0.61 and 0.80, and very good from 0.81 upwards. These guidelines for κ agreement were also applied to continuous data using ICC. Internal consistency was calculated with standardized Cronbach’s alpha for each set of items or scales. Analyses of covariance were used to examine potential differences in reproducibility across age, gender, or education. Statistical analyses were conducted with SPSS version 21 (SPSS Inc., US). The significance level was set at p < 0.05.
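To illustrate the coefficients described above, the following minimal Python sketch computes a single-measure, consistency-type ICC from a two-way analysis of variance decomposition, Cohen's κ for categorical answers, and the agreement label applied in this paper. It is not the SPSS procedure used in the study. The text does not state whether consistency or absolute agreement was requested, so the consistency form is an assumption, and the function names (icc_consistency, cohen_kappa, agreement_label) and the example answers are hypothetical.

```python
from collections import Counter


def icc_consistency(test, retest):
    """Single-measure, consistency-type ICC for two sessions (assumed form).

    Requires at least two subjects and some variation in the scores.
    """
    rows = list(zip(test, retest))
    n, k = len(rows), 2  # n subjects, k = 2 sessions (test and retest)
    grand = sum(x for row in rows for x in row) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(test) / n, sum(retest) / n]
    # Two-way ANOVA sums of squares: subjects, sessions, residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def cohen_kappa(test, retest):
    """Cohen's kappa: agreement between two sessions corrected for chance."""
    n = len(test)
    observed = sum(a == b for a, b in zip(test, retest)) / n
    freq_test, freq_retest = Counter(test), Counter(retest)
    expected = sum(freq_test[c] * freq_retest[c] for c in freq_test) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined when every answer is the same category
    return (observed - expected) / (1 - expected)


def agreement_label(value):
    """Strength-of-agreement label using the cut-offs quoted in the text."""
    if value > 0.80:
        return "very good"
    if value > 0.60:
        return "good"
    if value > 0.40:
        return "moderate"
    if value > 0.20:
        return "fair"
    return "below fair"  # the text gives no label for this range


if __name__ == "__main__":
    # Hypothetical answers from five patients, not NOR-COR data.
    exercise_test = [3, 5, 2, 4, 1]
    exercise_retest = [3, 4, 2, 5, 1]
    icc = icc_consistency(exercise_test, exercise_retest)
    print(round(icc, 2), agreement_label(icc))
    smoking_test = ["never", "previous", "current", "never", "previous"]
    smoking_retest = ["never", "previous", "current", "never", "current"]
    print(round(cohen_kappa(smoking_test, smoking_retest), 2))
```

A consistency-type ICC ignores a uniform shift between sessions (for example, every patient scoring one point higher at retest), whereas an absolute-agreement ICC would penalize it. Which form applies should be checked against the original analysis settings.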
Ethics
This study was approved by the Regional Committee of Ethics in Medical Research, approval number 2013/1885. Written informed consent was obtained from all included participants. The study is registered at www.clinicaltrials.gov (ID NCT02309255).
Results
A total of 99 patients completed the retest within an interval of 33 (±6.4) days. One patient who broke his leg and a woman who lost her son within the interval between tests were excluded from the reproducibility study. The mean time interval between index hospitalization and first-time completion of the questionnaire was 34 weeks (range 8–83). The amount of missing data was 1.1% in the first test session and 3.0% in the retest, at the same level throughout the questionnaire. Participant feedback revealed that the time used to fill out the questionnaire was 30 to 45 minutes. Reproducibility figures obtained from the first part of the questionnaire did not differ from those of the last part.
There were no significant differences between patient characteristics among the NOR-COR population and the reproducibility study sample (Table I). The reproducibility values were very good for exercise and smoking (Table II), good for the use of alcohol, and moderate for diet. The reproducibility coefficients for drug adherence were acceptable, and very good for obstructive sleep apnoea.
Table I. Demographic and medical characteristics of the NOR-COR sample and the reproducibility sample.
NOR-COR: NORwegian CORonary Prevention Study; n: sample size; SD: standard deviation; ns: non-significant; MI: myocardial infarction; CHD: coronary heart disease; ST: ST-segment.
Low education was defined as completion of primary or secondary school only.
Table II. Test–retest reliability of lifestyle risk factors and medical factors.
ICC: intraclass correlation coefficient; κ: Kappa agreement; CI: confidence interval; SD: standard deviation.
Exercise sum score, sum of frequency, duration, and intensity.
Fruit and/or vegetables at least twice a day.
Berlin category 1 sum, snoring, and sleep apnoea; Berlin category 2 sum, tired or exhausted.
The test–retest reliability calculations of the psychosocial factors presented in Table III show good reproducibility for quality of life (PCS12) and very good for all other psychosocial instruments. The majority of the questions covering the patient’s perceptions, needs, preferences, and motivation were above the limits for acceptable reproducibility (Table IV). The participants were asked about their preferences for follow-up to meet their present needs of optimal prevention. The reproducibility level for these replies was fair to good.
Table III. Test–retest reliability of psychosocial factors.
SD: standard deviation; CI: confidence interval; ICC: intraclass correlation coefficient.
Table IV. Test–retest reliability of beliefs about disease causes, motivation, perceived needs, and treatment desires.
ICC: intraclass correlation coefficient; κ: Kappa agreement; CI: confidence interval; CHD: coronary heart disease.
Fair internal consistency was found for sleep apnoea Berlin Category 1 sum (Cronbach’s alpha = 0.45 in test 1 and 0.35 in retest); however, the values improved to good (Cronbach’s alpha = 0.68 in test 1 and 0.66 in retest) if item 4 (“does your snoring bother others”) was deleted from computation. Moderate internal consistency was found for the 8-item Morisky Medication Adherence Scale (Cronbach’s alpha = 0.54 in both tests) and SF12 (Cronbach’s alpha = 0.65 in test 1, 0.61 in retest). All other scales showed good to very good internal consistency and Cronbach’s alpha ranged from 0.69 to 0.95, with slightly higher values in the second test.
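As a rough illustration of how such values arise, the sketch below computes standardized Cronbach's alpha from the mean inter-item correlation, together with an "alpha if item deleted" helper that mirrors the removal of item 4 from the Berlin Category 1 sum. This is a minimal sketch under stated assumptions, not the SPSS computation used in the study. The function names and the example responses are hypothetical, and every item is assumed to show some variation across respondents.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_alpha_from_r(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def standardized_alpha(responses):
    """Standardized Cronbach's alpha; responses has one row per respondent."""
    items = list(zip(*responses))  # transpose to one tuple per item
    pairs = list(combinations(items, 2))
    mean_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return standardized_alpha_from_r(len(items), mean_r)


def alpha_if_item_deleted(responses, index):
    """Standardized alpha after dropping the item at position index."""
    reduced = [[v for i, v in enumerate(row) if i != index] for row in responses]
    return standardized_alpha(reduced)


if __name__ == "__main__":
    # Hypothetical answers: the third item does not follow the other two.
    answers = [[1, 2, 4], [2, 3, 1], [3, 4, 3], [4, 5, 2]]
    print(round(standardized_alpha(answers), 2))
    print(round(alpha_if_item_deleted(answers, 2), 2))
```

Dropping an item that correlates poorly with the rest raises the mean inter-item correlation, which is why alpha can improve after deletion even though the scale becomes shorter.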
Significant differences in the level of reproducibility across gender, age, or education level were found in a small proportion of the variables; however, there was no consistency regarding which subgroup showed the highest level of reproducibility.
Discussion
The present study analysed the reproducibility of the questionnaire applied in the NOR-COR study. Our findings demonstrated acceptable to excellent values for almost all of the variables explored. This level of reproducibility in data from the NOR-COR questionnaire will be valuable for the further analyses, findings, and conclusions of the project.
There were few missing data in both tests, and the reproducibility remained high throughout the rather extensive questionnaire. The test–retest sample had similar patient characteristics to those of the total NOR-COR population. Information obtained by self-report questionnaires may be distorted by systematic errors such as the patient giving socially desirable answers, using scales and response options in idiosyncratic ways, as well as recall bias. Systematic errors and biases are hard to assess and control, and would in fact tend to boost test–retest correlations. On the other hand, poor or oscillating understanding of the underlying meaning of a question, being distracted or confused, or responding based on current mood will introduce random error or noise in measurements, thereby reducing statistical associations of substantive interest, as well as test–retest correlations. Test–retest correlations allow for estimates of random measurement errors to be established, given that the underlying construct is stable [5]. Thus, acceptable intra-individual reproducibility is reassuring in the sense that one has apparently minimized the risk of committing type II errors because of random error or noise. Conversely, reliability estimates typically based on internal consistency within a set of items tend to be boosted by systematic errors such as response scale effects and thus may yield misleadingly favourable results [27].
The mean time interval between index hospitalization and first-time completion of the questionnaire was eight months. After this relatively long period the majority had completed the rehabilitation process in our hospital. Possible early problems with medication habituation, anxiety, and depression were considered sufficiently diminished, and the patients’ physical activity level was restored. It is, however, not possible to guarantee total stability (i.e. lack of “true” change) over four weeks, but the abovementioned should have reduced the risk of clinically important improvements or deteriorations that might have influenced reproducibility in the data presented. Test–retest correlations tend to be higher when the time interval between the two points of measurement is short, because few changes have occurred, but there is also a risk of memory effects; that is, respondents recalling their response on the first occasion and choosing the same option on the second occasion in order to appear “consistent”. In order to avoid influence of memory effects and to reduce the possibility of significant events and real changes between the two tests, four to eight weeks has been suggested as the ideal time between the two measurements [28,29].
In post-myocardial infarction (MI) patients, the assessment of type D personality has been shown to be very stable over 18 months [30]. This stability is comparable to the good reproducibility of weekly exercise frequency in stable coronary patients when measured with a one-week interval between the two tests, whereas reproducibility for exercise diminishes with a longer interval between tests [31–33]. In the present context, the majority of our study participants clearly belonged to the category of stable CHD.
We had expected a tendency towards lower reproducibility in questions from the last part of the questionnaire due to tiredness or fatigue. This did not turn out to be the case, as was also observed in a diet study where the length of the questionnaire had only a minor impact on the response rate and data quality [34].
In the INTERHEART study [35], structured questionnaires were administered to obtain information about socio-demographic factors and cardiovascular risk factors. Repeat measures of risk factors were made in 279 controls at a median interval of 409 days. Apart from a nearly identical and very good agreement rate for smoking in INTERHEART and the present study (κ = 0.94 vs. 1.0, respectively), the reproducibility values in INTERHEART and our study differed for depression (κ = 0.44 vs. ICC = 0.94), regular physical activity (κ = 0.56 vs. ICC = 0.85 for frequency), and alcohol (κ = 0.52 vs. ICC = 0.75). Differences in the questionnaires used and in the time interval between test and retest may explain these divergences.
The reproducibility of drug adherence, sleep apnoea, and psychosocial factors based upon widely used questionnaires in the present study was high and in line with most other studies [18–25,36].
Our findings of acceptable reproducibility, with only a few exceptions, can be explained by the extensive process used to develop de novo questions and the inclusion of questionnaires that have previously been validated and found to have acceptable reproducibility [17–25,36].
These robust reproducibility data will have practical implications for future analysis of the association between potentially modifiable patient factors and unfavourable risk factor control. Since most of the data to be used in this context are derived from the questionnaire, the present findings are reassuring for further NOR-COR projects and for the application of the questionnaire in future clinical studies of secondary CHD prevention.
Study limitations
The participants of the reproducibility study exclusively represented Vestfold Hospital where nearly 80% attend cardiac rehabilitation. Since only half of the NOR-COR study participants attended such a programme, a selection bias cannot be excluded. However, no socio-demographic differences were observed between the entire NOR-COR population and the reproducibility study sample.
True change in the underlying phenomenon will of course result in low or at least reduced test–retest correlations, thus giving the impression of relatively poor measurement reliability (if no true change is assumed). Weak test–retest correlations, therefore, must be viewed with caution since we may in fact be underestimating reliability. However, in the present study this seems to be a rather unnecessary concern since we have consistently found very high test–retest correlations, also for measures for which one might suspect some true change to have occurred in the time period between test and retest (e.g. the reproducibility for physical activity frequency was found to be ICC 0.85). The risk of overestimating measurement reliability because of artificially boosted test–retest correlations (caused by memory effects, stable biases and/or response styles, mode of administration, etc.) can only be assessed by applying alternative research designs, such as having access to a gold standard, systematically altering instrument style and formatting, switching modes of administration etc., which is clearly outside the scope of the present paper.
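The effect of true change on test–retest correlations can be illustrated with a small simulation. The sketch below is not an analysis of NOR-COR data. It assumes a simple additive model (observed score = true score + random error, with an optional true change before the retest), and all names and parameter values are hypothetical choices made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_retest_correlation(n, error_sd, change_sd, seed=0):
    """Test-retest correlation under an assumed additive measurement model.

    True scores have unit variance; error_sd is the random measurement
    error at each session; change_sd is the spread of true change that
    occurs between test and retest (0 means a perfectly stable construct).
    """
    rng = random.Random(seed)
    test, retest = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        change = rng.gauss(0.0, change_sd) if change_sd > 0 else 0.0
        test.append(true_score + rng.gauss(0.0, error_sd))
        retest.append(true_score + change + rng.gauss(0.0, error_sd))
    return pearson(test, retest)


if __name__ == "__main__":
    # Same measurement error in both runs; only the true change differs.
    stable = simulate_retest_correlation(5000, error_sd=0.5, change_sd=0.0)
    changed = simulate_retest_correlation(5000, error_sd=0.5, change_sd=0.7)
    print(round(stable, 2), round(changed, 2))
```

Under this model the instrument is equally precise in both runs, yet the correlation is lower when true change is present. A weak test–retest correlation on its own therefore cannot separate measurement error from real change in the construct.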
Conclusion
Reliability studies based on test–retests are essential elements when it comes to establishing the quality of self-report data. A good to very good reproducibility was found for almost all of the items and scales used in the comprehensive NOR-COR questionnaire. Thus, this instrument emerges as a valuable tool for evaluating risk factor control in CHD patients in general, laying the foundation not only for further analyses, findings, and conclusions in the NOR-COR project, but also for similar comprehensive questionnaires applied in future clinical patient studies.
Footnotes
Acknowledgements
The NOR-COR project was carried out at Drammen and Vestfold Hospitals and was developed in collaboration with communities at the medical faculty of University of Oslo.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a research grant from Extrastiftelsen, Norway (grant number 76728).
