Sage Journals: Discover world-class research

Abstract

Objective:

Self-reported information from questionnaires is frequently used in epidemiological studies, but few of these studies provide information on the reproducibility of individual items contained in the questionnaire. We studied the test–retest reliability of self-reported diabetes among 33,919 participants in the Norwegian Women and Cancer Study.

Methods:

The test–retest reliability of self-reported type 1 and type 2 diabetes diagnoses was evaluated between three self-administered questionnaires (completed in 1991, 1998, and 2005 by Norwegian Women and Cancer participants) by kappa agreement. The time interval between the test–retest studies was ~7 and ~14 years. Sensitivity of the kappa agreement for type 1 and type 2 diabetes diagnoses was assessed. Subgroup analysis was performed to assess whether test–retest reliability varies with age, body mass index, physical activity, education, and smoking status.

Results:

The kappa agreement for both types of self-reported diabetes diagnoses combined was good (⩾0.65) for all three test–retest studies (1991–1998, 1991–2005, and 1998–2005). The kappa agreement for type 1 diabetes was good (⩾0.73) in the 1991–2005 and the 1998–2005 test–retest studies, and very good (0.83) in the 1991–1998 test–retest study. The kappa agreement for type 2 diabetes was moderate (0.57) in the 1991–2005 test–retest study and good (⩾0.66) in the 1991–1998 and 1998–2005 test–retest studies. The overall kappa agreement in the 1991–1998 test–retest study was stronger than in the 1991–2005 test–retest study and the 1998–2005 test–retest study. There was no clear pattern of inconsistency in the kappa agreements within different strata of age, BMI, physical activity, and smoking. The kappa agreement was strongest among the respondents with 17 or more years of education, while generally it was weaker among the least educated group.

Conclusion:

The test–retest reliability of self-reported diabetes diagnosis was acceptable, and there was no clear pattern of inconsistency in the kappa agreement stratified by age, body mass index, physical activity, and smoking. The study suggests that self-reported diabetes diagnosis from middle-aged women enrolled in the Norwegian Women and Cancer Study is reliable.

Keywords

Type 2 diabetes, type 1 diabetes, metabolic syndrome, kappa, test–retest reliability, reproducibility, questionnaires, Norway, Norwegian Women and Cancer, Kvinner og kreft

Introduction

Epidemiological studies often rely on self-reported information, as this keeps the costs of data collection lower than those of clinical studies.¹ However, the validity and reliability of the instruments used for data collection are often not reported.²

Cohen’s kappa coefficient is commonly used to determine inter-rater agreement for disease (or other categorical outcomes) by comparing self-reported information against a gold standard (diagnostic test, medical records, physiological measures, etc.). Previous validation studies of self-reported diabetes diagnosis have indicated that diabetes is reported more accurately than other illnesses or diseases.^3–10

Cohen’s kappa coefficient can also be used to analyze the test–retest reliability of an instrument. Many studies from Norway have used self-reported information from questionnaires as the principal tool, but few^11–43 of them have provided information on the reproducibility of the individual items and instruments therein. It is important to establish that respondents of different socio-demographic backgrounds and age groups have understood the questions in a similar manner. Test–retest reliability is assessed by measuring the responses of the same study sample to an identical question at two or more points in time.⁴⁴ These responses are then compared to establish the reliability of the instrument. The chi-square (χ²) test for independence is not appropriate for assessing test–retest reliability since it does not take into account that the data are paired (i.e. different measurements for the same individual).
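For reference, Cohen’s kappa for a yes/no item answered on two occasions can be written in its standard form as shown here. The symbols $p_o$ and $p_e$ and the cell counts $a$, $b$, $c$, and $d$ are notation introduced for illustration only; they do not appear in the questionnaires or tables of this study.

$\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \frac{a + d}{n}, \qquad p_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{n^2}$

Here $a$ is the number of respondents reporting the condition in both the test and the retest study, $b$ the number reporting it only in the test study, $c$ the number reporting it only in the retest study, $d$ the number reporting it in neither, and $n = a + b + c + d$. The observed agreement is $p_o$, and $p_e$ is the agreement expected by chance from the marginal totals, so $\kappa$ equals 1 under perfect agreement and 0 when agreement is no better than chance.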

Previous studies using self-reported data from interviews have studied the test–retest reliability of self-reported diabetes diagnosis, with inconsistent kappa agreements.^45–50 Since type 2 diabetes typically affects people aged 40 years and over,^51,52 it is possible to differentiate between the test–retest reliability of self-reported type 1 and type 2 diabetes diagnoses using information on age at diagnosis. No previous study was found that assessed the test–retest reliability for either type 1 or type 2 diabetes separately.

The Norwegian Women and Cancer (NOWAC) Study⁵³ is a prospective cohort study in which women reported diabetes diagnosis and age at diagnosis in three separate questionnaires. If a woman accurately reported her diabetes diagnosis in one study, she is expected to report the same in a subsequent study. This assumption underlies our test–retest reliability analysis. The aim of this study was to assess the test–retest reliability of self-reported diabetes diagnosis, as well as that of type 1 and type 2 diabetes diagnoses separately. Furthermore, the large sample size permits subgroup analyses and sensitivity analysis. We examined whether test–retest reliability varies with age, body mass index (BMI), physical activity, education, and smoking status.

Methods

Study cohort and sampling

The NOWAC Study is a prospective nationwide study which started in 1991,^54,55 and contains data from 170,000 women. Participants were randomly selected from the National Population Register of Norway. The external validity of the study⁵⁶ and validity of some measures^57–59 have been published elsewhere. NOWAC Study participants are assumed to be representative of the female Norwegian population in the corresponding age groups.⁵⁶ The detailed characteristics of the participants are described elsewhere,⁵⁶ and the updated information on the NOWAC Study is accessible on its website.⁵⁴

Of the 170,000 women enrolled in the NOWAC Study, 33,919 women completed all three questionnaires sent in 1991, 1998, and 2005. The general characteristics of the study sample and the association between BMI and type 2 diabetes in this sample are described elsewhere.⁵²

Questionnaire and classification

Diabetes

Information on diabetes diagnosis was collected by means of the same question in all three questionnaires (1991, 1998, and 2005): “Have you had any of the following diseases?” The list of options included diabetes. Age at diagnosis was measured with the subsequent question, “If yes, at what age was it first discovered?” For the purposes of this study, only participants who reported having diabetes and provided their age at diagnosis were defined as diabetes cases. If participants reported that they were diagnosed with diabetes either in the same year they gave birth to a child or in the year preceding childbirth, it was assumed that they had gestational diabetes, and they were excluded from the analysis. Final numbers of diabetes cases included in analyses are given in Tables 2–4. Participants with missing values on diabetes diagnosis and age at diagnosis were excluded.

Using the responses to the questions on diabetes and age at diagnosis, different variables for diabetes diagnosis, and separate variables for type 1 and type 2 diabetes, were created. Since type 2 diabetes typically affects people aged 40 years or over,^51,52 we classified only those aged 40 years or over as having type 2 diabetes. Women who were diagnosed with diabetes at or before age 39 years were categorized as having type 1 diabetes (excluding those with gestational diabetes). Participants with type 1 and type 2 diabetes were classified separately by the above-mentioned criteria for the 1991 test study, the 1998 test study, the 1998 retest study for comparison against 1991 test study, the 2005 retest study for comparison against the 1991 test study, and the 2005 retest study for comparison with the 1998 test study.
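The age-based rule above can be sketched in a few lines of Python. This is only an illustration of the stated cut-off (diagnosis at 40 years or over counts as type 2, at 39 years or under as type 1); the function name, the `gestational` flag, and the use of `None` for unclassifiable respondents are assumptions made here, not part of the study's SAS or Stata code.

```python
from typing import Optional

# Threshold stated in the text: diagnosis at 40 years or over is treated as type 2.
TYPE2_MIN_AGE = 40


def classify_diabetes_type(age_at_diagnosis: Optional[int],
                           gestational: bool = False) -> Optional[str]:
    """Return 'type 1', 'type 2', or None when the respondent is not a classifiable case.

    Hypothetical helper: respondents without an age at diagnosis, or flagged as
    presumed gestational diabetes, are left unclassified (None).
    """
    if gestational or age_at_diagnosis is None:
        return None
    return "type 2" if age_at_diagnosis >= TYPE2_MIN_AGE else "type 1"


print(classify_diabetes_type(39))                    # type 1
print(classify_diabetes_type(40))                    # type 2
print(classify_diabetes_type(45, gestational=True))  # None
```

A respondent who meets both the gestational and the type 2 criteria is left unclassified here, which mirrors the handling of the one such respondent described for the 1998 questionnaire.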

Diabetes cases in the 1991 and 1998 test studies were defined as those who reported having diabetes, and their age at diagnosis in the corresponding questionnaires. One respondent to the 1998 questionnaire fulfilled the criteria for both gestational diabetes and type 2 diabetes and was finally classified as having gestational diabetes only.

Diabetes in the 1998 retest study (for comparison against the 1991 test study)

Diabetes cases in the 1998 retest study, for comparison against the 1991 test study, were defined as those with diabetes from the 1998 test study, provided they reported a date of diagnosis prior to 1992. The same criteria were applied to women with type 1 or type 2 diabetes. One woman in the 1998 retest study fulfilled the criteria both for gestational and type 2 diabetes and was finally classified as having gestational diabetes only.

Diabetes in the 2005 retest study (for comparison against the 1991 test study)

Diabetes cases from the 2005 retest study, for comparison against the 1991 test study, were defined as participants who reported a diabetes diagnosis in the 2005 questionnaire, provided they reported a date of diagnosis prior to 1992. The same criteria were applied to women with type 1 or type 2 diabetes.

Diabetes in the 2005 retest study (for comparison against the 1998 test study)

Diabetes cases from the 2005 retest study, for comparison against the 1998 test study, were defined as participants with self-reported diabetes in the 2005 questionnaire, provided that they reported a date of diagnosis prior to 1999. The same criteria were applied to women with type 1 or type 2 diabetes.
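The three retest definitions share one rule: a woman counts as a retest case only if the diagnosis she reports in the later questionnaire predates the cut-off year (1992 for comparisons against 1991, 1999 for comparisons against 1998). The sketch below illustrates that rule. The text does not say how the date of diagnosis was derived, so approximating it as birth year plus reported age at diagnosis is an assumption made here, as are the function and parameter names.

```python
from typing import Optional


def is_retest_case(reported_diabetes: bool,
                   birth_year: int,
                   age_at_diagnosis: Optional[int],
                   cutoff_year: int) -> bool:
    """True if the diagnosis reported in the retest questionnaire predates cutoff_year."""
    if not reported_diabetes or age_at_diagnosis is None:
        return False
    # Assumption: year of diagnosis is approximated from birth year and reported age.
    year_of_diagnosis = birth_year + age_at_diagnosis
    return year_of_diagnosis < cutoff_year


# Hypothetical respondent born in 1945 who reports a diagnosis at age 50 (about 1995):
print(is_retest_case(True, 1945, 50, cutoff_year=1992))  # False: not a case for the 1991 comparison
print(is_retest_case(True, 1945, 50, cutoff_year=1999))  # True: a case for the 1998 comparison
```

Restricting retest cases in this way keeps diagnoses made after the test questionnaire from being counted as disagreement.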

Covariates

Self-reported information on height and weight from the 1998 study was used to calculate BMI (kg/m²). BMI was categorized into three groups: normal weight (BMI: <25 kg/m²), overweight (BMI: 25–29.9 kg/m²), and obese (BMI: ⩾30 kg/m²). Smoking status was derived from the replies to two questions in the 1998 questionnaire: “Have you ever smoked?” (yes, no) and “Do you smoke on a daily basis at the moment?” (yes, no). Women who answered “no” to the former were categorized as “never smokers.” Those who answered “yes” to the former and “no” to the latter were categorized as “former smokers,” and those who answered “yes” to both questions were categorized as “current smokers.” A 10-category scale measured the level of self-reported physical activity in the 1998 questionnaire, the validity of which has been reported previously.²¹ Responses to questions about physical activity were used to assign a category of physical activity: low [1–3], medium [4–7], and high [8–10]. Education (duration in years) was categorized into four groups: primary/intermediate (0–9), secondary (10–12), university (13–16), and postgraduate and above (17+). Age (years) was categorized into four 5-year groups.
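The covariate coding described above is simple enough to express directly. The following sketch covers BMI, smoking status, and physical activity; the function names are hypothetical, and treating BMI values between 29.9 and 30 kg/m² as overweight is an assumption, since the text gives the overweight band as 25–29.9 and the obese band as ⩾30.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Categorize BMI (kg/m^2) using the cut-offs given in the text."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25:
        return "normal weight"
    if bmi < 30:  # assumption: 29.9 < BMI < 30 is counted as overweight
        return "overweight"
    return "obese"


def smoking_status(ever_smoked: bool, smokes_daily_now: bool) -> str:
    """Derive smoking status from the two yes/no questionnaire items."""
    if not ever_smoked:
        return "never smoker"
    return "current smoker" if smokes_daily_now else "former smoker"


def activity_level(score: int) -> str:
    """Map the 10-category physical activity scale to low, medium, or high."""
    if not 1 <= score <= 10:
        raise ValueError("physical activity score must be between 1 and 10")
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"


print(bmi_category(80, 1.70))        # overweight
print(smoking_status(True, False))   # former smoker
print(activity_level(8))             # high
```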

Statistical analysis

Statistical analysis was performed with SAS version 9.2 and Stata version 13.1. Means (standard deviation (SD)) were estimated for all continuous variables, and the percentage of participants in each category was calculated for all categorical variables. General characteristics of the data are presented as frequencies, percentages, and means with SDs (Table 1). Variables for all diabetes diagnoses, as well as for type 1 and type 2 diabetes separately, were constructed, and the kappa agreement for the two types of diabetes was calculated for the 1991–1998 test–retest study, the 1991–2005 test–retest study, and the 1998–2005 test–retest study, respectively. The kappa coefficients summarize the total agreement beyond that expected by chance. The 95% confidence intervals (CIs) for the kappa statistic were estimated with an analytical method⁶⁰ in Stata.⁶¹ Established benchmarks^62,63 for rating the strength of kappa agreements as poor (<0.20), fair (>0.20 to ⩽0.40), moderate (>0.40 to ⩽0.60), good (>0.60 to ⩽0.80), and very good (>0.80 to ⩽1.00) were used.
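As a rough check of how the kappa coefficient and the benchmark labels fit together, the sketch below computes Cohen's kappa from a 2×2 test–retest table and rates it. It is not the authors' Stata code and does not reproduce their confidence intervals. The example uses the 1991–1998 counts reported in Table 3 (148 test cases, 151 retest cases, 113 in both) and assumes that all 33,919 women contribute to the table, which the text does not state explicitly. Assigning a kappa of exactly 0.20 to "poor" is also an assumption, because the published bands leave that value unassigned.

```python
def cohen_kappa_2x2(both_yes: int, test_only: int, retest_only: int, both_no: int) -> float:
    """Cohen's kappa for a yes/no item answered in a test and a retest study."""
    n = both_yes + test_only + retest_only + both_no
    p_observed = (both_yes + both_no) / n
    p_test_yes = (both_yes + test_only) / n
    p_retest_yes = (both_yes + retest_only) / n
    # Agreement expected by chance from the two marginal distributions.
    p_expected = p_test_yes * p_retest_yes + (1 - p_test_yes) * (1 - p_retest_yes)
    return (p_observed - p_expected) / (1 - p_expected)


def rate_kappa(kappa: float) -> str:
    """Benchmark label for a kappa value, following the bands quoted in the text."""
    if kappa <= 0.20:  # assumption: exactly 0.20 is treated as poor
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


n_total = 33919  # assumption: the full study sample enters the 2x2 table
test_cases, retest_cases, both = 148, 151, 113  # 1991-1998 counts from Table 3
kappa = cohen_kappa_2x2(
    both_yes=both,
    test_only=test_cases - both,
    retest_only=retest_cases - both,
    both_no=n_total - test_cases - retest_cases + both,
)
print(round(kappa, 2), rate_kappa(kappa))  # 0.75 good
```

Under these assumptions the result agrees with the kappa of 0.75 reported for all diabetes diagnoses in the 1991–1998 comparison.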

Table 1.

General characteristics of the study sample (n = 33,919).

	Cohort n = 33,919
	N (%)	Mean (SD)
Age (years)		47.7 (4.3)
40–44	9926 (29.3)
45–49	11,382 (33.6)
50–54	10,849 (32.0)
55–59	1762 (5.2)
BMI^a		24.4 (3.8)
Normal weight (<25 kg/m²)	21,553 (64.6)
Overweight (25–29.9 kg/m²)	9106 (27.3)
Obese (⩾30 kg/m²)	2709 (8.1)
Education level (duration in years)^a		12.5 (3.2)
Primary/intermediate (0–9)	6736 (20.1)
Secondary (10–12)	12,102 (36.1)
University (13–16)	10,226 (30.5)
Postgraduate and above (17+)	4460 (13.3)
Physical activity level^a		5.6 (1.7)
Low	3686 (11.5)
Medium	24,229 (75.5)
High	4186 (13.0)
Smoking status
Never smoker	13,763 (40.6)
Former smoker	10,582 (31.2)
Current smoker	9574 (28.2)

SD: standard deviation; BMI: body mass index.

^a Cohort size was 33,919, but because of missing values, the numbers for some variables do not add up to 33,919.

Consistency (%) was calculated as

$\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$
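The consistency measure is a plain percentage, and a minimal sketch makes the arithmetic explicit. The function name is hypothetical; the example counts are the 1991–1998 figures for all diabetes diagnoses reported in Table 3.

```python
def consistency_pct(cases_in_both: int, cases_in_test: int) -> float:
    """Percentage of test-study cases that are also cases in the retest study."""
    if cases_in_test <= 0:
        raise ValueError("the test study must contain at least one case")
    return 100 * cases_in_both / cases_in_test


# 113 of the 148 women reporting diabetes in 1991 also reported it in 1998 (Table 3).
print(round(consistency_pct(113, 148), 1))  # 76.4
```

Unlike kappa, this measure uses only the women who reported diabetes in the test study and makes no correction for chance agreement.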

Sensitivity analysis

Since self-reported age at diagnosis was used as the only discriminative criterion for distinguishing between type 1 and type 2 diabetes, sensitivity analysis was performed by restricting type 1 diabetes to age at diagnosis <35 years and type 2 diabetes to age at diagnosis >44 years (Table 5). Those reporting age at diagnosis of 35–44 years were excluded for the purpose of assessing sensitivity of the kappa agreements (Table 5).
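The effect of the stricter cut-offs on case numbers can be illustrated with the 1991 age-at-diagnosis counts reported in Table 2. The sketch below is an illustration only; the function name and the band-based data layout are assumptions, and it works on the published 5-year bands rather than on individual records.

```python
# 1991 test-study cases by age-at-diagnosis band, copied from Table 2: (low, high) -> n
cases_1991 = {
    (0, 4): 5, (5, 9): 10, (10, 14): 18, (15, 19): 10, (20, 24): 7,
    (25, 29): 15, (30, 34): 23, (35, 39): 23, (40, 44): 25, (45, 49): 12,
}


def split_cases(bands: dict, type1_max_age: int, type2_min_age: int) -> tuple:
    """Return (type 1, type 2, excluded) case counts for the given age cut-offs."""
    type1 = sum(n for (low, high), n in bands.items() if high <= type1_max_age)
    type2 = sum(n for (low, high), n in bands.items() if low >= type2_min_age)
    excluded = sum(bands.values()) - type1 - type2
    return type1, type2, excluded


print(split_cases(cases_1991, type1_max_age=39, type2_min_age=40))  # (111, 37, 0)
print(split_cases(cases_1991, type1_max_age=34, type2_min_age=45))  # (88, 12, 48)
```

The first call reproduces the main-analysis case numbers for 1991 (111 type 1, 37 type 2), and the second reproduces the sensitivity-analysis numbers in Table 5 (88 and 12), with the 48 women diagnosed at 35–44 years left out.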

Subgroup analysis

Subgroup analysis was performed to assess the consistency of the kappa agreement across strata of the covariates (Tables 6–10).

Ethical approval

The NOWAC Study was approved by the Regional Committee for Medical and Health Research Ethics. All participating women gave written informed consent.

Results

Table 1 presents the general characteristics of the study sample. Among the 33,919 women participating in the 1991, 1998, and 2005 studies, age ranged from 40 to 59 years (mean: 47.7 ± 4.3) in 1998. The majority (64.6%) of the respondents had normal weight (BMI: <25 kg/m²). Almost 40.3% of the respondents had some university education or more. Most (75.5%) of the respondents were classified as having a medium level of physical activity. In this study sample, 28.2% were classified as current smokers, while 31.2% were classified as former smokers.

Table 2 presents self-reported diabetes diagnoses in the 1991 and 1998 studies by self-reported age at diagnosis in the respective studies. The majority (56%) of the women with self-reported diabetes reported an age at diagnosis of 30 years or over in the 1991 study, while over 64.7% reported an age at diagnosis of 40 years or over in the 1998 study. This may partly be due to the aging of the cohort itself.

Table 2.

Self-reported diabetes diagnosis in 1991 and 1998 test studies by self-reported age at diagnosis.

	Age groups	Diabetic in 1991^a	Diabetic in 1998^b
		n (%)	n (%)
Age at diagnosis	0–4	5 (3.4)^c	4 (1.3)^d
	5–9	10 (6.8)^c	6 (1.9)^d
	10–14	18 (12.2)^c	17 (5.4)^d
	15–19	10 (6.8)^c	11 (3.5)^d
	20–24	7 (4.7)^c	12 (3.8)^d
	25–29	15 (10.1)^c	12 (3.8)^d
	30–34	23 (15.5)^c	19 (6.0)^d
	35–39	23 (15.5)^c	30 (9.5)^d
	40–44	25 (16.9)^c	75 (23.8)^d
	45–49	12 (8.1)^c	70 (22.2)^d
	50–54	–	59 (18.7)^d
	Total	148 (100.0)	315 (100.0)

^a Diabetes cases in the 1991 test study were defined as those who reported having diabetes, and their age at diagnosis in the 1991 study.

^b Diabetes cases in the 1998 test study were defined as those who reported having diabetes, and their age at diagnosis in the 1998 study. One respondent to the 1998 questionnaire fulfilled the criteria for both gestational diabetes and type 2 diabetes and was excluded.

^c N and % of respondents reporting age at diagnosis in 1991 study.

^d N and % of respondents reporting age at diagnosis in 1998 study.

Tables 3 and 4 present the kappa statistics for the test–retest studies. The agreement for all self-reported diabetes diagnoses in the 1991–1998 test–retest study was 0.75 (95% CI: 0.70–0.81), while it was 0.70 (95% CI: 0.66–0.74) in the 1998–2005 test–retest study. The kappa agreement for all self-reported diabetes diagnoses in the 1991–2005 test–retest study was 0.65 (95% CI: 0.58–0.71) (Table 3).

Table 3.

Kappa agreements for self-reported diabetes diagnoses (excluding gestational diabetes).

	Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
Diabetes	Cases in 1991 (n)	Cases in 1998 (n)
	148	151	113/148 (76.4)	0.75 (0.70–0.81)
	Cases in 1991 (n)	Cases in 2005 (n)
	148	130	90/148 (60.8)	0.65 (0.58–0.71)
	Cases in 1998 (n)	Cases in 2005 (n)
	315	282	209/315 (66.3)	0.70 (0.66–0.74)

CI: confidence interval.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

Table 4.

Kappa agreements for self-reported type 1 and type 2 diabetes diagnoses.

Diabetes type	Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
	Cases in 1991 (n)	Cases in 1998 (n)
Type 1 diabetes^b	111	103	83/111 (74.7)	0.83 (0.76–0.89)
Type 2 diabetes^c	37	48	29/37 (78.4)	0.67 (0.55–0.79)
	Cases in 1991 (n)	Cases in 2005 (n)
Type 1 diabetes^b	111	88	64/111 (57.6)	0.76 (0.68–0.84)
Type 2 diabetes^c	37	42	21/37 (56.6)	0.57 (0.43–0.71)
	Cases in 1998 (n)	Cases in 2005 (n)
Type 1 diabetes^b	111	97	70/111 (63.1)	0.73 (0.66–0.81)
Type 2 diabetes^c	204	185	125/204 (61.3)	0.66 (0.59–0.72)

CI: confidence interval.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

^b Type 1 diabetes cases were classified as those reporting age at diagnosis <40 years.

^c Type 2 diabetes cases were classified as those reporting age at diagnosis >39 years.

Table 4 shows the kappa agreement for the three test–retest studies separately for the two types of diabetes. The kappa agreement for type 1 diabetes was very good in the 1991–1998 test–retest study (kappa = 0.83, 95% CI: 0.76–0.89), while it was good in the 1991–2005 test–retest study (kappa = 0.76, 95% CI: 0.68–0.84), and the 1998–2005 test–retest study (kappa = 0.73, 95% CI: 0.66–0.81). The kappa agreement for type 2 diabetes was good in the 1991–1998 test–retest study (kappa = 0.67, 95% CI: 0.55–0.79), and in the 1998–2005 test–retest study (kappa = 0.66, 95% CI: 0.59–0.72), while it was moderate in the 1991–2005 test–retest study (kappa = 0.57, 95% CI: 0.43–0.71) (Table 4). The overall kappa agreement in the 1991–1998 test–retest study was stronger than in the 1991–2005 test–retest study and the 1998–2005 test–retest study (Table 4).

Table 5 presents the sensitivity of the kappa agreements when those reporting age at diagnosis less than 35 years were classified as having type 1 diabetes and those reporting age at diagnosis greater than 44 years were classified as having type 2 diabetes. The kappa agreements remained moderate to good for type 1 diabetes, while the kappa agreements for type 2 diabetes were fair to good (Table 5).

Table 5.

Sensitivity analysis of kappa agreements for self-reported type 1 (age at diagnosis: <35 years) and type 2 diabetes diagnoses (age at diagnosis: >44 years).

Diabetes type	Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
	Cases in 1991 (n)	Cases in 1998 (n)
Type 1 diabetes^b	88	81	68/88 (77.3)	0.80 (0.65–0.95)
Type 2 diabetes^c	12	15	6/12 (50.0)	0.52 (0.27–0.77)
	Cases in 1991 (n)	Cases in 2005 (n)
Type 1 diabetes^b	88	74	54/88 (61.4)	0.69 (0.51–0.88)
Type 2 diabetes^c	12	12	3/12 (25.0)	0.33 (0.05–0.61)
	Cases in 1998 (n)	Cases in 2005 (n)
Type 1 diabetes^b	81	74	57/81 (70.4)	0.60 (0.38–0.81)
Type 2 diabetes^c	129	123	75/129 (58.1)	0.63 (0.56–0.70)

CI: confidence interval.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

^b Only those reporting age at diagnosis <35 years were included.

^c Only those reporting age at diagnosis >44 years were included.

Tables 6–10 present the kappa agreement for diabetes stratified by age, BMI, physical activity, education, and smoking status. There was no clear pattern of inconsistency in the kappa agreements within different strata of age, BMI, physical activity, and smoking (Tables 6–8 and 10). However, the analysis stratified by level of education showed that the kappa agreement was strongest among the most educated group (Table 9) in all the test–retest comparisons, while generally it was weaker among the least educated group.

Table 6.

Kappa agreements for self-reported diabetes diagnoses (excluding gestational diabetes) stratified by age groups.

		Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
		Cases in 1991 (n)	Cases in 1998 (n)
Age	40–44	38	31	27/38 (71.1)	0.78 (0.67–0.89)
	45–49	39	38	27/39 (69.2)	0.70 (0.58–0.82)
	50–54	59	65	48/59 (81.4)	0.77 (0.69–0.86)
	55–59	12	17	11/12 (91.7)	0.76 (0.58–0.93)
		Cases in 1991 (n)	Cases in 2005 (n)
Age	40–44	38	30	26/38 (68.4)	0.76 (0.65–0.88)
	45–49	39	35	24/39 (61.5)	0.65 (0.52–0.77)
	50–54	59	54	34/59 (57.6)	0.60 (0.49–0.71)
	55–59	12	11	6/12 (50.0)	0.52 (0.27–0.77)
		Cases in 1998 (n)	Cases in 2005 (n)
Age	40–44	64	57	42/64 (65.6)	0.69 (0.60–0.79)
	45–49	75	66	49/75 (65.3)	0.69 (0.61–0.78)
	50–54	143	131	98/143 (68.5)	0.71 (0.65–0.77)
	55–59	33	28	20/33 (60.6)	0.65 (0.51–0.79)

CI: confidence interval.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

Table 7.

Kappa agreements for self-reported diabetes diagnoses (excluding gestational diabetes) stratified by BMI.

		Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
		Cases in 1991 (n)	Cases in 1998 (n)
	Normal weight (<25 kg/m²)	62^b	62	49/62 (79.0)	0.79 (0.71–0.87)
BMI	Overweight (25–29.9 kg/m²)	44^b	48	35/44 (79.5)	0.76 (0.66–0.86)
	Obese (⩾30 kg/m²)	41^b	41	29/41 (70.7)	0.70 (0.59–0.82)
		Cases in 1991 (n)	Cases in 2005 (n)
	Normal weight (<25 kg/m²)	62^b	59^c	44/62 (80.0)	0.73 (0.64–0.82)
BMI	Overweight (25–29.9 kg/m²)	44^b	35^c	21/44 (47.7)	0.53 (0.40–0.66)
	Obese (⩾30 kg/m²)	41^b	35^c	25/41 (61.0)	0.65 (0.53–0.78)
		Cases in 1998 (n)	Cases in 2005 (n)
	Normal weight (<25 kg/m²)	99^d	89^e	74/99 (74.7)	0.79 (0.72–0.85)
BMI	Overweight (25–29.9 kg/m²)	99^d	83^e	59/99 (59.6)	0.65 (0.56–0.73)
	Obese (⩾30 kg/m²)	114^d	106^e	74/114 (64.9)	0.79 (0.72–0.85)

CI: confidence interval; BMI: body mass index.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

^b The numbers do not add up to 148 due to missing values on height or weight (consequently on BMI).

^c The numbers do not add up to 130 due to missing values on height or weight (consequently on BMI).

^d The numbers do not add up to 315 due to missing values on height or weight (consequently on BMI).

^e The numbers do not add up to 282 due to missing values on height or weight (consequently on BMI).

Table 8.

Kappa agreements for self-reported diabetes diagnoses (excluding gestational diabetes) stratified by physical activity.

		Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
		Cases in 1991 (n)	Cases in 1998 (n)
	Low	24^b	31^c	19/24 (79.2)	0.68 (0.54–0.82)
Physical activity level	Medium	106^b	101^c	80/106 (75.5)	0.77 (0.71–0.84)
	High	11^b	11^c	8/11 (72.7)	0.73 (0.52–0.94)
		Cases in 1991 (n)	Cases in 2005 (n)
	Low	24^b	27^d	18/24 (75.0)	0.74 (0.56–0.85)
Physical activity level	Medium	106^b	86^d	63/106 (59.4)	0.66 (0.58–0.73)
	High	11^b	9^d	5/11 (45.5)	0.50 (0.23–0.77)
		Cases in 1998 (n)	Cases in 2005 (n)
	Low	62^e	57^f	43/62 (69.4)	0.72 (0.63–0.81)
Physical activity level	Medium	209^e	188^f	139/209 (66.5)	0.70 (0.65–0.75)
	High	26^e	25^f	17/26 (65.4)	0.67 (0.52–0.82)

CI: confidence interval.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

^b The numbers do not add up to 148 due to missing values on physical activity level.

^c The numbers do not add up to 151 due to missing values on physical activity level.

^d The numbers do not add up to 130 due to missing values on physical activity level.

^e The numbers do not add up to 315 due to missing values on physical activity level.

^f The numbers do not add up to 282 due to missing values on physical activity level.

Table 9.

Kappa agreements for self-reported diabetes diagnoses (excluding gestational diabetes) stratified by education level.

		Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
		Cases in 1991 (n)	Cases in 1998 (n)
	Primary/intermediate (0–9)	35^b	40^c	28/35 (80)	0.75 (0.64–0.86)
Education level (duration in years)	Secondary (10–12)	63^b	70^c	50/63 (79.4)	0.75 (0.67–0.83)
	University (13–16)	33^b	24^c	21/33 (63.6)	0.74 (0.61–0.87)
	Postgraduate and above (17+)	14^b	15^c	13/14 (92.9)	0.90 (0.78–1.00)
		Cases in 1991 (n)	Cases in 2005 (n)
	Primary/intermediate (0–9)	35^b	32^d	18/35 (51.4)	0.54 (0.39–0.68)
Education level (duration in years)	Secondary (10–12)	63^b	53^d	37/63 (58.7)	0.64 (0.53–0.74)
	University (13–16)	33^b	30^d	24/33 (72.7)	0.76 (0.64–0.88)
	Postgraduate and above (17+)	14^b	12^d	10/14 (71.4)	0.77 (0.59–0.95)
		Cases in 1998 (n)	Cases in 2005 (n)
	Primary/intermediate (0–9)	85^e	78^f	55/85 (64.7)	0.67 (0.59–0.75)
Education level (duration in years)	Secondary (10–12)	133^e	112^f	85/133 (64.0)	0.69 (0.62–0.76)
	University (13–16)	63^e	61^f	45/63 (71.4)	0.72 (0.64–0.81)
	Postgraduate and above (17+)	30^e	27^f	21/30 (70.0)	0.74 (0.61–0.86)

CI: confidence interval.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

^b The numbers do not add up to 148 due to missing values on education level.

^c The numbers do not add up to 151 due to missing values on education level.

^d The numbers do not add up to 130 due to missing values on education level.

^e The numbers do not add up to 315 due to missing values on education level.

^f The numbers do not add up to 282 due to missing values on education level.

Table 10.

Kappa agreements for self-reported diabetes diagnoses (excluding gestational diabetes) stratified by smoking status.

		Test study	Retest study	Consistency (%)^a	Kappa (95% CI)
		Cases in 1991 (n)	Cases in 1998 (n)
	Never smoker	51	47	38/51 (74.5)	0.78 (0.68–0.87)
Smoking status	Former smoker	51	50	37/51 (72.5)	0.73 (0.63–0.83)
	Current smoker	46	54	38/46 (82.6)	0.76 (0.67–0.85)
		Cases in 1991 (n)	Cases in 2005 (n)
	Never smoker	51	41	29/51 (56.9)	0.63 (0.51–0.75)
Smoking status	Former smoker	51	40	31/51 (60.8)	0.68 (0.57–0.79)
	Current smoker	46	49	30/46 (65.2)	0.63 (0.52–0.74)
		Cases in 1998 (n)	Cases in 2005 (n)
	Never smoker	108	94	72/108 (66.7)	0.71 (0.64–0.78)
Smoking status	Former smoker	103	93	62/103 (60.2)	0.63 (0.55–0.71)
	Current smoker	104	95	75/104 (72.1)	0.75 (0.68–0.82)

CI: confidence interval.

^a $\text{Consistency (\%)} = \frac{\text{Number of diabetes diagnoses in both test and retest study}}{\text{Number of diabetes diagnoses in test study}} \times 100$

Discussion

In this study, we analyzed the test–retest reliability of self-reported diabetes diagnosis in a large sample of middle-aged women in Norway. We observed that the agreement was good for all diabetes diagnoses combined in all three test–retest studies. The weakest agreement was found in the 1991–2005 test–retest study. This was to be expected, as the time interval between these studies was the longest. These results also suggest that other confounding factors may have affected self-reported diabetes diagnosis in the 1991–1998, or 1998–2005 test–retest studies, as the agreement in these periods was expected to be more similar. The fact that diabetes diagnosis may change over time could have contributed to the decreasing agreement observed between the 1991–1998 test–retest study and the 1991–2005 test–retest study. However, looking at the two types of diabetes separately revealed some differences in the kappa agreement. The kappa agreement for type 1 diabetes was weakest in the 1998–2005 test–retest study, which was very close to the kappa agreement for the ~14-year interval in the 1991–2005 test–retest study. In summary, the results show that although the agreement for all self-reported diabetes was weakest in the 1991–2005 test–retest study, this was not the case when analyzing the kappa agreement for the two types of diabetes separately. This suggests that recall problems may not be an important determinant of the accuracy of self-reported diabetes diagnosis.

One possible reason for the higher kappa agreement among women with type 1 diabetes in our study is that these women may have severe complications sooner⁶⁴ than women with type 2 diabetes; this may have contributed to the women’s recall of age at diagnosis, resulting in a higher agreement for type 1 diabetes.

Since type 2 diabetes typically affects people 40 years of age and over,^51,52 we classified only women aged 40 years and over as having type 2 diabetes. However, it is still possible that women younger than 40 years of age have developed type 2 diabetes.^65–69 In addition, cases identified as having gestational diabetes were excluded from the type 2 diabetes group, although women who had gestational diabetes may develop type 2 diabetes later in life.^70,71 Women aged 39 years or less who reported a diabetes diagnosis (excluding gestational diabetes) were categorized as having type 1 diabetes. Since type 1 diabetes can occur at any age,⁷² it is also possible that some of the women classified as having type 2 diabetes in fact had type 1 diabetes. Due to the design and self-reported nature of the study, it was not possible to confirm the exact type(s) of diabetes diagnosis. The results from sensitivity analysis restricting type 1 diabetes cases to those reporting age at diagnosis less than 35 years, and restricting type 2 diabetes to those reporting age at diagnosis more than 44 years, were still acceptable.

This study was larger than previous studies, permitting subgroup analyses. No clear pattern of inconsistency in kappa agreements was observed between different strata of BMI, physical activity, and smoking status. Although no formal test of heterogeneity was performed to assess the statistical difference in kappa agreements across the subgroups, there was a pattern across education groups. The kappa agreement was strongest among the most educated group, while generally it was weaker among the least educated group.

Although the NOWAC cohort is representative of Norwegian women in corresponding age groups, the current sample may not be a representative sample since it includes only the women participating in all three waves of the study. Furthermore, the respondents with missing values were excluded. Some research suggests that those who belong to low socio-economic strata and are relatively unhealthy are likely to have a higher proportion of missing values in observational studies.⁷³ Multiple imputation (MI) was not performed, since the kappa statistic⁶¹ is not supported by the MI software^74–77 in Stata. Therefore, the possibility of selection bias limits the external validity of this study.

The kappa agreement we report here is not comparable to other studies^63,78 due to differences in the proportion of people reporting a certain type of diabetes in different studies, or differences in distribution. We found few studies assessing the test–retest reliability of diabetes diagnosis, and the results of those that were found were not consistent. Most showed very good agreement^45–49,79 between the test and the retest studies, while others showed a good⁵⁰ or moderate⁸⁰ level of agreement. However, most of the studies we found^46–49,80 did not report either the significance probability or the CIs. One possible reason for the higher kappa agreement reported in previous studies^45–50 may be the relatively small time interval between the test and retest studies, as compared to the ~7- or ~14-year interval in our study. The relatively smaller time interval between the test and retest studies may have caused respondents in other populations to remember their previous response more easily, resulting in a higher kappa coefficient.

Another key difference between previous studies^45–50 and our study was their use of interviews to collect the information on diabetes diagnosis. In an interview setting, it is reasonable to assume that respondents could ask for questions to be repeated or clarified, and that the interviewer might have obliged. This may have helped respondents to understand the question better and therefore to report more accurately. This difference in data collection method probably increased the kappa agreement in the studies that used interviews.

However, a study from Manhattan (New York)⁸⁰ reported on the test–retest reliability of diabetes diagnosis using telephone interviews. The retest study was conducted within 30 days of the test study, and the kappa agreement between the test and retest studies was 0.48, which is very low considering the short time interval and the use of interviews to collect data. This suggests that a short time interval between the test and the retest study and the use of interviews do not necessarily increase the kappa agreement.

A strength of this study is that it is the first to assess the test–retest reliability of self-reported diabetes diagnosis separately for type 1 and type 2 diabetes. Other strengths include the large cohort size, the sensitivity analysis of the estimates by self-reported age at diagnosis, and the subgroup analysis within different covariates. This study adds to earlier research by reporting the reliability of self-reported diagnosis separately for type 1 and type 2 diabetes.

Strengths and limitations of this study

Large (n = 33,919) longitudinal population-based study.

First to assess the test–retest reliability of self-reported diabetes diagnosis separately for type 1 and type 2 diabetes.

Some women younger than 40 years of age may have developed type 2 diabetes.

Women with gestational diabetes were excluded, although they may develop type 2 diabetes later in life.

Conclusion

In conclusion, this study shows that the reliability of self-reported information on diabetes diagnosis from a large prospective cohort study with a long interval between test and retest is satisfactory.

Footnotes

Acknowledgements

The authors are thankful for the anonymous reviewers’ comments and suggestions to improve the quality of the article.

Authors’ contributions

This work was completed as part of M.A.S.’s Master’s in Public Health thesis, supervised by T.B. M.A.S. performed the statistical analyses and data interpretation, and drafted the article. T.B. is the principal investigator of this study. E.L. is the principal investigator of the NOWAC study. E.L. and T.B. critically reviewed the article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethics approval

The NOWAC Study was approved by the Regional Committee for Medical and Health Research Ethics.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Informed consent

All participants gave written informed consent.

References

1. Beckett, Weinstein, Goldman, et al. Do health interview surveys yield reliable data on chronic illness among older respondents? Am J Epidemiol 2000; 151(3): 315–323.

2. Feinstein, Horwitz RI. Double standards, scientific methods, and epidemiologic research. N Engl J Med 1982; 307(26): 1611–1617.

3. Goldman, Lin I-f, Weinstein, et al. Evaluating the quality of self-reports of hypertension and diabetes. Office of Population Research working paper no. 2002-3, http://westoff.princeton.edu/papers/opr0203.pdf

4. De Burgos-Lunar, Salinero-Fort, Cardenas-Valladolid, et al. Validation of diabetes mellitus and hypertension diagnosis in computerized medical records in primary health care. BMC Med Res Methodol 2011; 11: 146.

5. Robinson, Young, Roos, et al. Estimating the burden of disease: comparing administrative data and self-reports. Med Care 1997; 35(9): 932–947.

6. Heliövaara, Aromaa, Klaukka, et al. Reliability and validity of interview data on chronic diseases. The mini-Finland Health Survey. J Clin Epidemiol 1993; 46(2): 181–191.

7. Harlow, Linet MS. Agreement between questionnaire and medical records. Am J Epidemiol 1989; 129(2): 233–248.

8. Huerta, José Tormo, Egea-Caparrós, et al. Accuracy of self-reported diabetes, hypertension, and hyperlipidemia in the adult Spanish population. DINO study findings. Rev Esp Cardiol 2009; 62(2): 143–152.

9. Midthjell, Holmen, Bjorndal, et al. Is questionnaire information valid in the study of a chronic disease such as diabetes? The Nord-Trøndelag diabetes study. J Epidemiol Community Health 1992; 46(5): 537–542.

10. Kriegsman DMW, Penninx BWJH, Van Eijk JTM, et al. Self-reports and general practitioner information on the presence of chronic diseases in community dwelling elderly: a study on the accuracy of patients’ self-reports and on determinants of inaccuracy. J Clin Epidemiol 1996; 49(12): 1407–1417.

11. Parr, Veierød, Laake, et al. Test-retest reproducibility of a food frequency questionnaire (FFQ) and estimated effects on disease risk in the Norwegian Women and Cancer Study (NOWAC). Nutr J 2006; 5(1): 4.

12. Veierød, Parr, Lund, et al. Reproducibility of self-reported melanoma risk factors in a large cohort study of Norwegian women. Melanoma Res 2008; 18(1): 1–9.

13. Tretli, Lund-Larsen, Foss OP. Reliability of questionnaire information on cardiovascular disease and diabetes: cardiovascular disease study in Finnmark county. J Epidemiol Community Health 1982; 36(4): 269–273.

14. Jacobsen, Bønaa KH. The reproducibility of dietary data from a self-administered questionnaire. The Tromsø study. Int J Epidemiol 1990; 19(2): 349–353.

15. Johansson, Solvoll, Opdahl, et al. Response rates with different distribution methods and reward, and reproducibility of a quantitative food frequency questionnaire. Eur J Clin Nutr 1997; 51(6): 346–353.

16. Hjemdal, Friborg, Stiles, et al. Resilience predicting psychiatric symptoms: a prospective study of protective factors and their role in adjustment to stressful life events. Clin Psychol Psychother 2006; 13(3): 194–201.

17. Solberg, Olsen J-A, Ingebrigtsen, et al. Health-related quality of life assessment by the EuroQol-5D can provide cost-utility data in the field of low-back surgery. Eur Spine J 2005; 14(10): 1000–1007.

18. Holm, Friis, Storheim, et al. Measuring self-reported functional status and pain in patients with chronic low back pain by postal questionnaires: a reliability study. Spine 2003; 28(8): 828–833.

19. Svege, Kolle, Risberg. Reliability and validity of the Physical Activity Scale for the Elderly (PASE) in patients with hip osteoarthritis. BMC Musculoskelet Disord 2012; 13(1): 26.

20. Sørlie, Sexton HC. The factor structure of “The Ways of Coping Questionnaire” and the process of coping in surgical patients. Pers Indiv Differ 2001; 30(6): 961–975.

21. Bjertnaes, Iversen, Kjollesdal. PIPEQ-OS—an instrument for on-site measurements of the experiences of inpatients at psychiatric institutions. BMC Psychiatr 2015; 15: 234.

22. Nordberg, Moltu, Råbu. Norwegian translation and validation of a routine outcomes monitoring measure: the treatment outcome package. Nordic Psychology. Epub ahead of print 15 September 2015. DOI: 10.1080/19012276.2015.1071204.

23. Bjørnarå, Hillesund, Torstveit, et al. An assessment of the test-retest reliability of the New Nordic Diet score. Food Nutr Res 2015; 59: 28397.

24. Moljord IEO, Lara-Cabrera, Perestelo-Pérez, et al. Psychometric properties of the Patient Activation Measure-13 among out-patients waiting for mental health treatment: a validation study in Norway. Patient Educ Couns 2015; 98(11): 1410–1417.

25. Myr, Bere, Overby. Test-retest reliability of a new questionnaire on the diet and eating behavior of one year old children. BMC Res Notes 2015; 8(1): 16.

26. Røysamb, Vittersø, Tambs. The relationship satisfaction scale—psychometric properties. Norsk Epidemiologi 2014; 24(1–2): 187–194.

27. Øverby, Hillesund, Sagedal, et al. The Fit for Delivery study: rationale for the recommendations and test-retest reliability of a dietary score measuring adherence to 10 specific recommendations for prevention of excessive weight gain during pregnancy. Matern Child Nutr 2015; 11(1): 20–32.

28. Bjertnaes, Skudal, Iversen, et al. The Patient-Reported Incident in Hospital Instrument (PRIH-I): assessments of data quality, test–retest reliability and hospital-level reliability. BMJ Qual Saf 2013; 22(9): 743–751.

29. Johansen, Roe, Bakke, et al. Reliability and responsiveness of the Norwegian version of the Neck Disability Index. Scand J Pain 2014; 5(1): 28–33.

30. Nordtorp, Nyquist, Jahnsen, et al. Reliability of the Norwegian Version of the Children’s Assessment of Participation and Enjoyment (CAPE) and Preferences for Activities of Children (PAC). Phys Occup Ther Pediatr 2013; 33(2): 199–212.

31. Løchting, Grotle, Storheim, et al. Individualized quality of life in patients with low back pain: reliability and validity of the Patient Generated Index. J Rehabil Med 2014; 46(8): 781–787.

32. Iversen, Espehaug, Rokne, et al. Psychometric properties of the Norwegian version of the Audit of Diabetes-Dependent Quality of Life. Qual Life Res 2013; 22(10): 2809–2812.

33. Haldorsen, Svege, Roe, et al. Reliability and validity of the Norwegian version of the Disabilities of the Arm, Shoulder and Hand questionnaire in patients with shoulder impingement syndrome. BMC Musculoskelet Disord 2014; 15: 78.

34. Aaby, Heimdal J-H. The voice-related quality of life (V-RQOL) measure—a study on validity and reliability of the Norwegian version. J Voice 2013; 27(2): 258.e29–258.e33.

35. Tavoly, Jelsness-Jørgensen L-P, Wik, et al. Quality of life after pulmonary embolism: first cross-cultural evaluation of the pulmonary embolism quality-of-life (PEmb-QoL) questionnaire in a Norwegian cohort. Qual Life Res 2015; 24(2): 417–425.

36. Kapstad, Nelson, Øverås, et al. Validation of the Norwegian short version of the Body Shape Questionnaire (BSQ-14). Nord J Psychiatry 2015; 69(7): 509–514.

37. Klokkerud, Grotle, Løchting, et al. Psychometric properties of the Norwegian version of the patient generated index in patients with rheumatic diseases participating in rehabilitation or self-management programmes. Rheumatology 2013; 52: 924–932.

38. Amble, Gude, Stubdal, et al. Psychometric properties of the outcome questionnaire-45.2: the Norwegian version in an international context. Psychother Res 2014; 24(4): 504–513.

39. Agerup, Lydersen, Wallander, et al. Maternal and paternal psychosocial risk factors for clinical depression in a Norwegian community sample of adolescents. Nord J Psychiatry 2015; 69(1): 35–41.

40. Skre, Friborg, Elgaroy, et al. The factor structure and psychometric properties of the Clinical Outcomes in Routine Evaluation—Outcome Measure (CORE-OM) in Norwegian clinical and non-clinical samples. BMC Psychiatr 2013; 13: 99.

41. Erdvik, Øverby, Haugen. Translating, reliability testing, and validating a Norwegian Questionnaire to Assess Adolescents’ Intentions to be Physically Active After High School Graduation. SAGE Open. Epub ahead of print 13 April 2015. DOI: 10.1177/2158244015580374.

42. Bergland, Hofoss, Kirkevold, et al. Person-centred ward climate as experienced by mentally lucid residents in long-term care facilities. J Clin Nurs 2015; 24(3–4): 406–414.

43. Østerås, Garratt, Grotle, et al. Patient-reported quality of care for osteoarthritis: development and testing of the osteoarthritis quality indicator questionnaire. Arthritis Care Res 2013; 65(7): 1043–1051.

44. Trochim WMK. Types of reliability, http://www.socialresearchmethods.net/kb/reltypes.php (2006, accessed 24 December 2011).

45. Andresen, Malmstrom, Miller, et al. Retest reliability of self-reported function, self-care, and disease history. Med Care 2005; 43(1): 93–97.

46. Yount, Wyrwich, Brownson RC. The reliability of a questionnaire-based metabolic syndrome surveillance tool. Metab Syndr Relat Disord 2007; 5(3): 282–289.

47. Brownson, Jackson-Thompson, Wilkerson, et al. Reliability of information on chronic disease risk factors collected in the Missouri Behavioral Risk Factor Surveillance System. Epidemiology 1994; 5(5): 545–549.

48. Starr, Grande, Taylor, et al. Reliability of self-reported behavioural health risk factors in a South Australian telephone survey. Australian N Z J Public Health 1999; 23(5): 528–530.

49. Bosetti, Tavani, Negri, et al. Reliability of data on medical conditions, menstrual and reproductive history provided by hospital controls. J Clin Epidemiol 2001; 54(9): 902–906.

50. Bowlin, Morrill, Nafziger, et al. Reliability and changes in validity of self-reported cardiovascular disease risk factors using dual response: the behavioral risk factor survey. J Clin Epidemiol 1996; 49(5): 511–517.

51. Khardori. Type 2 diabetes mellitus, http://emedicine.medscape.com/article/117853-overview (2011, accessed 24 December 2011).

52. Sheikh, Lund, Braaten. The predictive effect of body mass index on type 2 diabetes in the Norwegian women and cancer study. Lipids Health Dis 2014; 13: 164.

53. Lund. The Norwegian Women and Cancer study, NOWAC, http://site.uit.no/nowac/; http://uit.no/ (accessed 14 October 2011).

54. The Norwegian Women and Cancer study (NOWAC). The Norwegian Women and Cancer study. Tromsø: Department of Community medicine, Faculty of medicine, The University of Tromsø, 2008 (updated 4 July 2008), http://site.uit.no/nowac/ (accessed 21 December 2015).

55. Lund, Dumeaux, Braaten, et al. Cohort profile: the Norwegian Women and Cancer Study—NOWAC—Kvinner og kreft. Int J Epidemiol 2008; 37(1): 36–41.

56. Lund, Kumle, Braaten, et al. External validity in a population-based national prospective study—the Norwegian Women and Cancer Study NOWAC. Cancer Causes Control 2003; 14(10): 1001–1008.

57. Skeie, Mode, Henningsen, et al. Validity of self-reported body mass index among middle-aged participants in the Norwegian Women and Cancer study. Clin Epidemiol 2015; 7: 313–323.

58. Hjartåker, Andersen, Lund. Comparison of diet measures from a food-frequency questionnaire with measures from repeated 24-hour dietary recalls. The Norwegian Women and Cancer Study. Public Health Nutr 2007; 10(10): 1094–1103.

59. Brustad, Skeie, Braaten, et al. Comparison of telephone vs face-to-face interviews in the assessment of dietary intake by the 24h recall EPIC SOFT program—the Norwegian calibration study. Eur J Clin Nutr 2003; 57(1): 107–113.

60. Fleiss, Levin, Paik MC. Statistical methods for rates and proportions (ed Shewart, Wilks). 3rd ed. Hoboken, NJ: John Wiley & Sons, 2003, 800 pp.

61. Reichenheim ME. Confidence intervals for the kappa statistic. Stata J 2004; 4(4): 421–428.

62. Simon. What is a kappa coefficient? (Cohen’s kappa). Kansas City, MO: Children’s Mercy Hospitals and Clinics, 2008 (updated 14 July 2008), http://www.childrensmercy.org/stats/definitions/kappa.htm (accessed 24 December 2011).

63. StatsDirect Limited. Kappa and Maxwell. Altrincham: StatsDirect Limited, 2011. (updated 2011), http://www.ukph.org/help/statsdirect.htm#agreement/kappa.htm (accessed 24 December 2011).

64. Lab Tests Online. Diabetes, http://labtestsonline.org/understanding/conditions/diabetes?start=1 (accessed 24 December 2011).

65. World Health Organization (WHO). Diabetes. Geneva: WHO, 2011 (updated August 2011), http://www.who.int/mediacentre/factsheets/fs312/en/index.html (accessed 23 December 2011).

66. Chyun, Wackers, Inzucchi, et al. Autonomic dysfunction independently predicts poor cardiovascular outcomes in asymptomatic individuals with type 2 diabetes in the DIAD study. SAGE Open Med. Epub ahead of print 24 February 2015. DOI: 10.1177/2050312114568476.

67. AL-Aboudi, Hassali, Shafie, et al. A cross-sectional assessment of health-related quality of life among type 2 diabetes patients in Riyadh, Saudi Arabia. SAGE Open Med. Epub ahead of print 9 October 2015. DOI: 10.1177/2050312115610129.

68. Shaw. Epidemiology of childhood type 2 diabetes and obesity. Pediatr Diabetes 2007; 8: 7–15.

69. Jenssen, Tonstad, Claudi, et al. The gap between guidelines and practice in the treatment of type 2 diabetes: a nationwide survey in Norway. Diabetes Res Clin Pract 2008; 80(2): 314–320.

70. World Health Organization (WHO) Regional Office for Europe. Facts and figures Copenhagen, http://www.euro.who.int/en/what-we-do/health-topics/noncommunicable-diseases/diabetes/facts-and-figures (2011, accessed 24 December 2011).

71. Ben-Haroush, Yogev, Hod. Epidemiology of gestational diabetes mellitus and its association with Type 2 diabetes. Diabet Med 2004; 21(2): 103–113.

72. Nasjonalt folkehelseinstituttet (Norwegian Institute of Public Health). Forekomsten av type 1-diabetes har økt med omlag 30 prosent 2008, http://www.fhi.no/eway/default.aspx?pid=233&trg=MainLeft_5565&MainArea_5661=5565:0:15,1212:1:0:0:::0:0&MainLeft_5565=5544:64250::1:5569:1:::0:0 (accessed 24 December 2011).

73. Sheikh, Abelsen, Olsen JA. Role of respondents’ education as a mediator and moderator in the association between childhood socio-economic status and later health and wellbeing. BMC Public Health 2014; 14(1): 1172.

74. Royston, White IR. Multiple Imputation by Chained Equations (MICE): implementation in Stata. J Stat Software 2011; 45(4): 1–20.

75. White, Royston, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med 2011; 30(4): 377–399.

76. Carpenter, Kenward MG. Multiple imputation and its application. Sussex: John Wiley & Sons, 2013, 364 pp.

77. StataCorp. Stata 14 multiple-imputation reference manual. College Station, TX: Stata Press, 2015, http://www.stata.com/manuals14/mi.pdf

78. Uebersax. Kappa coefficients: John Uebersax Enterprises LLC (updated 18 March 2010), http://www.john-uebersax.com/stat/kappa.htm (2010, accessed 24 December 2011).

79. Schneider ALC, Pankow, Heiss, et al. Validity and reliability of self-reported diabetes in the Atherosclerosis Risk in Communities Study. Am J Epidemiol 2012; 176(8): 738–743.

80. Kargman, Sacco, Boden-Albala, et al. Validity of telephone interview data for vascular disease risk factors in a racially mixed urban community: the Northern Manhattan Stroke Study. Neuroepidemiology 1999; 18(4): 174–184.

Test–retest reliability of self-reported diabetes diagnosis in the Norwegian Women and Cancer Study: A population-based longitudinal study (n =33,919)
