Is routine hospital episode data sufficient for identifying individuals with chronic kidney disease? A comparison study with laboratory data

Abstract

Internationally, investment in the availability of routine health care data for improving health, health surveillance and health care is increasing. We assessed the validity of hospital episode data for identifying individuals with chronic kidney disease compared to biochemistry data in a large population-based cohort, the Grampian Laboratory Outcomes, Morbidity and Mortality Study-II (n = 70,435). Grampian Laboratory Outcomes, Morbidity and Mortality Study-II links hospital episode data to biochemistry data for all adults in a health region with impaired kidney function and random samples of individuals with normal and unmeasured kidney function in 2003. We compared identification of individuals with chronic kidney disease by hospital episode data (based on International Classification of Diseases-10 codes) to the reference standard of biochemistry data (at least two estimated glomerular filtration rates <60 mL/min/1.73 m² at least 90 days apart). Hospital episode data, compared to biochemistry data, identified a lower prevalence of chronic kidney disease and had low sensitivity (<10%) but high specificity (>97%). Using routine health care data from multiple sources offers the best opportunity to identify individuals with chronic kidney disease.

Keywords

databases and data mining, ehealth, electronic health records, record linkage, secondary care

Introduction

Chronic kidney disease (CKD) has been identified as a worldwide public health problem with a rising incidence and prevalence¹ and is associated with high morbidity (cardiovascular disease, need for renal replacement therapy (RRT)), mortality and health care costs (estimated for England 2009–2010 to be £1.45 billion²). Risk factors for CKD include diabetes, vascular disease, hereditary renal diseases, smoking and hypertension. In 2002, the Kidney Disease Outcomes Quality Initiative (KDOQI) defined and classified CKD based on kidney damage (structural or functional abnormalities of the kidney) with glomerular filtration rate (GFR, a measure of kidney function) ≥60 mL/min/1.73 m² (stages 1–2) or GFR <60 mL/min/1.73 m² alone (stages 3–5), present for at least 3 months.¹ Estimates of prevalence, based on the first part of this definition, in the United States suggest the prevalence of CKD stages 1–4 increased from 10.0 per cent in 1988–1994 to 13.1 per cent in 1999–2004.³ However, other studies have reported widely varying prevalence rates of CKD (0.6%–42.6%).⁴ In United Kingdom general practices, only 2.9 per cent of patients are registered as having CKD.⁵ Part of the variation in prevalence estimates may be due to how CKD is defined and the data sources used to identify individuals with CKD.

For many conditions, information on disease prevalence is estimated from disease registries, general practitioner (GP) registers and/or coding of hospital episodes (HEs). HE data (recorded in Scotland as the Scottish Morbidity Record (SMR01)), either as single episodes or as longitudinally linked episodes, have been used extensively in research to identify comorbidities.⁶ For acute events that almost exclusively require hospital admission (e.g. hip fracture), this may be a valid source of information.⁷ For chronic diseases such as CKD, HE data may require supplementation from other sources of data to fully elucidate disease load and facilitate early identification. The United Kingdom government and others internationally have invested in routine health care data (i.e. funding opportunities, investment in digital health systems) since it is thought to be important for health and health care through research, health surveillance and health care planning.^8–12

For individuals with CKD, early detection and management is believed to be important to reduce morbidity and slow progression to RRT.¹³ However, the setting of care may vary: all patients require GP care, and those with more advanced disease may also require assessment by nephrology services. In the United Kingdom, there is no standard surveillance system for the identification of people with CKD. Ideally, those with CKD would be identified clinically from a combination of sources including biochemistry testing for estimated glomerular filtration rate (eGFR) and albuminuria; however, this relies on clinicians identifying and noting abnormal results and confirming that these are sustained abnormalities rather than an acute change. This is sometimes difficult to achieve in regions where biochemistry testing is done by multiple providers and where not all results are returned to a single clinician responsible for compiling results. An alternative means of identifying those with CKD would be to flag those whose routine HE data are consistent with a CKD diagnosis and subsequently inform GPs for follow-up and confirmation.

Two recent systematic reviews,^14,15 and recent studies,^16–19 have evaluated the degree to which administrative coding accurately identified individuals with kidney diseases, reporting a large variation in sensitivity (3%–88%). Only a few studies have compared hospital administrative data to laboratory data employing the 2002 KDOQI definition of CKD stages 3–5, of at least two eGFR <60 mL/min/1.73 m² at least 90 days apart.^18,20,21 Of these, only Ronksley et al.¹⁸ did so in a community cohort, in Canada. Using a community-based population increases the generalisability of results as opposed to relying on, for example, a selected in-patient population. We did not identify any studies from the United Kingdom that compared HE data to laboratory data.

With the growing emphasis on the use of routine administrative data, validation studies become increasingly important in order to provide information on the accuracy and validity of findings that are based exclusively on these data. As administrative data have the potential to be a rich source of data for population-based research in CKD, we aimed to assess the validity of diagnostic algorithms for CKD in HE data compared to biochemistry data in a large population-based cohort in Grampian, Scotland.

Methods

We carried out a validation study within an existing cohort developed by data linkage of biochemistry, HE and death registry data.

Study population – Grampian Laboratory Outcomes, Morbidity and Mortality Study-II cohort

All in-patient, out-patient and community serum creatinine (isotope dilution mass spectrometry (IDMS) aligned) and urinary protein measurements in the Grampian region, served by a single United Kingdom National External Quality Assessment Service monitored biochemistry service, are contained in the Grampian Laboratory Renal Database for 1999 to 2009. This database was queried to identify the Grampian Laboratory Outcomes, Morbidity and Mortality Study-II (GLOMMS-II) cohort, which comprised all adults (>15 years) with impaired kidney function in 2003; a random sample of individuals with normal or no measure of kidney function in 2003 (but prior and post 2003 sampling); all those with proteinuria but normal kidney function in 2003; and all individuals on RRT at 1 January 2003 (identified from Scottish Renal Registry and local renal system). Where present, the first ‘low’ eGFR <60 mL/min/1.73 m² in 2003 was taken as the index value and date. Where all values in 2003 were normal, the last value and date were taken as the index. Where no samples were taken in 2003, the index date was taken as 31 December 2003 to allow the potential for the individual to be sampled.
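The three index rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual code: the function name `select_index` and the representation of results as `(date, eGFR)` pairs are assumptions made for the example.

```python
from datetime import date
from typing import List, Optional, Tuple

LOW_EGFR = 60.0  # mL/min/1.73 m^2, the 'low' threshold quoted in the text


def select_index(results_2003: List[Tuple[date, float]]) -> Tuple[date, Optional[float]]:
    """Return (index_date, index_egfr) from one person's 2003 eGFR results.

    Hypothetical helper: the input is a list of (sample_date, egfr) pairs
    restricted to samples taken in 2003.
    """
    if not results_2003:
        # No samples in 2003: index date is 31 December 2003, with no index value.
        return date(2003, 12, 31), None
    ordered = sorted(results_2003, key=lambda r: r[0])
    for when, egfr in ordered:
        if egfr < LOW_EGFR:
            # The first low value in 2003 becomes the index.
            return when, egfr
    # All 2003 values were normal: the last value and date become the index.
    return ordered[-1]
```

For example, a person sampled in February (eGFR 75) and May (eGFR 55) would take the May result as the index, because it is the first low value of the year.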

Defining CKD from biochemistry data

eGFR was calculated using the four-variable IDMS aligned Modification of Diet in Renal Disease (MDRD) formula (serum creatinine, age, sex and race). CKD was defined and staged according to KDOQI.¹ CKD stages 3–5 (including 3a and 3b) were defined as an index eGFR (<60 mL/min/1.73 m²) in 2003 followed after 90 days by another low eGFR (<60 mL/min/1.73 m²), or if there were no further eGFR values after 90 days post-index, the last eGFR prior to 90 days pre-index also being low, that is, between the start of the database records in 1999 and the index value. CKD stages 1–2 were defined as an index eGFR (≥60 mL/min/1.73 m²) with microalbuminuria or macroalbuminuria on urine albumin-creatinine ratio (ACR) or protein-creatinine ratio (PCR) testing. Individuals were categorised as not having CKD if their index eGFR was not measured, was normal or was impaired but not CKD (at least one eGFR <60 mL/min/1.73 m² but with no evidence that this was sustained for at least 3 months).
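As an illustration of the definition above, the following Python sketch computes eGFR and applies the chronicity rule for stages 3–5. It rests on several assumptions that are not stated in the text: the coefficients are those of the published IDMS-traceable four-variable MDRD equation (175, −1.154, −0.203, 0.742, 1.212); creatinine is assumed to be in µmol/L and converted to mg/dL by dividing by 88.4; 'after 90 days' is read as 'at least 90 days after the index'; and any low value in that later window is taken as confirmation. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def mdrd_egfr(creatinine_umol_l: float, age_years: float,
              female: bool, black: bool) -> float:
    """Four-variable IDMS-traceable MDRD eGFR in mL/min/1.73 m^2 (assumed form)."""
    scr_mg_dl = creatinine_umol_l / 88.4  # assumed unit conversion
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def has_ckd_stage_3_to_5(index_date: date, index_egfr: float,
                         other_results: List[Tuple[date, float]],
                         low: float = 60.0, gap_days: int = 90) -> bool:
    """Apply the chronicity rule to one person's (date, egfr) results."""
    if index_egfr >= low:
        return False
    gap = timedelta(days=gap_days)
    # Preferred evidence: a low value at least 90 days after the index.
    later = [v for d, v in other_results if d >= index_date + gap]
    if later:
        return any(v < low for v in later)
    # Fallback: the last value at least 90 days before the index is also low.
    earlier = sorted((d, v) for d, v in other_results if d <= index_date - gap)
    if earlier:
        return earlier[-1][1] < low
    # A single low value with no supporting result is 'impaired', not CKD.
    return False
```

Under this reading, someone with a low index value and no other results falls into the 'impaired' group rather than the CKD group, which matches the no-CKD categorisation described above.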

Defining CKD from HE data

In the United Kingdom, information about an episode of hospital care is recorded following a patient’s discharge. In Scotland, this information is recorded in the SMR01, which is collated nationally by the Information Services Division (ISD), part of NHS National Services Scotland. SMR01 is an episode-based patient record relating to all in-patient and day case discharges. This information contributes to NHS Scotland’s Performance Assessment Framework, clinical governance and performance indicators, and is used for planning and research purposes.²² Diagnoses are coded using International Classification of Diseases, 10th Revision (ICD-10) and procedures coded using the Office of Population Censuses and Surveys (OPCS) Classification of Interventions and Procedures. We defined CKD for each patient from HE data for two time periods: 2003 (including admission at index) and a 5-year ‘look-back’ period.

To identify potentially relevant codes to define CKD, an experienced nephrologist reviewed all ICD-10 and OPCS codes. Three groups of codes (algorithms) were developed (Table 1): first, a broad definition encompassing most diseases which might include renal complications (‘all codes’); second, an algorithm to define renal disease based on a Charlson comorbidity algorithm²³ (‘renal disease’); and third, an algorithm highly likely to identify CKD (‘chronic kidney disease’).

Table 1.

Renal disease-related ICD-10 and OPCS codes (algorithms).

ICD-10/OPCS code	Definition	Coding algorithm definition
		All codes	Renal disease	Chronic kidney disease
E10.2	Diabetes type 1 with renal complications	•		•
E11.2	Diabetes type 2 with renal complications	•		•
E14.2	Diabetes with renal complications	•		•
I12.0	Hypertensive renal disease with renal failure	•	•	•
I13.1	Hypertensive heart and renal disease with renal failure	•	•	•
M02 (OPCS)	Nephrectomy	•
N00 to N08	Glomerular diseases	•
N03.2–N03.7	Chronic nephritic syndrome: diffuse glomerulonephritis or dense deposit disease	•	•
N05.2–N05.7	Unspecified nephritic syndrome: diffuse glomerulonephritis or dense deposit disease	•	•
N11	Chronic tubule-interstitial nephritis	•
N13	Obstructive and reflux uropathy	•
N13.7	Vesicoureteral-reflux-associated uropathy	•
N18.x	Chronic renal failure	•	•	•
N19.x	Unspecified renal failure	•	•
N20	Calculus of kidney and ureter (includes nephrolithiasis)	•
N21	Calculus of lower urinary tract	•
N22	Calculus of urinary tract in diseases classified elsewhere	•
N23	Unspecified renal colic	•
N25.0	Renal osteodystrophy	•	•
N26	Unspecified contracted kidney	•
N27	Small kidney of unknown cause	•
N28	Ischaemia and infarction of the kidney	•
Q60	Renal agenesis and other reduction defects of the kidney	•
Q61	Cystic kidney disease	•
Q62	Congenital obstructive defects of the renal pelvis and congenital malformation of ureter	•
Q63	Other congenital malformations of the kidney	•
Q64	Other congenital malformations of urinary system	•
Z49.0–Z49.2	Care involving dialysis	•	•
Z90.5	Acquired absence of kidney	•
Z94.0	Kidney transplant status	•	•
Z99.2	Dependence on renal dialysis	•	•

ICD: International Classification of Diseases; OPCS: Office of Population Censuses and Surveys.
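To make the use of Table 1 concrete, the sketch below shows one way the ‘renal disease’ and ‘chronic kidney disease’ code groups could be applied to the diagnosis codes of an episode. It is a minimal sketch, not the study's implementation: the prefix-matching approach, the removal of dots before comparison and all names are assumptions, and the code sets are transcribed from the two narrower columns of Table 1 only.

```python
from typing import Iterable, Set


def _expand(base: str, first: int, last: int) -> Set[str]:
    """Expand a fourth-character range such as N03.2-N03.7 into prefixes."""
    return {f"{base}{digit}" for digit in range(first, last + 1)}


# 'Chronic kidney disease' column of Table 1 (dots removed).
CKD_PREFIXES: Set[str] = {"E102", "E112", "E142", "I120", "I131", "N18"}

# 'Renal disease' column of Table 1 (dots removed).
RENAL_DISEASE_PREFIXES: Set[str] = (
    {"I120", "I131", "N18", "N19", "N250", "Z940", "Z992"}
    | _expand("N03", 2, 7)
    | _expand("N05", 2, 7)
    | _expand("Z49", 0, 2)
)


def normalise(code: str) -> str:
    """Upper-case a code and strip dots and spaces, e.g. 'n18.3' -> 'N183'."""
    return code.replace(".", "").replace(" ", "").upper()


def matches(codes: Iterable[str], prefixes: Set[str]) -> bool:
    """True if any recorded code starts with any prefix in the algorithm."""
    return any(normalise(c).startswith(p) for c in codes for p in prefixes)
```

With these sets, an episode coded N19 counts towards ‘renal disease’ but not ‘chronic kidney disease’, while E11.2 counts towards ‘chronic kidney disease’ but not ‘renal disease’, mirroring the pattern of bullets in Table 1.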

Data linkage

The Community Health Index (CHI) number, a unique patient identifier used throughout the Scottish health care system, was used to link GLOMMS-II with HE data using deterministic matching. Patient identifiers were removed after data linkage. The dataset was stored in the Grampian Data Safe Haven allowing secure controlled access for researchers while ensuring data security.²⁴
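Deterministic matching means that records are joined only when the identifier agrees exactly. The following sketch illustrates the idea on in-memory records; the field names (`chi`, `study_id`, `episodes`) and the function name are hypothetical, and the real linkage was performed within the secure environment described above rather than with code like this.

```python
from typing import Dict, List


def link_by_chi(cohort: List[dict], episodes: List[dict]) -> List[dict]:
    """Exact-match join of cohort members to their hospital episodes.

    The identifier is dropped from the output and replaced by an
    arbitrary study number, mimicking removal of patient identifiers.
    """
    episodes_by_chi: Dict[str, List[dict]] = {}
    for episode in episodes:
        # Group episodes under their identifier, without copying the identifier.
        stripped = {k: v for k, v in episode.items() if k != "chi"}
        episodes_by_chi.setdefault(episode["chi"], []).append(stripped)

    linked = []
    for study_id, person in enumerate(cohort, start=1):
        record = {k: v for k, v in person.items() if k != "chi"}
        record["study_id"] = study_id
        # Cohort members with no matching episode get an empty list.
        record["episodes"] = episodes_by_chi.get(person["chi"], [])
        linked.append(record)
    return linked
```

Cohort members without any matching episode are retained with an empty episode list, which corresponds to the ‘no admission’ category used when HE coding is negative.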

The flow diagram for generating GLOMMS-II is shown in Figure 1. From the database query, 71,251 individuals were identified. There were 471 excluded from the analysis because of missing information on index date, duplication or death on index date. The 345 people already on RRT at index (thus end-stage renal disease, not just CKD) were excluded from the analysis (74.8% had a ‘CKD’ code from SMR01). Overall, 70,435 individuals were included in this study.

Figure 1.

GLOMMS-II flow diagram.

Statistical analysis

Descriptive statistics were used to describe demographic, proteinuria/albuminuria status, creatinine, eGFR and comorbidity variables stratified by renal risk group (CKD stages 1–5/normal eGFR, impaired eGFR or eGFR not measured). Comorbidity was based on the Charlson comorbidity index,²⁵ which is a weighted index that takes into account the number and seriousness of comorbid disease. The proportion of the cohort with CKD identified by biochemistry data and the proportion of the cohort with CKD identified by HE data were calculated. The validity of HE data–identified CKD was assessed for the three coding algorithms and two time periods: 2003 (including admission at index) and a 5-year ‘look-back’ period.

Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated against the reference standard of CKD (biochemistry data). Kappa values, κ (a measure of agreement between two sets of categorical measurements on the same individuals),²⁶ were calculated. We categorised agreement as poor if κ ≤ 0.20, fair if 0.21 ≤ κ ≤ 0.40, moderate if 0.41 ≤ κ ≤ 0.60, substantial if 0.61 ≤ κ ≤ 0.80 and good if κ > 0.80.²⁷
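These measures all derive from the same 2 × 2 table of biochemistry status against HE coding status. The sketch below, with hypothetical function names, shows the standard formulas, including Cohen's kappa computed from observed and chance-expected agreement; it is assumed, but not stated in the text, that this is the kappa statistic used.

```python
from typing import Dict


def accuracy(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Validity measures for a test (HE coding) against a reference standard."""
    n = tp + fn + tn + fp
    observed = (tp + tn) / n  # proportion of individuals on whom sources agree
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


def kappa_band(kappa: float) -> str:
    """Map kappa to the agreement categories quoted in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "good"
```

As a check, calling `accuracy(tp=840, fn=18854, tn=50249, fp=492)` with the counts from the ‘all codes’, 2003 row of Table 3 gives a sensitivity of about 4.27 per cent, a specificity of about 99.03 per cent and a kappa of about 0.046, which falls in the ‘poor’ band.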

The validity of HE data–defined CKD within specific subgroups was considered, including CKD stage (stages 1–2, 3a, 3b, 4 and 5) and age (<75 or ≥75 years). To explore sensitivity further, analyses were repeated comparing HE data to an alternative definition for biochemistry-defined CKD, which excluded those with impaired eGFR and those with eGFR not measured from the no-CKD definition. Analyses were performed using Stata version 13²⁸ and Microsoft Excel.

Results

A total of 70,435 individuals were included. The characteristics of the study population are shown in Table 2. Based on biochemistry data, 28 per cent (19,694) of the cohort had CKD stages 1–5 (which equates to 4.5% of the adult Grampian population in 2003 (433,109)²⁹). Overall, the median age of the cohort was 63.3 years and 58.4 per cent were female. As expected, those with CKD were older than those with normal eGFR or ‘not measured’. Charlson comorbidity categories for CKD stages 1–5 and impaired eGFR were similarly distributed, with approximately two-thirds of individuals having a score of zero. Those with normal eGFR or ‘not measured’ in 2003 had the lowest Charlson scores. Of note, there were 63 individuals with macroalbuminuria but no eGFR measured. Of those with CKD identified by biochemistry, 6767 individuals had no hospital admission in the 5 years prior to 2003.
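The two percentages quoted for CKD use different denominators, which the short calculation below makes explicit. The variable names are illustrative; the counts are those given in the text.

```python
ckd_cases = 19_694              # biochemistry-defined CKD stages 1-5
cohort_size = 70_435            # GLOMMS-II individuals included
adult_population_2003 = 433_109 # adult Grampian population in 2003

# Share of the (deliberately enriched) study cohort with CKD.
share_of_cohort = 100 * ckd_cases / cohort_size
# Share of the whole adult population with CKD.
share_of_population = 100 * ckd_cases / adult_population_2003

print(f"{share_of_cohort:.1f}% of cohort, {share_of_population:.1f}% of adult population")
```

The cohort figure is much higher than the population figure because the cohort includes everyone with impaired kidney function but only a sample of those with normal or unmeasured kidney function.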

Table 2.

Characteristics of study population.

Characteristic	Total	CKD		Not CKD
		CKD stages 3–5	CKD stages 1–2	Impaired	Normal eGFR	eGFR not measured
	n (%)	n (%)	n (%)	n (%)	n (%)	n (%)
Total	70,435 (100.0)	18,687 (100.0)	1007 (100.0)	10,857 (100.0)	19,834 (100.0)	20,050 (100.0)
Sex
Male	29,322 (41.6)	6580 (35.2)	649 (64.5)	4323 (39.8)	9346 (47.1)	8424 (42.0)
Female	41,113 (58.4)	12,107 (64.8)	358 (35.6)	6534 (60.2)	10,488 (52.9)	11,626 (58.0)
Age at index (years), median (IQR)	63.3 (47.8–75.5)	75.7 (68.5–82.0)	61.0 (49.3–69.8)	71.0 (60.7–79.6)	53.1 (38.9–65.5)	52.0 (39.0–64.8)
15–44 years	15,259 (21.7)	305 (1.6)	195 (19.4)	643 (5.9)	6909 (34.8)	7207 (36.0)
45–54 years	9690 (13.8)	660 (3.5)	169 (16.8)	980 (9.0)	3802 (19.2)	4079 (20.3)
55–64 years	12,375 (17.6)	2201 (11.8)	259 (25.7)	2147 (19.8)	3966 (20.0)	3802 (19.0)
65–74 years	14,788 (21.0)	5630 (30.1)	249 (24.7)	2945 (27.1)	3207 (16.2)	2757 (13.8)
75–84 years	13,404 (19.0)	7119 (38.1)	123 (12.2)	2825 (26.0)	1624 (8.2)	1713 (8.5)
≥85 years	4919 (7.0)	2772 (14.8)	12 (1.2)	1317 (12.1)	326 (1.6)	492 (2.5)
PCR at index (n = 1845), median (IQR)	22 (10–62)	27 (11–72)	114 (73–212)	16 (9–38)	10 (6–18)	17 (8–38)
ACR at index (n = 5439), median (IQR)	1 (1–6)	1 (1–6)	8 (5–17)	1 (1–3)	1 (1–1)	4 (1–10)
Proteinuria status
Untested	63,158 (89.7)	15,412 (82.5)	0 (0.0)	9593 (88.4)	18,602 (93.8)	19,551 (97.5)
Normoalbuminuric	4580 (6.5)	2125 (11.4)	0 (0.0)	942 (8.7)	1232 (6.2)	281 (1.4)
Microalbuminuric	1725 (2.5)	602 (3.2)	768 (76.3)	200 (1.8)	0 (0.0)	155 (0.8)
Macroalbuminuric	972 (1.4)	548 (2.9)	239 (23.7)	122 (1.1)	0 (0.0)	63 (0.3)
Creatinine at index, median (IQR)	85.5 (71.4–103.8)	108.1 (91.9–126.4)	79.0 (68.2–87.6)	102.7 (87.6–115.7)	73.6 (65.0–84.4)	74.7 (65.0–85.5)
eGFR (mL/min/1.73 m²) at index, median (IQR)	66.8 (53.5–85.2)	49.7 (41.6–55.2)	79.8 (71.2–91.2)	55.3 (49.9–58.1)	82.2 (72.7–94.3)	82.3 (71.4–95.2)
Charlson comorbidity index group
0	56,242 (79.9)	12,667 (67.8)	671 (66.6)	7190 (66.2)	17,074 (86.1)	18,640 (93.0)
1–2	11,308 (16.1)	4763 (25.5)	275 (27.3)	2693 (24.8)	2281 (11.5)	1296 (6.5)
3–4	1943 (2.8)	946 (5.1)	45 (4.5)	598 (5.5)	285 (1.4)	69 (0.3)
≥5	942 (1.3)	311 (1.7)	16 (1.6)	376 (3.5)	194 (1.0)	45 (0.2)

CKD: chronic kidney disease; eGFR: estimated glomerular filtration rate; IQR: interquartile range; PCR: protein-creatinine ratio; ACR: albumin-creatinine ratio.

Renal risk groups based on biochemistry data.

As shown in Table 3, based on the reference standard of biochemistry-defined CKD, 28 per cent (19,694) of the cohort had CKD stages 1–5. The proportion of the cohort identified with probable CKD by HE data was substantially lower, ranging from 0.8 per cent to 4.1 per cent over the three coding algorithms and two time periods.

Table 3.

Validity of HE data definition for CKD compared to the reference standard of biochemistry.

Algorithm/time period	Biochemistry+ HE coding+	Biochemistry+ HE coding−	Biochemistry− HE coding−	Biochemistry− HE coding+	Proportion of cohort identified with CKD		HE coding
	True positive	False negative	True negative	False positive	Biochemistry	HE coding	PPV (%)	NPV (%)	Sensitivity (%)	Specificity (%)	Kappa*
					n (%)	n (%)
					19,694 (28.0)
All codes
2003	840	18,854	50,249	492		1332 (1.9)	63.06	72.72	4.27	99.03	0.0461
2003 plus 5-year look-back	1689	18,005	49,508	1233		2922 (4.1)	57.80	73.33	8.58	97.57	0.0831
Renal disease
2003	595	19,099	50,527	214		809 (1.1)	73.55	72.57	3.02	99.58	0.0368
2003 plus 5-year look-back	1022	18,672	50,378	363		1385 (2.0)	73.79	72.96	5.19	99.28	0.0625
Chronic kidney disease
2003	441	19,253	50,643	98		539 (0.8)	81.82	72.45	2.24	99.81	0.0291
2003 plus 5-year look-back	676	19,018	50,583	158		834 (1.2)	81.06	72.68	3.43	99.69	0.0441

HE: hospital episode; CKD: chronic kidney disease; PPV: positive predictive value; NPV: negative predictive value; ICD: International Classification of Diseases; OPCS: Office of Population Censuses and Surveys.

Biochemistry definition of CKD/no-CKD: stages 1–5/normal, impaired or not measured (see ‘Methods’ section).

HE coding (SMR01) definition of CKD/no-CKD: ICD and OPCS codes as detailed in Table 1/no coding or no admission (see ‘Methods’ section).

Interpretation of kappa: agreement poor if κ ≤ 0.20, fair if 0.21 ≤ κ ≤ 0.40, moderate if 0.41 ≤ κ ≤ 0.60, substantial if 0.61 ≤ κ ≤ 0.80 and good if κ > 0.80.

HE data–identified CKD was generally less common compared to biochemistry-defined CKD and varied across coding algorithms and time periods (Table 3). The sensitivity of HE coding compared to biochemistry for identifying CKD was low, ranging from 2.2 per cent to 8.6 per cent. Specificity of coding was >97 per cent for all coding algorithms and time periods. All algorithms improved by adding a 5-year look-back period in addition to just SMR01 records from 2003, showing higher sensitivities. The very inclusive ‘all codes’ algorithm was most sensitive but least specific, followed by the ‘renal disease’ and ‘chronic kidney disease’ algorithms, which were most specific. Overall, the agreement between HE data- and biochemistry-defined CKD was very poor (kappa values <0.1) because of low numbers identified with HE data, despite excellent specificity.

Sensitivity analyses were carried out comparing HE data to an alternative definition for biochemistry-defined CKD, excluding those with impaired eGFR and those with eGFR not measured from the no-CKD definition. As expected, this increased the PPV and reduced the NPV for the HE data: for CKD defined by the ‘chronic kidney disease’ algorithm using 2003 plus 5-year look-back data, the PPV was 99.56 per cent (vs 81.06%) and the NPV was 51.05 per cent (vs 72.68%).

The validity of the ‘renal disease’ and ‘chronic kidney disease’ coding algorithms for 2003 plus 5-year look-back period were assessed within age and CKD stage subgroups (Table 4). Among those with biochemistry-identified CKD, the ‘renal disease’ algorithm identified similar but slightly more individuals than the ‘chronic kidney disease’ algorithm. Worse CKD stage was associated with better identification (sensitivity) using both HE-based algorithms (4.8% of stage 3b compared to 56.9% of stage 5 CKD, for the ‘chronic kidney disease’ algorithm). For biochemistry-identified CKD stages 3b to 5, younger age (<75 vs ≥75 years) was associated with a higher sensitivity using the HE recording algorithms.

Table 4.

Validity of HE coding definition (2003 plus 5-year look-back) for CKD compared to the reference standard of biochemistry by stage and age group.

	Biochemistry+		PPV (%)	NPV (%)	Sensitivity (%)	Specificity (%)
	HE coding+	HE coding−
Renal disease
Stage 5	88	56	19.5	99.9	61.1	99.3
Age <75 years	53	29	25.4	99.9	64.6	99.6
Age ≥75 years	35	27	14.5	99.7	56.5	97.5
Stage 4	330	916	47.6	98.2	26.5	99.3
Age <75 years	137	254	46.8	99.4	35.0	99.6
Age ≥75 years	193	662	48.3	92.4	22.6	97.5
Stage 3b	388	4563	51.7	91.7	7.8	99.3
Age <75 years	176	1553	53.0	96.5	10.2	99.6
Age ≥75 years	212	3010	50.6	72.9	6.6	97.5
Stage 3a	207	12,139	36.3	80.6	1.7	99.3
Age <75 years	107	6487	40.7	86.7	1.6	99.6
Age ≥75 years	100	5652	32.6	58.9	1.7	97.5
Stages 1–2	9	998	2.4	98.1	0.9	99.3
Age <75 years	7	865	4.3	98.0	0.8	99.6
Age ≥75 years	<5	133	1.0	98.4	1.5	97.5
CKD
Stage 5	82	62	34.2	99.9	56.9	99.7
Age <75 years	53	29	50.0	99.9	64.6	99.9
Age ≥75 years	29	33	21.6	99.6	46.8	98.7
Stage 4	254	992	61.7	98.1	20.4	99.7
Age <75 years	111	280	67.7	99.3	28.4	99.9
Age ≥75 years	143	712	57.7	92.0	16.7	98.7
Stage 3b	239	4712	60.2	91.5	4.8	99.7
Age <75 years	118	1611	69.0	96.3	6.8	99.9
Age ≥75 years	121	3101	53.5	72.5	3.8	98.7
Stage 3a	100	12,246	38.8	80.5	0.8	99.7
Age <75 years	59	6535	52.7	86.6	0.9	99.9
Age ≥75 years	41	5711	28.1	58.9	0.7	98.7
Stages 1–2	<5	1006	0.6	98.0	0.1	99.7
Age <75 years	<5	871	1.9	98.0	0.1	99.9
Age ≥75 years	0	135	0.0	98.4	0.0	98.7

Biochemistry definition of CKD/no-CKD: stages 1–5/normal, impaired or not measured (see ‘Methods’ section). HE coding (SMR01) definition of CKD/no-CKD: ICD and OPCS codes as detailed in Table 1/no coding or no admission (see ‘Methods’ section).
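The gradient in sensitivity across CKD stages can be reproduced directly from the counts in Table 4. The sketch below does this for the ‘chronic kidney disease’ algorithm; the variable and function names are illustrative, and stages 1–2 are omitted because the HE coding+ count is suppressed (<5) in the table.

```python
# (HE coding+, HE coding-) among biochemistry-positive individuals, from Table 4,
# 'chronic kidney disease' algorithm, 2003 plus 5-year look-back.
CKD_ALGORITHM_COUNTS = {
    "5": (82, 62),
    "4": (254, 992),
    "3b": (239, 4712),
    "3a": (100, 12246),
}


def sensitivity_pct(he_positive: int, he_negative: int) -> float:
    """Percentage of biochemistry-defined cases that also carry an HE code."""
    return 100 * he_positive / (he_positive + he_negative)


by_stage = {
    stage: round(sensitivity_pct(*counts), 1)
    for stage, counts in CKD_ALGORITHM_COUNTS.items()
}
```

The resulting values fall from about 57 per cent at stage 5 to under 1 per cent at stage 3a, and the four stage totals sum to the 18,687 individuals with CKD stages 3–5 in Table 2.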

Discussion

We used a large United Kingdom community cohort to demonstrate whether the use of coding algorithms to identify renal disease, in particular CKD, from HE data was a useful alternative should biochemistry data be difficult to access. We found that HE data coding algorithms were very specific for CKD; however, sensitivities were very poor (at best only 8.6% identified), as was agreement. Of interest, the proportion of those with CKD identified through biochemistry data, who were also identified with HE coding, was higher at more advanced CKD stages and in those under 75 years of age.

CKD is recorded poorly in HE data. This may be because CKD is often not the main reason for admission. This is likely to be similar for other chronic diseases such as diabetes and hypertension, unlike acute events such as hip fracture. Also, the recognition of CKD in the time prior to eGFR reporting (2008) was poor and may have improved in the time since then. Those with more advanced renal disease are also more likely to be frequent in-patients as a result of the higher comorbidity load³⁰ and increased complications of their renal disease, making it more likely that renal disease will be recognised during admission episode coding.

Comparison with existing literature

Few studies^18,20,21 have validated hospital administrative data compared with a reference standard of biochemistry data employing the KDOQI definition of CKD, of at least two eGFR <60 mL/min/1.73 m² at least 90 days apart, and none included CKD stages 1–2 (those with proteinuria). In keeping with our findings, where reported, sensitivities are low and specificities are high for HE data compared to biochemistry-defined CKD.^14,15,18,19 We also found high PPVs (up to 82% for the most restrictive algorithm), which means that most individuals identified as having CKD from HE coding do have CKD according to biochemistry data; a diagnosis based on coding with the algorithms outlined is therefore likely to be accurate, although very insensitive. The range of PPV values reported in other CKD validation studies has been broad (29%–100%).^15,18

Our study used a very large population-based cohort. Only one other study has used a community-based population.¹⁸ However, Ronksley et al. looked for HE data after the biochemistry identification of CKD. Therefore, they were assessing whether those with CKD were being identified at their next hospital admission, not whether a prevalent cohort with CKD was equally identifiable from biochemistry or HE coding.¹⁸ Their use of a 3-year window after biochemistry-identified disease would perhaps identify patients too late for intervention, so our method is perhaps more applicable for identifying those with disease.

We have demonstrated that those with more advanced CKD are more likely to be captured by HE data, also reported by others.^18,21 This is in keeping with the fact that at the time of this study, eGFR reporting had not been instigated in the United Kingdom, and as such, the identification of CKD would be expected only in those with more advanced CKD, both by clinicians and SMR01 coders. Ronksley et al.¹⁸ reported that estimates of sensitivity were higher when eGFR <30 mL/min/1.73 m² was used as the reference standard compared with using <60 mL/min/1.73 m². Ferris et al.²¹ reported a similar pattern in in-patients.

It has been reported that older age was not significantly associated with a greater likelihood of being labelled with CKD.²¹ However, this was a study of in-patients, therefore the risk profile identified with biochemistry might have been different. Our finding that younger individuals with CKD were identified better on HE data than older individuals has been previously reported.¹⁸ CKD is likely to be a more significant problem for younger individuals than for elderly individuals with the same degree of renal impairment. It may also reflect that those with CKD at younger ages are likely to have fewer comorbidities when admitted to hospital, so that their CKD is more likely to be recognised when discharge coding is carried out.³¹

Denburg et al.¹⁷ looked at the recording of biochemistry results at a general practice level compared to the recognition of CKD on general practice coding, which again found low sensitivity but excellent specificity and high PPV. It is unclear, however, how many of the biochemistry results had been entered into GP systems manually.

Strengths and limitations

This study has many strengths. It is one of only a few studies assessing agreement between biochemistry-defined CKD that was required to be present for greater than 3 months compared to HE data.^18,20,21 It is a very large population-based cohort, not limited to a specific patient group, and since ICD-10 coding is used, we might expect these findings to be potentially generalisable to other chronic diseases, for example, diabetes, and across the world. The universal nature of the biochemistry service to the region ensures that those living within the region who have testing of renal function would have results available for consideration, and where repeated these would be available, assisting in the identification of those with truly chronic kidney disease.

There are, however, limitations to this study. Calculating eGFR using the MDRD equation is reflective of current United Kingdom practice and thus the individuals currently identified as having CKD; however, there are others outside the United Kingdom who support the use of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. It would be expected that both eGFR equations would identify similar individuals with CKD, particularly at more advanced stages, and it is unlikely that the results would be significantly different.³² The use of only HE data as a source of confirmatory CKD recording, although fulfilling the aim of this article to ascertain its validity, meant that other routine sources of such data such as GP coding were not assessed. Although this would be a useful additional source of data, it was not available to us, would require assessment in its own right and has been explored at least at a GP biochemistry recording level before.¹⁷ Our biochemistry definition of no-CKD was all-inclusive, including impaired eGFR (at least one eGFR <60 mL/min/1.73 m² but not sustained) and eGFR not measured. However, we performed sensitivity analyses, defining ‘no-CKD’ as those with normal eGFR only and found that this only improved PPV and worsened NPV. Sensitivity and specificity were similar. As noted previously, the recognition of CKD in the time prior to eGFR reporting (2008) was poor and may have improved in the time since then. However, this is unlikely to change the greater sensitivity of eGFR reporting over SMR01.

Implications for future research or clinical practice

As mentioned in the introduction, HE data may be sufficient for acute events that require hospital care. However, for chronic conditions, as illustrated here with CKD, corroborating additional data may be necessary when admissions are due to another event or comorbidity.

As demonstrated, HE coding data are very specific with high PPV for the identification of individuals with CKD. This has implications for both clinical practice and future research. In clinical practice, it is insufficient to use HE data alone to identify those with CKD, and access to current and historical biochemistry data is essential to identifying CKD appropriately. However, the use of HE data as an additional flag is potentially useful for identifying high-risk individuals. Another issue for clinical practice is patient safety, particularly with the prescribing of drugs that are either nephrotoxic or have significant renal clearance. Using both systems of identification should reduce the patient safety risks related to this. This also applies to preparation for surgical, radiological and oncological procedures.

For research, we have demonstrated that biochemistry data are crucial for describing the prevalence of CKD, and therefore the health care burden associated with it, not just the few identified through HE data. Historically, studies of CKD identified through HE coding reported high RRT initiation rates. However, in cohorts identified through biochemistry more recently, the rates reported have been lower.³³ Whether this is due to the severity of CKD identified being different, or due to the disease processes being different, is not clear and requires further research. There are also implications for clinical trials, in that the event rate that sample sizes are based on may differ depending on the source of CKD identification.

The ideal for the future would be a unifying electronic patient health care record containing information on previous hospital identified events, general practice and also biochemistry results, to ensure accurate and timely identification of those with CKD.

Conclusion

The findings of this study suggest that routine HE data have limited value in the routine identification of individuals with CKD. However, where those with CKD have been identified using HE data, this information is highly specific. Other sources of routine health care data, such as routine biochemistry data (including historical data, and not just those pertaining to a given event), should be available to clinicians caring for patients and are an important source for further research into clinical outcomes, including hospitalisations. The most important uses of these data are for planning, surveillance, screening and research.

Footnotes

Acknowledgements

We thank the Information Services Division, Scotland, who provided the SMR01 data, and NHS Grampian, who provided the biochemistry data. We also thank the University of Aberdeen’s Data Management Team.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

The study protocol was reviewed by the Privacy Advisory Committee for Information Services Division (ISD) and the NHS Grampian Caldicott Guardian. The North of Scotland Research Ethics Service reviewed the project and considered it to be audit rather than research. The College Ethics Review Board of the University of Aberdeen, College of Life Sciences and Medicine also reviewed the protocol. There were no concerns.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Chief Scientist Office for Scotland (grant no. CZH/4/656).

References

1. National Kidney Foundation. K/DOQI clinical practice guidelines for chronic kidney disease: evaluation, classification, and stratification. Am J Kidney Dis 2002; 39(Suppl. 1): S1–S266.

2. Kerr, Bray, Medcalf. Estimating the financial cost of chronic kidney disease to the NHS in England. Nephrol Dial Transplant 2012; 27(Suppl. 3): iii73–iii80.

3. Coresh, Selvin, Stevens. Prevalence of chronic kidney disease in the United States. JAMA 2007; 298: 2038–2047.

4. McCullough, Sharma, Ali. Measuring the population burden of chronic kidney disease: a systematic literature review of the estimated prevalence of impaired kidney function. Nephrol Dial Transplant 2012; 27: 1812–1821.

5. Walker, Bankart, Brunskill. Which factors are associated with higher rates of chronic kidney disease recording in primary care? A cross-sectional survey of GP practices. Br J Gen Pract 2011; 61: 203–205.

6. Leal, Laupland. Validity of ascertainment of co-morbid illness using administrative databases: a systematic review. Clin Microbiol Infect 2010; 16: 715–721.

7. Hudson, Avina-Zubieta, Lacaille. The validity of administrative data to identify hip fractures is high – a systematic review. J Clin Epidemiol 2013; 66: 278–285.

8. Department of Health, NHS Improvement & Efficiency Directorate, Innovation and Service Improvement. Innovation health and wealth: accelerating adoption and diffusion in the NHS, http://webarchive.nationalarchives.gov.uk/20130107105354/http://www.dh.gov.uk/prod_consum_dh/groups/dh_digitalassets/documents/digitalasset/dh_134597.pdf (2011, accessed September 2014).

9. Medical Research Council. Funding opportunities – E-Health Informatics Research Centres (EHIRCs) call, http://www.mrc.ac.uk/Fundingopportunities/Calls/E-healthCentresCall/index.htm (2011, accessed May 2013).

10. Medical Research Council. Strategic framework for health informatics in support of research, http://webarchive.nationalarchives.gov.uk/20130502104716/http://mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC006669 (accessed December 2014).

11. Medical Research Council. UK e-health records research capacity and capability, http://webarchive.nationalarchives.gov.uk/20130502104716/http://mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC007896 (accessed December 2014).

12. The White House, Office of Science and Technology Policy. Big data is a big deal, http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal (2012, accessed September 2014).

13. Black, Sharma, Scotland. Early referral strategies for management of people with markers of renal disease: a systematic review of the evidence of clinical effectiveness, cost-effectiveness and economic analysis. Health Technol Assess 2010; 14(21): 1–184.

14. Grams, Plantinga, Hedgeman. Validation of CKD and related conditions in existing data sets: a systematic review. Am J Kidney Dis 2011; 57: 44–54.

15. Vlasschaert, Bejaimal, Hackam. Validity of administrative database coding for kidney disease: a systematic review. Am J Kidney Dis 2011; 57: 29–43.

16. Chase, Radhakrishnan, Shirazian. Under-documentation of chronic kidney disease in the electronic health record in outpatients. J Am Med Inform Assoc 2010; 17: 588–594.

17. Denburg, Haynes, Shults. Validation of The Health Improvement Network (THIN) database for epidemiologic studies of chronic kidney disease. Pharmacoepidemiol Drug Saf 2011; 20: 1138–1149.

18. Ronksley, Tonelli, Quan. Validating a case definition for chronic kidney disease using administrative data. Nephrol Dial Transplant 2012; 27: 1826–1831.

19. Fleet, Dixon, Shariff. Detecting chronic kidney disease in population-based administrative databases using an algorithm of hospital encounter and physician claim codes. BMC Nephrol 2013; 14: 81.

20. Kern EFO, Maney, Miller. Failure of ICD-9-CM codes to identify patients with comorbid chronic kidney disease in diabetes. Health Serv Res 2006; 41: 564–580.

21. Ferris, Shoham, Pierre-Louis. High prevalence of unlabeled chronic kidney disease among inpatients at a tertiary-care hospital. Am J Med Sci 2009; 337: 93–97.

22. Information and Statistics Division. NHS Scotland data quality assurance report on acute inpatient/day case data 2000–2002. Edinburgh: NHS Scotland, http://www.isdscotlandarchive.scot.nhs.uk/isd/files//SMR01%20National%20Report.pdf (accessed December 2014).

23. Quan, Sundararajan, Halfon. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care 2005; 43: 1130–1139.

24. University of Aberdeen. Grampian Data Safe Haven, http://www.abdn.ac.uk/iahs/facilities/grampian-data-safe-haven.php (2013, accessed September 2014).

25. Charlson, Pompei, Ales. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 1987; 40: 373–383.

26. Landis, Koch. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–174.

27. Petrie, Sabin. Medical Statistics at a Glance. Oxford, England: Blackwell Science Ltd., 2000.

28. StataCorp. Stata statistical software: release 13. College Station, TX: StataCorp LP, 2013.

29. National Records of Scotland. Revised mid-2003 population estimates, council and health board areas, http://www.gro-scotland.gov.uk/statistics/theme/population/estimates/mid-year/archive/2003/index.html (2011, accessed October 2013).

30. James, Quan, Tonelli. CKD and risk of hospitalization and death with pneumonia. Am J Kidney Dis 2009; 54: 24–32.

31. Soo, Robertson, Ali. Approaches to ascertaining comorbidity information: validation of routine hospital episode data with clinician-based case note review. BMC Res Notes 2014; 7: 253.

32. White, Polkinghorne, Atkins. Comparison of the prevalence and mortality risk of CKD in Australia using the CKD Epidemiology Collaboration (CKD-EPI) and Modification of Diet in Renal Disease (MDRD) Study GFR estimating equations: the AusDiab (Australian Diabetes, Obesity and Lifestyle) Study. Am J Kidney Dis 2010; 55: 660–670.

33. Marks, Black, Fluck. Translating chronic kidney disease epidemiology into patient care – the individual/public health risk paradox. Nephrol Dial Transplant 2012; 27(Suppl. 3): iii65–iii72.