Abstract
This study examined how the coding validity of hypertension, diabetes, obesity and depression relates to the presence of their co-existing conditions, death status and the number of diagnosis codes in the hospital discharge abstract database. We randomly selected 4007 discharge abstract database records from four teaching hospitals in Alberta, Canada and reviewed their charts to extract the 31 conditions listed in the Charlson and Elixhauser comorbidity indices. Conditions associated with the four study conditions were identified through multivariable logistic regression. Coding validity (i.e. sensitivity, positive predictive value) of the four conditions was related to the presence of their associated conditions. Sensitivity increased with the number of diagnosis codes. The impact of death on coding validity was minimal. The coding validity of a condition is closely related to its clinical importance and the complexity of patients’ case mix. We recommend mandatory coding of certain secondary diagnoses to meet the needs of health research based on administrative health data.
Introduction
Administrative health data, including the hospital discharge abstract database (DAD), have been widely collected and analyzed for various purposes, including disease surveillance, case-mix costing, tracking healthcare system performance, policy-making and research.1,2 The Public Health Agency of Canada (PHAC) has created the Canadian Chronic Disease Surveillance System (CCDSS) to conduct disease surveillance for 11 chronic conditions using administrative health data, such as physician claims data and hospital DAD. 3 CCDSS provides valuable information on the national prevalence of chronic conditions and shows trends comparable with results from national surveys. However, the results should be interpreted with caution, as underreporting or misclassification of conditions might lead to underestimation or overestimation of disease prevalence. Under-coding of conditions, especially asymptomatic conditions, has been identified as a major issue for administrative health data. 4 Hypertension, diabetes, obesity and depression were found to have no effect or a protective effect on hospital death when administrative health data were used to conduct risk adjustment for hospital mortality. 5 Use of administrative health data could result in underestimating the prevalence of certain conditions, such as obesity. 6
In Canada and many other countries, administrative hospital data are produced by health information professionals through review, abstraction and coding of data from inpatient charts following hospital discharge. According to Canadian coding standards, codes for the main diagnosis, any pre-admission or post-admission comorbidities and service transfer are mandatory while codes for secondary diagnoses not requiring clinical evaluation, therapeutic treatment, or increased nursing care and monitoring are optional. 7 It has been suggested that this could lead to incompleteness for asymptomatic conditions or conditions mainly treated in primary care settings. 8 Furthermore, coders are also subject to time constraints (usually 15–20 min for one medical chart) due to a high volume of work. 9 This could also impact the number of codes in each record.
In this study, we focused on four commonly under-coded secondary conditions (hypertension, diabetes, obesity and depression) in the DAD. We hypothesized that when diagnosis information was transferred from chart to coded data, coding validity for asymptomatic conditions with modest clinical acuity in the DAD could be impaired if their associated conditions are coded. Based on chart review data, we used logistic regression to identify any conditions listed in the Charlson and Elixhauser comorbidity indices that are documented together with the four study conditions.10,11 We examined how the coding validity of the four study conditions related to whether their co-existing conditions were coded, whether the patient died in hospital and the total number of diagnosis codes recorded in a DAD record.
Methods
Data source
We randomly selected around 4000 records for patients aged ⩾ 18 years who were discharged between 1 January 2003 and 30 June 2003 from the four adult teaching hospitals in Alberta, Canada. At least 1000 records were selected from each hospital, for 4007 records in total. Each record was coded by professionally trained health record coders using the Canadian coding standard for the International Statistical Classification of Diseases and Related Health Problems (ICD), 10th Revision, Canada (ICD-10-CA). The Canadian coding standard is developed and maintained by the Canadian Institute for Health Information (CIHI) based on the ICD-10 developed by the World Health Organization (WHO). Since 2001, Canada has used the ICD-10-CA coding standard for coding the DAD data. Minor amendments to the coding standard have since been made, including changes to wording or examples and modifications to reflect or clarify new directions. In the DAD, there are 25 diagnosis code fields and 12 coding types. All diagnoses or conditions coded in the DAD must be assigned a diagnosis type. 7
Based on the validated algorithms, we identified the conditions listed in Charlson and Elixhauser comorbidity indices. 12 The Charlson and Elixhauser comorbidities include 31 conditions and are two commonly used instruments for risk adjustment analyses.
Chart review
Two professionally trained reviewers reviewed all 4007 medical charts. This included a thorough review of the chart cover page, discharge summaries, narrative summaries, pathology reports (including autopsy reports), trauma and resuscitation records, admission notes, consultation reports, surgery/operative reports, anesthesia reports, physician daily progress notes, physician orders, diagnostic reports and transfer notes to check whether conditions listed in the Charlson and Elixhauser comorbidity indices were documented. The process took approximately 1 hour for each chart. A detailed description of the chart review process can be found in our previous publication. 4
Statistical analysis
Identification of the co-existing conditions for the four conditions in the chart
Based on the chart review data, we developed logistic regression models via the least absolute shrinkage and selection operator (LASSO) to identify any other conditions that were documented together with the four study conditions. The LASSO is a shrinkage and selection method for regression models that can be described as a constraint on the sum of the absolute values of the model parameters. 13 The LASSO allows for accurate estimation of model parameters and shrinks the estimates of unimportant parameters to zero, which provides automated variable selection. The independent variables used in each model were age, sex and the remaining 30 conditions. Any condition with a nonzero estimate in the model was deemed an associated condition.
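As an illustration only, one common way to write the LASSO logistic regression estimate is given below. The notation is an assumption introduced here and is not taken from the study: y_i indicates whether the study condition is documented in chart i, x_i is the covariate vector for that chart (age, sex and the remaining condition indicators), p is the number of covariates, and λ ⩾ 0 is a tuning parameter whose selection is not described in this section.

```latex
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \left[
    y_i\,(\beta_0 + x_i^{\top}\beta)
    - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right)
  \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

This penalized form is equivalent to minimizing the negative log-likelihood subject to a bound on the sum of the absolute values of the coefficients. Larger values of λ set more coefficients exactly to zero, and conditions whose coefficients stay nonzero are the ones treated as associated.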
Assessing validity
Sensitivity and positive predictive value (PPV) were used to assess the validity using the condition defined by chart review as the gold standard. Sensitivity indicates the probability of a condition being coded when a patient has the condition documented in chart; PPV indicates the probability of a condition being documented in the chart when a patient has the condition coded in the DAD.
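The two measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `validity` and the paired true/false flags per record are hypothetical, and the toy values are invented for demonstration.

```python
def validity(chart_flags, dad_flags):
    """Return (sensitivity, ppv), treating chart review as the reference standard.

    chart_flags[i] is True if the condition is documented in chart i;
    dad_flags[i] is True if the condition is coded in the matching DAD record.
    (Hypothetical input layout, assumed for this sketch.)
    """
    pairs = list(zip(chart_flags, dad_flags))
    tp = sum(1 for chart, dad in pairs if chart and dad)      # documented and coded
    fn = sum(1 for chart, dad in pairs if chart and not dad)  # documented, not coded
    fp = sum(1 for chart, dad in pairs if not chart and dad)  # coded, not documented
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    ppv = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, ppv


# Toy data (not study data): 5 chart-positive records, 4 DAD-positive records.
chart = [True, True, True, True, True, False, False, False]
dad = [True, True, True, False, False, True, False, False]
sens, ppv = validity(chart, dad)
print(sens, ppv)
```

In the toy data, three of the five chart-documented cases are coded and three of the four coded cases are documented in the chart.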
Results
The four conditions of primary interest were under-coded in the DAD compared to chart review data (Table 1). All the co-existing conditions identified from chart data were clinically related to their corresponding conditions. The prevalence of the four conditions was three- to four-fold higher when their co-existing conditions were coded. For all four conditions, sensitivity was higher when their co-existing conditions were coded in the DAD. Coding of co-existing conditions in the DAD had a negligible impact on PPV for the four conditions. Overall, diabetes and hypertension had high sensitivity while obesity and depression had low sensitivity in the hospital DAD (Table 1).
Table 1. Validity for the four study conditions with and without their co-existing conditions.
PPV: positive predictive value; CI: confidence interval; DAD: hospital discharge abstract database; CeVD: cerebrovascular disease; MI: myocardial infarction; CHF: congestive heart failure; COPD: chronic pulmonary disease; ⩾1 related: at least one of the co-existing conditions coded in the DAD record.
The co-existing conditions for hypertension are CeVD, diabetes, MI, obesity and renal failure; for diabetes, they are CHF, hypertension, MI, obesity and renal failure; for obesity, they are COPD, diabetes and hypertension; for depression, they are alcohol abuse, COPD, drug abuse, dementia, fluid and electrolyte disorder, hypothyroidism and psychoses.
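The "⩾1 related" stratification can be sketched as follows. The mapping of co-existing conditions is taken from the list above; everything else is an assumption made for illustration, including the record layout (a dict with hypothetical `"chart"` and `"dad"` sets of condition names), the function names and the toy records.

```python
# Co-existing conditions for each study condition, as listed in the table note.
RELATED = {
    "hypertension": {"CeVD", "diabetes", "MI", "obesity", "renal failure"},
    "diabetes": {"CHF", "hypertension", "MI", "obesity", "renal failure"},
    "obesity": {"COPD", "diabetes", "hypertension"},
    "depression": {"alcohol abuse", "COPD", "drug abuse", "dementia",
                   "fluid and electrolyte disorder", "hypothyroidism",
                   "psychoses"},
}


def has_related_coded(condition, dad_conditions):
    """True if at least one co-existing condition is coded in the DAD record."""
    return bool(RELATED[condition] & set(dad_conditions))


def sensitivity_by_related(condition, records):
    """Sensitivity of `condition` in the '>=1 related' (True) and 'none' (False) strata."""
    counts = {True: [0, 0], False: [0, 0]}  # stratum -> [coded cases, chart cases]
    for rec in records:
        if condition not in rec["chart"]:
            continue  # sensitivity only uses chart-positive records
        stratum = has_related_coded(condition, rec["dad"])
        counts[stratum][1] += 1
        if condition in rec["dad"]:
            counts[stratum][0] += 1
    return {s: (tp / n if n else None) for s, (tp, n) in counts.items()}


# Toy records (not study data).
records = [
    {"chart": {"obesity", "diabetes"}, "dad": {"obesity", "diabetes"}},
    {"chart": {"obesity", "COPD"}, "dad": {"COPD"}},
    {"chart": {"obesity"}, "dad": {"obesity"}},
    {"chart": {"obesity"}, "dad": set()},
    {"chart": {"obesity", "hypertension"}, "dad": set()},
    {"chart": {"obesity"}, "dad": {"CHF"}},
    {"chart": {"diabetes"}, "dad": {"diabetes"}},
]
by_related = sensitivity_by_related("obesity", records)
print(by_related)
```

The stratum is defined by what is coded in the DAD record, not by what is documented in the chart, which matches the "⩾1 related" definition in the table note.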
Death was a severe outcome of hospitalization; records for patients who died had an average of 9.79 diagnosis codes (vs 5.06 for patients discharged alive). There were 105 deaths, and death status was missing for 11 records. The prevalence of the four study conditions was higher among patients who died in hospital. Death status in the DAD had no significant impact on the sensitivity or PPV of the four conditions (Table 2).
Table 2. Validity for the four study conditions related to the status of death.
CI: confidence interval; DAD: hospital discharge abstract database; PPV: positive predictive value.
There were 105 records with status of death and 3891 records without status of death.
The total number of diagnosis codes in a DAD record ranged from 1 to 25, with a median of 4 (interquartile range (IQR): 2–7). Sensitivity increased with the total number of diagnosis codes (Figure 1). The difference in sensitivity between records with 2 diagnoses and those with ⩾8 diagnoses was 53 percent for hypertension, 35 percent for diabetes, 29 percent for obesity and 27 percent for depression. PPV was not related to the number of diagnosis codes in the DAD.

Figure 1. Sensitivity and positive predictive value of hypertension, diabetes, obesity and depression related to the number of diagnosis codes in hospital discharge abstract database.
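A breakdown of sensitivity by the number of diagnosis codes could be computed along the following lines. This is a hedged sketch, not the study's analysis code: the tuple layout, the function names and the choice to pool counts of 8 or more into one bin are assumptions made for illustration, and the toy records are invented.

```python
from collections import defaultdict


def code_count_bin(n_codes, top=8):
    """Label for the number of diagnosis codes; counts >= `top` are pooled."""
    return f">={top}" if n_codes >= top else str(n_codes)


def sensitivity_by_code_count(records, top=8):
    """Sensitivity of one condition within each code-count bin.

    records: iterable of (n_codes, in_chart, in_dad) tuples, where n_codes is
    the number of diagnosis codes in the DAD record (hypothetical layout).
    """
    coded = defaultdict(int)
    chart_pos = defaultdict(int)
    for n_codes, in_chart, in_dad in records:
        if not in_chart:
            continue  # sensitivity only uses chart-positive records
        label = code_count_bin(n_codes, top)
        chart_pos[label] += 1
        coded[label] += int(in_dad)
    return {label: coded[label] / chart_pos[label] for label in chart_pos}


# Toy records (not study data).
records = [
    (2, True, False), (2, True, True), (2, True, False), (2, True, False),
    (5, True, True), (5, True, False),
    (8, True, True), (12, True, True), (9, True, False), (25, True, True),
    (3, False, True),  # not documented in the chart, so it is skipped
]
by_count = sensitivity_by_code_count(records)
print(by_count)
```

PPV could be tabulated the same way by restricting to DAD-positive records instead of chart-positive ones.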
Discussion
Coding validity of conditions in the DAD was related to their clinical significance and the complexity of patients’ case mix. Hypertension, diabetes, obesity and depression are generally secondary diagnoses, and their validity is affected by the coding of their co-existing conditions. The sensitivity for the four conditions increased as the total number of diagnosis codes in the record increased. The impact of death status on coding validity for the four conditions was minimal.
Coding validity is closely related to the clinical significance of a condition and its influence on length of stay, care received or therapeutic treatment during hospitalization. The four study conditions are generally secondary diagnoses during hospitalization, which are optional for coding according to the current coding standard in Canada. 7 Coding of co-existing conditions in the DAD was found to improve coding validity of the four study conditions. This suggests a way to re-identify under-coded patients based on their comorbidities and to improve disease surveillance. Lix et al. 14 found that inclusion of osteoporosis fracture and other fracture diagnoses in the administrative health data generally resulted in improved sensitivity of the osteoporosis case-detection algorithm without loss of specificity.
Overall, hypertension and diabetes codes demonstrated better validity metrics than obesity and depression in the DAD. Hypertension and diabetes are the major risk factors for circulatory system diseases, which are the leading cause of hospitalization in Canada. 15 As a consequence, it seems more likely that hypertension and diabetes would be documented and coded in the DAD, as shown in this study. Obesity was dramatically under-coded in the DAD and underreported in the chart. The prevalence of obesity in chart review and inpatient DAD was 8.3 percent and 1.8 percent, respectively, much lower than the 23.1 percent reported for the general population. 16 Furthermore, obesity had the lowest sensitivity among the four conditions. This likely reflects the fact that obesity generally fails to draw physicians’ attention during the evaluation, treatment and management of main diseases during hospitalization. 17 Our previous study found that the higher the body mass index of a patient, the more likely a diagnosis of obesity was to be coded in the hospital DAD. 6 Depression was also under-coded in the DAD. It was noted that more than 90 percent of patients identified as having depression were receiving their care exclusively from a family physician. Under-coding of depression in the DAD and poor documentation in the medical chart could be related to the fact that treatment of depression during acute hospitalizations is suboptimal. 18
The number of diagnosis codes in a record reflects the complexity of the patient’s case mix and the quality of documentation in the discharge summary. Data validity improved as the total number of diagnosis codes in DAD records increased. Increasing the number of diagnosis fields allowed in hospital data coding could enhance the completeness of coded clinical information in administrative health data. The WHO ICD, 11th Revision (ICD-11) topic advisory group on quality and safety recommended at least 15 secondary diagnosis fields to fully characterize clinical outcomes during hospitalization. 19 Fully describing a patient’s health conditions, especially chronic conditions, might require more than one DAD record or the records collected over a specified time period (e.g. a few years). For example, it has been found that using a 1-year look-back period to identify comorbidity enabled better estimation of post-hospitalization mortality, while using a look-back period longer than 1 year could help to accurately predict readmission outcomes. 20
Whether the patient died in hospital had minimal impact on coding validity for the study conditions. This is encouraging, as administrative health data have been used to develop a series of indicators to calibrate the performance of hospitals, and hospital mortality rate is one of the most important indicators. 21 To properly estimate this rate, risk adjustment is required to account for differences in patients’ characteristics. This study provided evidence to support the use of administrative health data in the development of health indicators related to mortality. However, it should be noted that our study included only a small number of records for patients who died during their hospital stay.
The number of research studies based on administrative health data has increased dramatically in recent years. Administrative health data have unique advantages, such as population coverage, low cost and timeliness. Administrative health data play a critical role in community health assessment, disease surveillance, strategic planning, policy-making, service quality control and research. However, data validity remains questionable because data collection priorities remain focused exclusively on billing or administrative purposes rather than research. The current coding guidelines and practices hinder the completeness of inpatient data because they focus on clinically significant reasons for the patient’s admission or stay in hospital. Coding validity could be dramatically improved if all conditions were coded regardless of whether they are clinically implicated in the hospitalization. However, coding is a time-consuming and cost-intensive process. It is impossible to code all conditions, particularly for complicated cases, within the limited amount of time given to coders for each chart. We suggest that some important chronic or modifiable conditions, such as hypertension and diabetes, should be coded whenever they are documented in the chart.
Limitations
This study has some limitations. First, we only examined the validity of four conditions with high prevalence. Other conditions, which differ in clinical implications, resource use during hospitalization and prevalence, might show different relationships between validity and the presence of their associated comorbidities or patients’ death status. Second, we conducted our study based on data from teaching hospitals. Teaching and nonteaching hospitals vary in terms of severity and complexity of disease and case volume. Iezzoni et al. 22 reported that the validity of administrative health data varies between teaching and nonteaching hospitals. Third, we conducted our study based on a dataset from 2003. However, the fact that the dataset contained over 4000 records with their medical charts reviewed could be viewed as a strength. The coding guideline has remained unchanged over the last 10 years. The process of chart review is costly and time-consuming, but it provides more complete information on health records. Despite these limitations, we believe that this study provides important insight into the data quality of the DAD and offers suggestions to potentially improve that data quality.
Conclusion
Coding validity of conditions is closely related to their clinical importance and the complexity of the patients’ case mix. Hypertension, diabetes, obesity and depression are generally secondary diagnoses and optional for coding in the hospital DAD according to the current coding standard. However, hypertension and diabetes, being common complications related to the leading cause of hospitalization in Canada, had better validity than obesity and depression. Furthermore, coding validity improved as the number of diagnosis codes in the record increased. We recommend the mandatory coding of certain secondary diagnoses to meet the increasing need of health service research based on administrative health data. Using the hospital DAD alone for surveillance risks underestimating prevalence and incidence because of under-coding.
Footnotes
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
