Introduction
Health classification has been used for disease surveillance since the mid-17th century, well before the establishment of the International Classification of Diseases (ICD) in the late 19th century (Bowker and Star, 2000). Created to “coordinate [standardised] information and resources about mortality and morbidity globally,” the ICD has been revised approximately every decade, initially by the League of Nations and, since 1948, the World Health Organization (WHO) (Bowker and Star, 2000: 21). The ICD classification remains a core component of 21st century international, national and local health and medical bureaucracies and health information systems (Bowker and Star, 2000). Translated into over 43 languages, it is used in over 177 countries (World Health Organization (WHO), n.d.).
Australia adopted the tenth revision of the ICD (ICD-10) in 1997 for mortality (Medical Certificate of Cause of Death) coding. In common with several other countries, Australia developed a country-specific modification: in 1998 its National Centre for Classification in Health produced the ICD-10-AM (Australian Modification) for morbidity coding, together with an Australian procedural classification, the Australian Classification of Health Interventions (ACHI) (Roberts et al., 2002). The ICD-10-AM replaced the ICD-9-CM and was designed to accurately reflect the Australian clinical environment (Roberts et al., 1998, 2002). It was implemented gradually across all states and territories during 1998–1999 (Innes et al., 2000). ICD-10 and ICD-10-AM are therefore deeply embedded in the Australian health landscape and underpin a vast cache of coded data for use in research.
In Australia, the application of the ICD extends across the whole of the health system. Consistent with practice in many other countries, the classification also functions extensively beyond its early purposes for mortality and population health surveillance, specifically: as a classification standard for mortality and morbidity data analysis, and research in the clinical, epidemiology, healthcare safety and quality and health administration arenas; as a clinical diagnostic tool; in compilation of risk prediction algorithms and for measuring the social determinants of health (Deschepper et al., 2019; Hay et al., 2017; Moriyama et al., 2011).
In addition to these original purposes, Australia’s ICD coded data are used to describe and contribute towards the pricing of public hospital services (Independent Hospital and Aged Care Pricing Authority [IHACPA], 2022). Since the second half of the 20th century, ICD data have contributed substantially to the undergirding, in Australia and several other countries, of hospital casemix- and other activity-based funding models in the public and private healthcare sectors, based upon patients’ clinical profiles and resource utilisation (Byron and McCathie, 1998; Wiley, 1992). The ICD coded data produced in the latter context are depended upon by provider organisations and health insurance funds.
The ICD classification has inherent limitations. For example, Liu et al.’s (2022) systematic review of Canadian ICD-10-based coding of sepsis, covering studies published from inception of the databases until September 2021, demonstrated extensive under-reporting of sepsis in administrative databases, with a pooled sensitivity of only 35%. The authors were optimistic that the future introduction of ICD-11, which has substantially more codes, may help to ameliorate this problem. In Australia, the construct and utility of ICD-10 and ICD-10-AM coded data are constrained by the classification’s internal standards and jurisdictional advisories, which, per Australian Coding Standard 0002, permit code allocation only to diagnoses for “conditions that are significant in terms of treatment required, investigations needed and resources used in each episode of care” as opposed to describing “the complete disease status of the inpatient population” (Independent Hospital Pricing Authority, 2019: 4). Variability and shifts in coding errata, and in editions and versions of the classification, complicate data collection, interpretation and analysis for identifying longitudinal statistical trends and for use in research studies.
In the past two and a half decades, the application in Australia of ICD coded data for funding and reimbursement purposes has dominated the health classification discourse and the performance of clinical coding in the hospital sector. It has also spurred the introduction of a professional sub-specialty of clinical coding auditing that is primarily focused on revenue assurance (i.e. the legal optimisation or, in the case of health insurance funds, the legal minimisation, of reimbursement). Australia’s activity-based funding environment has informed and, arguably, driven applications and interpretations of the classification. Anecdotally, it appears to have underpinned a shift in hospital-based coding practice from an emphasis on clinical coding for surveillance to a narrower focus. This, alongside the complexities brought by changes to editions and versions of the classification, appears to contribute to the limitations of reliable, coded data for research. The complexities were demonstrated by Ryan et al. (2021) in their multi-hospital study of ICD-10-AM stroke coding, which revealed substantial gaps in the quality of the coded data. For instance, these researchers found that when compared with diagnoses recorded by clinicians in the Australian Stroke Clinical Registry, the sensitivity of hospital coding ranged from 50.8% to 86.7% for different stroke types, and 1 in 10 stroke/transient ischaemic attack diagnoses had not been coded at hospital level (Ryan et al., 2021). These findings prompted the current investigation of the research-only applications of Australia’s ICD-10 and ICD-10-AM coded data, as the first step in a wider exploration of the effects of the financial imperative on coding practice and the ICD-coded data available for research.
Research questions
The purpose of this study was to undertake and present a scoping review of the literature exploring the use of ICD-10 and ICD-10-AM Australian-coded data in published research. It sought to address the following research questions:
(1) What were the applications of ICD-10 and ICD-10-AM Australian-coded data in peer-reviewed research published in 2012–2022?
(2) What were the purposes of ICD-10 and ICD-10-AM coded data within this research context, as classified according to a pre-existing, modified taxonomy of data use framework (Riley et al., 2022)?
(3) What was the extent of expert health classificatory involvement in informing the researchers (authors of the published research)?
(4) What was the extent of the researchers’ knowledge and understanding of the classification, codes and coded data?
The current article reports on the scoping review and the findings in relation to the first two research questions.
Method
Study design
A scoping review of the literature was conducted drawing upon the five-stage framework for scoping studies as constructed by Arksey and O’Malley (2005) and further informed by other authorities including the best practice guidelines from the Joanna Briggs Institute evidence synthesis methodology (Peters et al., 2020) and PRISMA-ScR (PRISMA Extension for Scoping Reviews) checklist (Tricco et al., 2018). The intent in adopting a scoping review was to optimise the known strengths of this method in mapping the literature within the area of interest (Arksey and O’Malley, 2005) and synthesising previously unexplored research evidence (Mays et al., 2001; Peters et al., 2015). A pre-existing, modified Taxonomy of Data Use Framework (Riley et al., 2022) was used to systematically frame and examine the within-scope articles and their author-researchers’ uses of the ICD-10 and ICD-10-AM classifications.
Search strategy
Following development of the study aim and research questions, a protocol (non-registered) was established. The research team collectively established search terms in context of the study purpose and research questions. These were mapped according to the population, context and concept elements suggested by Peters et al. (2020) (see Box 1). Articles were sourced from the Medline, Scopus and Cumulative Index to Nursing and Allied Health Literature (CINAHL) electronic databases. Following advice from a specialist librarian, a systematic, independent search of the electronic databases was undertaken (MR) using the agreed search terms. The search terms were applied to the keywords, title and abstract of each article discovered in the search.
Box 1. Key search terms.
Inclusion and exclusion criteria
The research team collaboratively developed the inclusion and exclusion criteria. The 11-year period, 2012–2022, was selected to represent the most recently published research use of the focal classifications. Only peer-reviewed research papers published in English were included. The criteria applied during the screening processes are shown in Box 2.
Box 2. Inclusion and exclusion criteria.
Eligibility screening
Step 1. In the pre-screening process, the within-scope articles identified in the database search were imported (MR) into Covidence, an online reference manager. Three hundred and fifty-five duplicates were removed using the Covidence tool; a further seven were manually identified by reviewers during the various screening stages.
Step 2. Title and abstract screening was undertaken by two researchers (JL and MR), who separately and sequentially applied the predetermined inclusion and exclusion criteria to each within-scope article. The outcomes of their screening were compared. Inter-researcher discrepancies were reviewed by a third, independent reviewer (KR), who made the final decision.
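The dual-screening logic in Step 2 can be illustrated with a minimal Python sketch. It is not the Covidence workflow used in the review; the function name `resolve_screening`, the article identifiers and the `adjudicate` callback (standing in for the third reviewer) are all hypothetical.

```python
from typing import Callable, Dict


def resolve_screening(
    reviewer_a: Dict[str, bool],
    reviewer_b: Dict[str, bool],
    adjudicate: Callable[[str], bool],
) -> Dict[str, bool]:
    """Combine two reviewers' include (True) / exclude (False) votes.

    Where the two votes agree, that decision stands. Where they differ,
    the article is passed to a third reviewer, modelled here by the
    hypothetical `adjudicate` callback, whose decision is final.
    """
    final: Dict[str, bool] = {}
    for article_id, vote_a in reviewer_a.items():
        vote_b = reviewer_b[article_id]
        final[article_id] = vote_a if vote_a == vote_b else adjudicate(article_id)
    return final


# Illustrative votes only; these are not data from the review.
votes_a = {"A1": True, "A2": False, "A3": True}
votes_b = {"A1": True, "A2": True, "A3": False}
decisions = resolve_screening(votes_a, votes_b, lambda article_id: article_id == "A2")
```

In this toy example the reviewers agree on A1, and the third reviewer includes A2 and excludes A3.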
Step 3. In light of the broad scope and usage of ICD-10 and ICD-10-AM throughout the Australian health care system and to enhance the robustness of the review, it was decided to include a 5% random sample of the within-scope articles’ references lists, where those references met the criteria of English language, peer-review and publication within the 2012–2022 study timeframe. A manual reference search was undertaken (MR) and the abstracts of all potential titles were examined (MR). If the terms “ICD-10” and “Australia” or any Australian state or territory were found in the article, a URL link was transcribed to an Excel spreadsheet. Each article derived from the references sample was then cross-referenced against the existing within-scope articles to ascertain if it had been produced by the original search. Eligible articles were manually added to the within-scope pool as determined in the initial database searches. The inclusion and exclusion criteria were applied (MR, JL). Verification, via consensus, was undertaken independently by another member of the research team (KR).
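As a rough illustration of Step 3, the sketch below draws a 5% random sample and applies the term check described above. It rests on assumptions that the text does not settle: that the sampling unit is an individual reference entry, that matching is a simple case-insensitive substring test, and that a fixed seed is used for repeatability. The names `sample_references`, `mentions_icd10_and_australia` and `AUSTRALIAN_PLACES` are hypothetical.

```python
import random
from typing import List

# Assumed list of place names; "australia" also matches "South Australia",
# "Western Australia" and "Australian Capital Territory" as substrings.
AUSTRALIAN_PLACES = (
    "australia",
    "new south wales",
    "victoria",
    "queensland",
    "tasmania",
    "northern territory",
)


def sample_references(references: List[str], fraction: float = 0.05, seed: int = 0) -> List[str]:
    """Return a random sample of roughly `fraction` of the references."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    k = min(len(references), max(1, round(len(references) * fraction)))
    return rng.sample(references, k)


def mentions_icd10_and_australia(text: str) -> bool:
    """True if the text names ICD-10 and Australia or one of its states/territories."""
    lowered = text.lower()
    return "icd-10" in lowered and any(place in lowered for place in AUSTRALIAN_PLACES)
```

A substring test of this kind is deliberately crude (for example, "Victoria" is not unique to Australia), which is consistent with the manual abstract check and consensus verification that followed in the review.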
Step 4. A full-text review was undertaken of each published article that was determined in Steps 2 and 3 to have met the inclusion criteria. This process was undertaken independently and sequentially by two researchers (JL and MR). The outcomes of their searches were compared. Discrepancy resolution was undertaken including discussion and then independent assessment by a third reviewer (KR).
Data extraction
Step 5. An extraction template facilitated the charting of results. This was created (SG) to include collaboratively determined (MR, JL, SG, SR and KR) requisite data items. The following bibliometric and research details were extracted from each within-scope, full-text article: study identification number; title; name of lead author; affiliation/organisation of lead author; contact details of lead author; (first) year of publication (online or journal issue); title of journal; identification of Health Information Manager (HIM) or Clinical Coder (CC) in the author line-up, text or acknowledgements; data source (state and/or country); study aims; study setting; study design; study participants; focus of the coded condition; study timeframe; data source; ICD-10 version; ICD-10 or ICD-10-AM codes; other classifications represented in the study; purpose of the ICD use; purpose category; comments regarding the authors’ demonstrated “coding knowledge” beyond code abstraction and other comments deemed to be relevant.
Step 6. Subsequent to the full-text review, data extraction was undertaken on each within-scope article. This involved four reviewers (MR, JL, SG and SR) working sequentially, in independent pairs. During extraction, the purposes for which the ICD-10 or ICD-10-AM coded data were used in each study were categorised. Up to three purposes per article were recorded using the modified version of the Taxonomy of Data Use developed by Riley et al. (2022) (Supplemental Table S1). A consensus process was undertaken systematically, whereby each article and the reviewers’ comments were further reviewed, amended and verified (KR).
Data analysis
Descriptive analyses, supported by IBM SPSS Version 28.0 (IBM Corp., 2021), were utilised to summarise the results. The articles were grouped according to the broad ICD-10(-AM) chapter titles (per the classification’s Tabular List) to determine the foci of the codes used by the authors of the reported research studies. Articles that reported studies with more than one disease focus were grouped into either a “Multi-morbidity” or “Mortality” category. For example, a study that focused on burns associated with gastrointestinal disease was classified as “Multi-morbidity” as it could fit equally under either of two ICD chapter titles, specifically “Diseases of the Digestive System” and “Injury, Poisoning and Certain other Consequences of External Causes.” Studies that focused on multiple causes of death were classified under “Mortality.” In a situation where more than one condition could be classified into the same disease chapter, this took precedence over the “Multi-morbidity” and “Mortality” categories (i.e. stroke and myocardial infarction would be classified as “Diseases of the Circulatory System”). The following hierarchy was devised (MR) following discussion and agreement amongst the research team. It was used to summarise the ICD purpose-categories and ensure that each study was counted only once:
(1) Studies assigned one purpose-category were allocated to that purpose-category.
(2) Studies assigned two purpose-categories, where one of those categories was “5a Research-observational-codes to select patients,” were allocated to the additional purpose category, given that all studies required the selection of codes.
(3) Studies assigned two or more purposes categories, excluding the scenario in (2) above, were either listed by the combination of those categories if there was a sufficient number of cases (at least five), or allocated to a new category labelled “Multiple.”
(4) Studies assigned two purpose-categories, where one of the categories was “Other,” were allocated to “Multiple” or assigned their own category subject to a sufficient number of cases (at least five).
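The allocation hierarchy above can be expressed as a short Python sketch. This is an interpretation of the stated rules rather than the procedure actually run in SPSS: it assumes that a combination reaches the five-case threshold when at least five studies share exactly the same set of categories, and it treats combinations involving “Other” in the same way as any other combination. The function names and the category labels other than “5a” and “Other” are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

SELECT_PATIENTS = "5a Research-observational-codes to select patients"
MIN_CASES = 5  # threshold for a combination to keep its own category


def reduce_purposes(purposes: List[str]) -> Tuple[str, ...]:
    """Collapse a study's purposes, dropping "5a" when it is one of exactly two."""
    cats = tuple(sorted(set(purposes)))
    if len(cats) == 2 and SELECT_PATIENTS in cats:
        return tuple(c for c in cats if c != SELECT_PATIENTS)
    return cats


def allocate(studies: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign each study to exactly one purpose-category label."""
    combos = {sid: reduce_purposes(p) for sid, p in studies.items()}
    # Count how many studies share each multi-category combination.
    counts = Counter(c for c in combos.values() if len(c) > 1)
    result: Dict[str, str] = {}
    for sid, combo in combos.items():
        if len(combo) == 1:
            result[sid] = combo[0]
        elif counts[combo] >= MIN_CASES:
            result[sid] = " + ".join(combo)
        else:
            result[sid] = "Multiple"
    return result


# Illustrative input only; the labels "A", "B" and "C" are placeholders.
example = {f"s{i}": ["A", "B"] for i in range(5)}
example["t1"] = [SELECT_PATIENTS]
example["t2"] = [SELECT_PATIENTS, "C"]
example["t3"] = ["C", "Other"]
allocation = allocate(example)
```

Because every study receives a single label, summing the label frequencies returns the number of studies, which is the property the hierarchy was devised to guarantee.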
In the analyses of characteristics of the uses of the ICD classification, articles were grouped for presentation according to the main categories of overall study purpose that arose thematically from the review of the reported research studies. The categories were determined by consensus amongst the full research team (see Supplemental material).
Results
Article selection
The screening of the titles and abstracts of 2103 articles imported from the selected three databases (Scopus, Medline and CINAHL) resulted in 611 articles being accepted for inclusion from the searches. An additional 25 papers were identified through the manual review of the 5% random sample of the references lists of the within-scope articles, resulting in 636 articles for extraction (Figure 1).

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart of the scoping review (Covidence generated).
Characteristics of studies reported
Table 1 summarises the key characteristics of the studies reported in the within-scope articles. Over 54% of the eligible articles were published from 2019 onwards. The states with the largest populations, New South Wales and Victoria, collectively generated the largest proportion, almost 47%, of the studies. Of the few studies reported in the articles that involved international collaboration, the most commonly associated countries, aside from New Zealand, were the United Kingdom, the United States of America (USA), Canada and several European countries.
Table 1. Summary of key characteristics of studies reported in within-scope articles.
RCTs: randomised controlled trials.
Includes RCT (n = 2), mixed methods (n = 2), qualitative (n = 1), time series (n = 1), diagnostic accuracy (n = 1), quasi experimental (n = 2), predictive modelling (n = 7), mapping studies (n = 2), plus eight studies of indeterminate design.
Most of the study designs utilised by the researcher-authors of the within-scope articles were observational, predominantly comprising descriptive (50.6%) and cohort (34.6%) study designs (Table 1). Authors recognised by the research team as “health information managers” (HIMs) (i.e. possessing a professional degree in health information management), or known to be “clinical coders” (CCs), were found in 24 (3.8%) of the within-scope articles. The author-researchers acknowledged HIM, CC, or health information service involvement, either by name or in general, anonymous terms, in 36 (5.7%) of the articles.
Each within-scope study was classified into themes based on the Riley et al. (2022) modified Taxonomy of Data Use and agreed by consensus amongst the research team. These themes were mortality studies only; studies of ICD-10 or ICD-10-AM coding quality; medical research studies; environmental and public health studies; patient safety, including risk prediction; clinical or practice guidelines; quality of care; health economics and administrative and other study purposes. For ease of presentation, the following selected variables are provided (Supplemental Table S2): First author and year of publication; state/territory and country of study cases/participants; foci of the coded data; ICD version used in the study and primary purpose of the ICD-code(s). Table 2 provides the frequency of each of the major themes of study purpose and shows medical research as the primary purpose in 35% of the studies.
Table 2. Study purposes classified according to major themes based on the modified “taxonomy of data use.”
ICD: International Classification of Diseases.
Foci of the ICD codes
Table 3 highlights the foci of ICD-codes used in the research studies reported in the within-scope articles. Over 60% (n = 405/636) of the studies fell into 5 of the 22 categories (i.e. the “top five”): “Injury, poisoning and certain other consequences of external causes”; “Diseases of the circulatory system”; “Mental and behavioural disorders”; “Multi-morbidity”; and “Neoplasms.” Studies within the “injury” category focused on conditions such as fractures, adverse drug events and injuries resulting from trauma, while studies within the circulatory category predominantly focused on stroke and cardiovascular diseases. Articles that reported studies classified within the mental and behavioural disorders category often focused on multiple mental health conditions or the impact of alcohol and drugs on mental health, rather than on singular conditions. The “Multi-morbidity” category included studies that examined reasons for hospital admissions, hospital acquired complications, reasons for frailty or determination of the relationship between two conditions (e.g. depression and cardiovascular disease). Fifty percent (n = 203/405) of the studies that fell within these “top five” categories were published after 2019 and 22% (n = 89/405, 14% of articles retrieved from the 11-year study period) were categorised within the “Mental and behavioural disorders” category.
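For readers who wish to re-derive the proportions quoted above, the following Python sketch recomputes them from the reported counts. The helper `pct` and the dictionary keys are hypothetical names; the counts are those stated in the text.

```python
def pct(numerator: int, denominator: int, digits: int = 1) -> float:
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


TOTAL_ARTICLES = 636   # all within-scope articles
TOP_FIVE = 405         # articles in the five most frequent focus categories

shares = {
    "top five, of all articles": pct(TOP_FIVE, TOTAL_ARTICLES),   # 63.7
    "top five published after 2019": pct(203, TOP_FIVE),          # 50.1
    "mental and behavioural, of top five": pct(89, TOP_FIVE),     # 22.0
    "mental and behavioural, of all": pct(89, TOTAL_ARTICLES),    # 14.0
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

The recomputed values agree with the rounded figures reported in the text (over 60%, 50%, 22% and 14%).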
Table 3. Foci of studies using ICD-coded data.
ICD: International Classification of Diseases.
Use of ICD versions reported by the study authors
Table 4 summarises the ICD-10 version as described by the authors of the within-scope articles. Most reported having used data coded according to a modification of the ICD-10 (60.3%, n = 322/534). The Australian Modification (i.e. ICD-10-AM or the non-existent “ICD-9-AM”) was specified in 99% (n = 319/322) of these articles. The remaining 1% of articles reported studies that used data coded according to a Clinical Modification (ICD-10-CM or ICD-9-CM). The ICD tenth revision, without specification of modification, was the only descriptor provided in 42% (n = 225/534) of the articles. Twenty-one percent (n = 48/225) of these research publications that used ICD-10 described the use of an additional revision (ICD-8 or ICD-9). “ICD” was the only descriptor provided in seven articles (1%, n = 7/636). Almost 13% of authors (n = 82/636) used a combination of ICD-10 or ICD-10-AM with specialised terminologies or classifications (e.g. the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT), Diagnostic and Statistical Manual of Mental Disorders (DSM)). These studies were not reported in any of the other ICD category breakdowns. The classifications were not within scope, but are reported here to provide context in relation to their usage alongside the ICD data. The ACHI, International Classification of Diseases for Oncology (ICD-O), SNOMED CT and Australian Refined-Diagnosis Related Groups (AR-DRG) were the most commonly used specialised classification systems and terminologies. Eighty-four percent (n = 533/636) of all articles included the ICD codes used in the research studies.
Table 4. ICD versions used and presence of ICD codes.
ACHI: Australian Classification of Health Interventions; AR-DRG: Australian Refined-Diagnosis Related Groups; ICD: International Classification of Diseases; SNOMED CT: Systematized Nomenclature of Medicine-Clinical Terms.
As stated by the authors of the relevant articles.
Other specified classifications include ACHI, SNOMED CT, ICD-O, ICD-9-BPA, AR-DRG and others.
Purposes of the ICD coded data
Table 5 shows the purposes for which the ICD codes were used within each study. ICD codes were used solely for identifying cases in 44.2% of the within-scope studies. Selection of cases and the corresponding classification of outcome variables such as co-morbidities or mortality were demonstrated in a further 10% of the studies. “Monitoring and surveillance” for public health purposes was the second most frequent use of ICD codes (14.3% of the published studies) and included studies that investigated toxic exposures or evaluated preventive public health measures. The quality of coded data, in terms of both case ascertainment and/or accuracy, was the focus in 11.3% of the published studies.
Table 5. ICD purpose-categories from within-scope studies.
ICD: International Classification of Diseases.
Discussion
This scoping review sought to identify the research applications of Australian-coded ICD-10 and ICD-10-AM data published in the peer-reviewed literature in 2012–2022, and to classify the purposes of these data according to a modified taxonomy of data use framework. Over half of the within-scope articles were published within the last 4 years of the 11-year study timeframe. This trend is consistent with the findings of Bornmann et al.’s (2021) bibliographic data-based investigation, which revealed very substantial and increasing growth in scientific publications within the same timeframe as the current study.
Publications by authors in the states with the highest populations, New South Wales (NSW) and Victoria, accounted for the largest representation, but were lower, particularly in the case of NSW, than their respective proportions of the national population. In contrast, 16.4% of the articles reported studies by authors in Western Australia (WA), which accounted for 11% of the nation’s population at the end of the study period (Australian Bureau of Statistics, 2023); this was possibly due in part to WA’s long-standing health data linkage infrastructure. The inter-state proportional differences could be accounted for by the fact that almost 19% of the articles had authors from multiple states.
Application of the modified taxonomy of data use framework revealed that just under half of the studies (44.2%) used ICD codes solely for identification of cases/subjects. The finding that over 60% of the articles reported research underpinned or informed by classification to categories of injury and poisonings, circulatory or mental health diagnoses, multiple comorbidities or neoplasms, cannot necessarily be considered to reflect contemporary topical issues in Australian healthcare. Rather, it should be interpreted with caution because descriptive and cohort studies dominated the reported study designs. This imbalance of study types demonstrated a bias that is likely due to the amenability of ICD coded data to these types of medical and epidemiological study when compared, for instance, with randomised controlled trials (RCTs), which use individual subject- and control-level data and are highly prevalent in medical research. It is useful to consider Zhao et al.’s (2022) findings that revealed substantial increases in the volume of published clinical research studies over the past three decades. They found that collectively, cohort, cross-sectional and case–control studies constituted 49% of all clinical study types. They also reported an 18% growth rate by 2020 of cohort and case–control studies, and that by 2018 the number of cross-sectional studies had surpassed the number of RCTs.
The absence of reference to the modification in 42% of the articles reflects mortality studies, but for morbidity studies suggests the relevant authors’ lack of familiarity with the classification. The ICD version and other classifications reported by the authors included some non-existent “classifications” or “versions,” or modifications that were highly improbable. For example, three discrete studies undertaken by hospital-based clinician–authors reported on ICD-10-CM data. During the study timeframe, Australian hospitals used ICD-10-AM whereas hospitals in the USA used ICD-9-CM and ICD-10-CM; furthermore, the research team members’ professional knowledge and consultations with senior HIM-Coders in those institutions indicated that the authors would have been provided with ICD-10-AM coded data. These discrepancies suggest that some researchers had insufficient understanding of the classification and, potentially, of nuances of the coded data that underpinned or informed their research.
The findings that 14% of articles retrieved from the 11-year study period were categorised within the “Mental and behavioural disorders” category and 39% of this group were published between 2020 and 2022 are possibly reflective of the Australian Government’s mental health policy and national reform priorities during the past decade (Australian Government, Department of Health, 2021; Australian Government, Department of Health, National Mental Health Commission, 2017).
Strengths and limitations
The strengths of the review included the number of articles retrieved and analysed, and the inclusion of the 5% random sample of articles from the references lists. One possible limitation was the exclusion from scope of articles that reported research on health service funding or reimbursement; they may have contributed to a more comprehensive picture of the research uses of Australian-coded ICD-10 and ICD-10-AM data. Inevitably, some potentially eligible articles were missed from the search owing to the absence of search terms, such as the name of the classification, from the title, keywords or abstract. Incorrect or confused reporting by authors of the names of the classifications also created a dilemma. Another potential limitation related to the identification of HIMs’ or CCs’ involvement with the reported research studies. The health information management and clinical coding workforces are relatively small, and all of the research team members have very extensive, Australia-wide professional networks; however, it is possible that some HIM or CC authors may not have been identified during data extraction.
Conclusion
The current review has enriched our understanding of the applications and importance of coded data in the medical and wider health research environments. The findings demonstrate a diverse utility of Australian-coded ICD-10 and ICD-10-AM data in peer-reviewed research studies. Medical and other health researchers’ usage of coded data is extensive and robust. The increasing volume in the past decade of published research that has relied upon clinical codes points to a corresponding, escalating demand for accurate and timely ICD-10 and ICD-10-AM data. This demand will be driven further by the increasing number of cohort and cross-sectional study types in the medical literature which, together with the expanding milieu of health data linkage, foreshadows substantial increases in researchers’ requirements for coded data. This combination of factors will inevitably drive a corresponding need by researchers for informed advice from experienced HIM-Coders and CCs on the applications and interpretation of the classification, the codes, and the coded data, to ensure and enhance research credibility and replicability. An examination of these aspects, in the context of the findings of the current scoping review, will be reported in a future publication.
Supplemental Material
sj-docx-1-him-10.1177_18333583231198592 – Supplemental material for The applications of Australian-coded ICD-10 and ICD-10-AM data in research: A scoping review of the literature
Supplemental material, sj-docx-1-him-10.1177_18333583231198592 for The applications of Australian-coded ICD-10 and ICD-10-AM data in research: A scoping review of the literature by Merilyn Riley, Jenn Lee, Sally Richardson, Stephanie Gjorgioski and Kerin Robinson in Health Information Management Journal
Footnotes
Acknowledgements
The authors thank Hannah Buttery, Librarian at La Trobe University Library, Melbourne, for expert advice during the database searches.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
