Abstract
Background/Aims:
The demand for simplified data collection within trials to increase efficiency and reduce costs has led to broader interest in repurposing routinely collected administrative data for use in clinical trials research. The aim of this scoping review is to describe how and why administrative data have been used in Australian randomised controlled trial conduct and analyses, specifically the advantages and limitations of their use as well as barriers and enablers to accessing administrative data for use alongside randomised controlled trials.
Methods:
Databases were searched to November 2022. Randomised controlled trials were included if they accessed one or more Australian administrative data sets, where some or all trial participants were enrolled in Australia, and where the article was published between January 2000 and November 2022. Titles and abstracts were independently screened by two reviewers, and the full texts of selected studies were assessed against the eligibility criteria by two independent reviewers. Data were extracted from included articles by two reviewers using a data extraction tool.
Results:
Forty-one articles from 36 randomised controlled trials were included. Trial characteristics, including the sample size, disease area, population, and intervention, were varied; however, randomised controlled trials most commonly linked to government reimbursed claims data sets, hospital admissions data sets and birth/death registries, and the most common reason for linkage was to ascertain disease outcomes or survival status, and to track health service use. The majority of randomised controlled trials were able to achieve linkage in over 90% of trial participants; however, consent and participant withdrawals were common limitations to participant linkage. Reported advantages were the reliability and accuracy of the data, the ease of long term follow-up, and the use of established data linkage units. Common reported limitations were locating participants who had moved outside the jurisdictional area, missing data where consent was not provided, and unavailability of certain healthcare data.
Conclusions:
As linked administrative data are not intended for research purposes, detailed knowledge of the data sets is required by researchers, and the time delay in receiving the data is viewed as a barrier to their use. The lack of access to primary care data sets is also viewed as a barrier to administrative data use; however, work to expand the number of healthcare data sets that can be linked has made it easier for researchers to access and use these data, which may have implications for how randomised controlled trials will be run in future.
Keywords
Introduction
Randomised controlled trials (RCTs) are considered the gold standard study design for determining safety and effectiveness of new medical treatments. However, RCTs are often resource-intensive and may take years to complete. 1 The demand for simplified data collection within trials to increase efficiency and reduce costs has led to broader interest in repurposing routinely collected administrative data for use in clinical trials research. 2
The Australian Bureau of Statistics 3 defines administrative data as information that government departments, businesses and other organisations collect for reasons other than research, such as registrations, billing and record keeping. Examples of healthcare administrative data sources include state-based admitted patient (hospitalisations) and emergency department admission databases; national-level databases such as the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS); and the Australian Institute of Health and Welfare data sets such as the National Death Index and the Australian Immunisation Register.
Linked administrative data are seen to provide more comprehensive information about individuals and communities 4 and are viewed as an answer to monetary and resource constraints when conducting research. 5 However, administrative data are also known to present challenges, such as missing or limited data, delays in receiving data and difficulties when linking data across jurisdictions. 4
The aim of this scoping review is to describe how and why administrative data have been used in Australian RCT conduct and analyses, specifically the advantages and limitations of their use as well as barriers and enablers to accessing administrative data for use alongside RCTs.
Methods
The review was conducted according to Joanna Briggs Institute methodology to identify the available evidence, 6 and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR). 7 The completed PRISMA-ScR Checklist can be found in Supplemental file 2. The review protocol was previously registered on Open Science Framework (https://osf.io/cj9bd).
Eligibility criteria
Articles were included if they reported an RCT, accessed one or more Australian administrative data sets, where some or all trial participants were enrolled in Australia and where the article was published between January 2000 and November 2022, so as to focus on more recent data linkage practices. Articles were excluded if they were an editorial, protocol, summary or a review. Articles reporting on studies which accessed only the following data sets were excluded: (a) clinical quality registries where reporting to the registry was not mandatory, and (b) electronic health records held only at the point of care, such as practice records, as they are not routinely available or easily accessible. In addition, articles reporting on studies where administrative data were only used to facilitate recruitment, for example, through electoral rolls or disease registries, were excluded as there was no linkage to clinical trial data. Observational studies embedded within an RCT were also excluded from the review.
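The eligibility criteria above can be read as a single yes/no rule applied to each article. The following is a minimal sketch of that rule, not the reviewers' actual tool: the `Article` structure, its field names and the exact day boundaries of the publication window are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record structure; field names are illustrative only and are
# not taken from the review's data extraction tool.
@dataclass
class Article:
    is_rct: bool
    article_type: str                      # e.g. "trial report", "protocol", "review"
    n_australian_admin_datasets: int       # Australian administrative data sets accessed
    has_australian_participants: bool
    published: date
    only_excluded_sources: bool            # only voluntary registries or point-of-care records
    admin_data_only_for_recruitment: bool
    is_embedded_observational_study: bool

EXCLUDED_TYPES = {"editorial", "protocol", "summary", "review"}
WINDOW_START = date(2000, 1, 1)    # assumed first day of January 2000
WINDOW_END = date(2022, 11, 30)    # assumed last day of November 2022

def is_eligible(a: Article) -> bool:
    """Return True if the article meets every inclusion criterion and no exclusion criterion."""
    included = (
        a.is_rct
        and a.n_australian_admin_datasets >= 1
        and a.has_australian_participants
        and WINDOW_START <= a.published <= WINDOW_END
    )
    excluded = (
        a.article_type in EXCLUDED_TYPES
        or a.only_excluded_sources
        or a.admin_data_only_for_recruitment
        or a.is_embedded_observational_study
    )
    return included and not excluded

example = Article(True, "trial report", 2, True, date(2015, 6, 1), False, False, False)
print(is_eligible(example))
```

Writing the criteria this way makes it explicit that a single exclusion condition overrides all inclusion conditions.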
Information sources and search strategy
The database search was performed in May 2022 and updated in November 2022. Six databases were searched: MEDLINE (OVID), EMBASE (OVID), CINAHL, the Cochrane Central Register of Controlled Trials (CENTRAL), PsycInfo and the Maternity and Infant Database (OVID). The search strategy (Supplemental file 1, section 3) combined three groups: (a) study type, that is, RCT; (b) participants, that is, Australian; and (c) data set/data type, that is, mention of terms such as ‘administrative data’, ‘routinely collected data’ or ‘record linkage’. The search also listed several administrative databases by name, such as the MBS, PBS and state admitted patient care databases.
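As an illustration of how three concept groups are typically combined in a database search, the sketch below joins terms within a group with OR and joins the groups with AND. This is an assumption about the general form only; the actual strategy is the one in Supplemental file 1, section 3. The study type and participant terms shown are placeholders, and only the three data set terms are taken from the text above.

```python
def build_query(concept_groups):
    """Join terms within each group with OR, then join the groups with AND."""
    clauses = []
    for terms in concept_groups:
        quoted = [f'"{term}"' for term in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)

concept_groups = [
    # (a) study type: placeholder terms, not the review's actual search terms
    ["randomised controlled trial", "randomized controlled trial"],
    # (b) participants: placeholder terms
    ["Australia", "Australian"],
    # (c) data set/data type: terms quoted in the text
    ["administrative data", "routinely collected data", "record linkage"],
]

query = build_query(concept_groups)
print(query)
```

An article is retrieved only if it matches at least one term from every group, which is why adding named databases such as the MBS or PBS to group (c) broadens the search rather than narrowing it.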
Hand searching was also conducted and covered study protocols found in the database search, Google Scholar, the Australian New Zealand Clinical Trials Registry, publication listings on state data linkage unit websites, as well as reference lists and author searches of relevant studies.
Screening
All identified citations were collated and uploaded into Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia), and duplicates were removed. Titles and abstracts were independently screened by two reviewers (S.F. and N.A.) according to the eligibility criteria. The full texts of selected studies were assessed against the eligibility criteria by two independent reviewers (S.F. and N.A.). Reasons for exclusion of full-text citations were documented, and any disagreements during the screening and full-text review were resolved through discussion with a third reviewer (K.B.). Where more than one article was found for a single study, all were included in the scoping review if they had a different focus. If the articles had the same focus, the earliest published article was selected.
Data extraction and analysis
All data were extracted from included articles by one reviewer (S.F.) using a data extraction tool developed for this review (Supplemental file 1, section 4). Select data (denoted by an asterisk (*) in the data extraction tool) for all articles were extracted by a second reviewer (N.A.). These fields were selected on the basis that they would have a higher likelihood of disagreements. The tool was developed by all authors and regularly revised to ensure all appropriate information was tabulated. Data extracted included the author, title and year, geographic state/s of participants, disease area of trial, purpose of linkage, administrative database/s that were accessed, and barriers and enablers to linkage. Discrepancies between reviewers were discussed and resolved by consensus, and extracted data were also reviewed through regular discussions and sharing of included articles with all co-authors (N.A., K.B., S.L., R.L.M.) throughout the extraction process. Where indicated, data were classified according to the options provided in the data extraction tool; otherwise, relevant passages of text were copied into the template.
Data synthesis
Data were downloaded from Covidence into Microsoft Excel and synthesised using quantitative and narrative approaches. Where options could be selected, data were tabulated in Excel according to the trial characteristics and data linkage information. Passages of text were extracted on methods, advantages/limitations and barriers/enablers, and these were organised according to themes.
Results
Study selection
A total of 2070 records were identified through the database search and imported into Covidence and a further 17 records were identified through hand searching. After removing duplicates, 1422 abstracts were screened. Of these, 90 abstracts progressed to full-text review. Following full-text review, there were 41 articles from 36 RCTs that met the inclusion criteria, with 4 RCTs having more than one article with a different focus included in this scoping review (Figure 1).

Figure 1. Search and screening results (PRISMA flow diagram).
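The counts reported above imply the number of records removed or excluded at each stage. The short sketch below derives them by subtraction. It assumes that database and hand-searched records were pooled before de-duplication; the derived figures are not stated in the text and may be presented differently in Figure 1.

```python
# Counts reported in the text
database_records = 2070
hand_search_records = 17
screened_after_dedup = 1422
full_text_reviewed = 90
included_articles = 41
included_rcts = 36

# Derived counts (assumption: all identified records were pooled before de-duplication)
identified = database_records + hand_search_records
duplicates_removed = identified - screened_after_dedup
excluded_at_screening = screened_after_dedup - full_text_reviewed
excluded_at_full_text = full_text_reviewed - included_articles
additional_articles = included_articles - included_rcts  # articles beyond one per trial

print(identified, duplicates_removed, excluded_at_screening,
      excluded_at_full_text, additional_articles)
# prints: 2087 665 1332 49 5
```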
Overview of RCTs
Table 1 provides a summary of all RCTs (n = 36) included in this study. The RCTs varied across disease types and health areas. The number of articles published per year steadily increased from the beginning of the search period in 2000 to 2022, with an average of 0.6 articles published per year between the years 2000 and 2004, compared with an average of 6.5 articles published per year between the years of 2020 and 2022 (Figure 2).
Table 1. Summary characteristics of included studies.
RCTs: randomised controlled trials.

Figure 2. Average number of articles published per year using linked administrative data, 2000–2022.
Twenty-six of the 36 RCTs (72%) included participants randomised from a single state, while 7 (19%) included participants randomised across two or more Australian states. Three trials (8%) were international trials with Australian participants. RCTs involving participants from either New South Wales or Western Australia made up half of all included trials (n = 18; 50%).
The intervention for just over half of the RCTs (n = 21; 58%) was education and counselling, and 9 trials (25%) involved a drug intervention.
The median sample size of the RCTs was 402 (interquartile range (IQR) = 124–1044). Of note, 16 RCTs (44%) had a sample size of greater than 1000 participants.
Eight RCTs (22%) were cluster trials, where administrative data were linked at the general practitioner or community area level rather than the participant level.
Reason for linking to administrative data
Administrative data were linked for multiple reasons (Table 2): to track disease or health outcomes, including participants’ survival status (n = 23; 56%); to monitor levels of health service use (n = 23; 56%); for economic evaluation (n = 7; 17%); and to monitor adherence to drug treatments (n = 3; 7%).
Table 2. Purpose of data linkage.
Note: More than one purpose could be selected.
Administrative data set use
Of the 36 RCTs, 17 (47%) linked to government reimbursed claims data, 16 (44%) linked to hospital admissions data sets and 13 (36%) linked to a birth/death registry (Table 1).
Eleven RCTs linked to a single administrative data set (31%) and a further 9 linked to two administrative data sets (25%). Five trials linked to 3 administrative data sets (14%), 6 trials linked to 4 data sets (17%) and 5 trials linked to 5 or more administrative data sets (14%) (Supplemental file 1, section 1).
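The percentages in this paragraph can be reproduced from the reported counts. The sketch below is a simple check that rounds each share of the 36 trials to the nearest whole per cent; the dictionary keys are labels chosen here for illustration.

```python
# Number of trials by number of administrative data sets linked (counts from the text)
trials_by_datasets_linked = {"1": 11, "2": 9, "3": 5, "4": 6, "5 or more": 5}

total_trials = sum(trials_by_datasets_linked.values())

# Share of all trials in each category, rounded to the nearest whole per cent
percentages = {
    category: round(100 * count / total_trials)
    for category, count in trials_by_datasets_linked.items()
}

print(total_trials)   # 36
print(percentages)    # {'1': 31, '2': 25, '3': 14, '4': 17, '5 or more': 14}
```

The rounded values sum to 101 rather than 100, which is an expected effect of rounding each category independently.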
Of the 36 trials, 13 (36%) linked all of their participants to administrative data, and a further 14 trials (39%) linked 91%–99% of their participants. In most of the trials reviewed (n = 27; 75%), linkage was required to ascertain the primary outcome. Of these trials, 11 (41%) linked all participants to administrative data, 10 (37%) linked 91%–99% of participants, and 6 (22%) linked 50%–90% of participants (Supplemental file 1, section 1). Of note, there was one trial where fewer than half of participants were linked to administrative data because participants did not consent to linkage. Reasons why some participants were not linked are presented in Table 3 and included participants withdrawing from the trial, having key data missing for analysis involving the linkage, and not being found through linkage.
Table 3. Reasons why fewer participants were linked than in the original sample.
RCTs: randomised controlled trials.
Note: More than one reason could be selected.
Method of linkage
Of the 36 RCTs, 15 (42%) specified that a state or commonwealth data linkage unit was used to provide linked data sets. Of these, nine used the Western Australian Data Linkage Service; 8 four used the New South Wales and Australian Capital Territory Centre for Health Record Linkage; 9 and the remaining two used a data linkage unit from another state/territory. Another 15 RCTs (42%) accessed data sets directly from a single custodian, such as Services Australia for MBS and/or PBS data, or a national register. The remaining 6 RCTs (17%) accessed linked data sets directly from multiple data custodians.
Evaluation of linked data use in RCTs
Advantages and limitations of using linked administrative data in trials were identified in all articles, and some of the more commonly mentioned factors are described below.
Advantages of linked administrative data
A frequently reported advantage of using linked administrative data alongside an RCT was that the data may be more reliable than self-reporting of treatments and events by participants, as there is potential to eliminate participant recall bias.10–20 A number of articles reported that the use of administrative data reduced the proportion of participants who were lost to follow-up15,16,21,22 with one reporting near-complete follow-up over a 4-year period. 12 This ease of follow-up was seen as particularly useful in RCTs involving participants where attrition rates were likely to be higher, such as ex-prisoners15,23 and participants being treated for drug and alcohol addiction.22,24 In a 5-year follow-up of a trial investigating alcohol intervention in participants with mental health disorders, outcomes for 91% of participants were able to be determined through data linkage, compared with a similar previously reported trial, where only 53% of participants were identified 12 months later through traditional follow-up methods. 24 Long-term follow-up is further facilitated where participants provided consent to access administrative data even following withdrawal from the study. 22 Another reported advantage was that data could be collected without inconveniencing the participant to attend a clinic visit or provide the required information. 14
Another key advantage of administrative data use is that it can serve to provide more accurate and complete data about the participant. Two articles specifically assessed the validity of the linked administrative data compared to ‘within trial’ collected data from case report forms.22,25 In one study, data linkage detected nearly twice as many hospital admissions (1.71-fold), and over three times as many emergency admissions (3.09-fold), 22 and another study comparing the accuracy of the data from the National Death Index and national cancer database with known participant deaths through trial follow-up reported over 90% of known deaths could be found through these administrative data sources. Limitations were seen where deaths had occurred outside of Australia or where there were variations in participant first or surnames. 25
Administrative data sets were useful in measuring interventions in cluster RCTs, both at a community level17,26 and practitioner level.27–29 In these trials, data linkage provided evidence on trends to assess effectiveness of the intervention. Administrative data use in cluster RCTs can allow for extensive coverage of participants without requiring extensive participant-level data collection, as reported in the Virtual Infant Parenting Programme, which recruited 57 schools in Western Australia (n = 2834 participants) and attained 98% coverage over a 5-year follow-up period. 21
In health economic analyses, administrative health care claims data such as the MBS and PBS provide comprehensive identification of scheduled fees and benefits paid.11,14 These data sets can also give specific dates that participants visited health professionals 10 which may not be accurate through case report form data collection.
Limitations of linked administrative data
However, several limitations to administrative data use alongside RCTs were also reported. Missing data were an issue, and it was particularly difficult to distinguish between missing data and the absence of events.22,24 Also, the jurisdictional nature of administrative data sets means data will be unavailable for participants who move interstate, if data sets from other jurisdictions were not requested. 30 Some populations were noted as more or less likely to move interstate. Older populations were seen as less likely to move, 20 while the availability of hospitalisation data was limited in a study involving Indigenous participants from Western Australia, as this population may move across borders or receive care in capital cities in other states. 30
Another reported limitation was the limited coverage of administrative data across all types of healthcare providers, specifically primary care databases, which are not centrally accessible outside of government reimbursed claims data (i.e. Medicare). With increasing multimorbidity, administrative data linkage had limited success in capturing the full picture of patient care,21,22,31 if primary care were excluded. Similarly, including data from private clinic databases would require requesting access to the individual databases within the relevant geographical locations if required for trial outcomes. 21
Several limitations were specific to PBS and MBS data, which are two of the more commonly used administrative data sets in Australia. PBS data do not capture non-prescription medicines (i.e. those purchased over the counter) or medicines purchased privately. 32 Medicines prescribed in public hospitals or obtained privately are also not recorded in PBS data sets, and the PBS data set only measures dispensation and not whether the medicine was actually taken. 13 Some treatments were not captured in the PBS, such as opioid substitution treatment. 15 Prior to July 2012, the PBS did not capture medicines which fell under the Medicare co-payment, 33 which is a government-initiated medication subsidy available to eligible concession card holders. This was a limiting factor for studies conducted prior to or overlapping this period.34,35
Similarly, MBS data do not include all procedures and exclude those covered by compensation systems 27 and services delivered to patients in public hospitals.32,35 Also, MBS expenditure cannot necessarily be attributed to a particular condition, especially in the case of general pathology tests or services. 34 Appropriate use of administrative data to measure disease prevalence needs to be considered, as some diseases may not require any medical care and, therefore, administrative data may not provide additional information. 15
There are considerations as to how administrative data might be analysed to avoid bias, which may be viewed as a limitation for researchers. For example, to avoid bias, the proportion of participants giving consent to have their trial records linked to administrative data should be balanced between trial treatment groups,10,18 and analysis methods may need to be adjusted where the sample size is relatively small but multiple events are recorded per participant. 16 Also, analysis of diagnostic tests and procedures provides information on volume but cannot assess appropriateness of ordering. 28
Barriers and enablers to data linkage
In addition to the recorded advantages and limitations of using linked administrative data, there were also barriers and enablers to accessing the data.
One barrier was the time delay in receiving access to the linked data. This can be a critical barrier depending on the data required, such as needing to provide real-time safety information. In one article, it was suggested linked data should be used to supplement self-reported data rather than completely replace traditional methods of patient follow-up, as it can impact the monitoring and protection of patient safety. 22
Consent was frequently mentioned as a potential barrier to data linkage, mainly in terms of the timing of consent and participant refusal of data linkage. The timing of consent to accessing the linked data appeared to be an important factor in obtaining close to 100% coverage of the participant sample. There were greater proportions of linked participants in trials where consent to data linkage was obtained alongside consent to trial participation, rather than later or with the trial extension (Table 4). Refusal to provide access to linked data was frequently cited as a reason for the linked sample being less than the total patient sample. The disease area can also be a factor in whether consent is provided, as studies that track very personal information, such as a participant’s reproductive history, are likely to have lower participation rates. 21 In terms of participant populations, one article suggested that consent to accessing linked data was further limited among the Indigenous population, but reasons for this were not provided. 34
Table 4. Percentage of participants linked to administrative data by consent timing.
IQR: interquartile range.
Note: Three trials were excluded because either a waiver was obtained or the timing of consent was not specified.
Consent waivers to enable access to linked data were mentioned in four of the trials, although waivers were provided only under specific circumstances. In a trial which investigated family caregivers of older people, caregivers were required to provide consent, but patients only provided consent if they were well enough to do so, and a waiver was granted where they were not able to provide consent. This consent included access to administrative data collected by the state Department of Health. However, patients were provided the opportunity to ‘opt-out’ of the trial if and when their health improved. 19 In a long-term follow-up study of participants already enrolled in a renal replacement therapy trial, a waiver was granted to link to state and national registries provided the patient had initially consented to participate in the study. 12 In the Passports Study of ex-prisoners, a waiver of consent was granted to access hospital, emergency and ambulance data from the state health department, as well as National Death Index data; however, consent was required to link to the PBS. The waiver of consent was cited as being approved under the Queensland Public Health Act (2005). 15 Finally, in a cluster RCT investigating imaging requests in primary care, general practitioners were not required to provide consent as the intervention was low risk, participants’ privacy could be protected, and it would have been more onerous to obtain participant consent than to provide the intervention. Also, participants did not realise they were in a trial. 28
Access to support from data linkage units, such as the Western Australian Data Linkage Service and the Centre for Health Record Linkage, was frequently cited as an enabler of data linkage, as linkage is performed by the unit, data are updated regularly, and researchers can more easily access multiple data sets.16,22,24 Also mentioned was the availability of hospital data, which can vary between states and can be captured inconsistently across the states. 36
Discussion
Our scoping review included 41 articles reporting on 36 RCTs that linked trial participant data to an Australian administrative data set. The characteristics of the trials were varied in terms of the number of participants, disease area, population, and intervention. RCTs most commonly linked to government reimbursed claims data sets, hospital admissions data sets and birth/death registries, and the most common reason for linkage was to ascertain disease outcomes or survival status, as well as to track health service use. In the majority of included trials, linkage was achieved in more than 90% of participants; however, incomplete linkage can be a limitation where linkage is necessary to address the primary outcome. Consent to link to administrative data and participant withdrawals were cited as common limitations to participant linkage. The main reported advantages of the linked administrative data were the reliability and accuracy of the data, the ease of long-term follow-up and the use of established data linkage units to facilitate the use of administrative data for research purposes. The greater representation of trials from New South Wales and Western Australia reflects their long-established data linkage units, showing increasing potential for research through engaging data linkage units across Australia. Common reported limitations were locating participants who had moved outside the jurisdictional area, missing data where consent was not provided, and unavailability of certain healthcare data.
The number of published RCT articles reporting linkage to administrative data sets has increased markedly since 2000, reflecting increasing use of administrative data for this purpose and its perceived value to the research community. A possible reason for the increased use is the increasing number of data sets available for linkage, and the increased ease of access when compared to older RCTs. 37 However, there are healthcare data sets which are not yet available to researchers, such as clinic and primary care data sets, which would provide potential for an even broader scope of research.
An apparent drawback to using linked administrative data is the in-depth knowledge required to understand the nuances of the various data sets and how to work with them. Administrative data are not intended for research, which has led to guidance being developed by government bodies and research centres to aid researchers wishing to use the data.33,38 Having a deep understanding of the data sets and their limitations can save time and ensure the question can be answered with administrative data; however, these nuances can be a deterrent for researchers who lack the experience to request, link and analyse the data.
Obtaining participant consent to data linkage at the time of consenting to trial participation was particularly important, as this approach was associated with the highest proportion of linked participants, mostly because participants were more difficult to locate during the later stages of the trial. Privacy may also be of concern to some participants, as refusal to allow access to administrative records was a reason for fewer linkages across multiple trials. Further to this, the time delay in accessing the data can deter researchers from using administrative data outside of the more common purposes of disease outcomes and service use, and clearer guidelines on the applicability of data access waivers will assist researchers in understanding when they can apply for them.
Although there have been previous reviews of administrative data in clinical trials,39–42 they have either included administrative data sets outside of Australia, non-randomised trials, or had a specific focus, such as trial stage, disease area or data set. This review provides an overview of how administrative data have been linked to RCT data in Australia, across all disease areas and stages of an RCT.
While there may be other advantages and limitations to using administrative data, our review of available information was limited to the articles included in the scoping review. We acknowledge that there are other uses for administrative data alongside RCTs which were not included in this scoping review, such as for recruitment, as this review was limited to RCTs which linked to case report form data only. Furthermore, this review did not include linkage to clinical quality registries, as reporting to many of these registries is not mandated; however, any future changes to registry reporting might warrant their inclusion. Future research will involve investigating specific and practical uses of linked administrative data alongside RCTs, including recommendations on whether administrative data sources might replace or supplement traditional trial data collection. Studies which validate case report form data collection against a broad range of administrative data sets are needed to better understand when these data sets can be incorporated into trial design and planning.
Conclusion
Linking RCT data to administrative data is performed across a broad range of disease areas, with specific uses in community health and interventions involving education and counselling. The advantages include reliable and accurate data, as well as ease of follow-up. Limitations include tracking participants who have moved interstate or overseas, and the potential concern over data privacy resulting in missing data from those who do not consent. As the data are not intended for research purposes, detailed knowledge of the data sets is required by researchers, and the time delay in receiving the data is viewed as a barrier to their use. The lack of access to primary care data sets is also viewed as a barrier to administrative data use; however, work to expand the number of healthcare data sets that can be linked has made it easier for researchers to access and use these data, which may have implications for how RCTs will be run in future.
Supplemental Material
Supplemental material, sj-docx-1-ctj-10.1177_17407745231225618 for The use of linked administrative data in Australian randomised controlled trials: A scoping review by Salma Fahridin, Neeru Agarwal, Karen Bracken, Stephen Law and Rachael L Morton in Clinical Trials
Supplemental material, sj-docx-2-ctj-10.1177_17407745231225618 for The use of linked administrative data in Australian randomised controlled trials: A scoping review by Salma Fahridin, Neeru Agarwal, Karen Bracken, Stephen Law and Rachael L Morton in Clinical Trials
Footnotes
Contributions of each author
The review was conceived by S.F., K.B., S.L. and R.L.M. S.F. performed the database searches. S.F. and N.A. performed eligibility checking. S.F. and N.A. extracted the data from included studies. S.F. wrote the first draft of the manuscript. S.F., N.A., K.B., S.L. and R.L.M. reviewed and refined the manuscript and approved the final manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Rachael L Morton is supported by a University of Sydney, Robinson Fellowship, and NHMRC Investigator Fellowship.
Protocol registration
The review protocol was registered on Open Science Framework (https://osf.io/cj9bd).
Data sharing statement
Data are available on request from the corresponding author.
Supplemental material
Supplemental material for this article is available online.
References
