Abstract
Objective:
Understanding the health and health service utilization of the population is critical for Regional Health System’s (RHS) population health management (PHM) initiatives in Singapore. The RHS database is a collaborative effort toward developing a national architecture for healthcare utilization data across diverse clinical systems with disparate data models. This manuscript describes the setup of an RHS database which would facilitate big data analytics for proactive population health management and health services research.
Materials and methods:
The RHS database is a conglomeration of four isolated databases from the three RHSs. It contains linked National Healthcare Group (NHG) polyclinic visit records, specialist outpatient clinic visit records, hospital discharge records from Tan Tock Seng Hospital (TTSH), National University Hospital (NUH) and Alexandra Hospital (AH), chronic disease management system (CDMS) records and mortality records from local registries. The data linkage process was conducted using the unique identification number (NRIC) as the linking variable. The final anonymized database has multiple interconnected tables that include patient demographics, chronic disease and healthcare utilization information.
Results:
Over 2.8 million patients had contact with the three RHSs from 2008 to 2013. The database facilitated risk stratification of patients based on their past healthcare utilization and chronic disease information. This database aids in understanding the cross-utilization of healthcare services across the three RHSs and can help address the challenges of setting up a distinct geographical boundary for individual RHSs.
Conclusions:
The RHS database has been established with the intention to support the secondary use of administrative health data in health services research and proactive PHM in Singapore.
Introduction
Singapore has one of the most successful health care systems in the world; it was ranked 6th in the World Health Organization’s ranking of the world’s health systems in 2000, 1 and was ranked as the most efficient health care system by Bloomberg in its second annual ranking in 2014.2,3 Singapore’s healthcare system is a mix of public and private care financing: 80% of the primary healthcare services are provided by private practitioners while the government polyclinics provide the remaining 20%. However, for more costly hospitalization care, 80% is provided by the public sector and the remaining 20% by the private sector. 4
Healthcare delivery in Singapore has been largely focused on episodic care in acute hospitals. 5 While acute care must remain a significant part of all healthcare delivery systems, the growing demands of an aging population have made the current hospital-centric model unsustainable. 6 New population-oriented care models are emerging and they seek to provide care for patients across the entire healthcare spectrum, from the patients at risk of presenting at the hospitals to the patients with severe disabilities.7,8 In 2010, the Ministry of Health (MOH) moved from the hospital-centric patient care model to a Regional Health System (RHS) framework for effective and efficient population health management (PHM). Currently there are six regional health clusters: Alexandra Health (AH), Eastern Health Alliance (EHA), Jurong Health Services (JHS), National Healthcare Group (NHG), National University Health System (NUHS) and SingHealth (SH). Each regional healthcare system is anchored by a regional acute care hospital working with a variety of primary, intermediate and long-term care providers and support services to deliver patient-centric care. 9 The catchment area for each RHS, primarily defined through the Development Guide Plan (DGP) as the boundaries surrounding the anchoring hospital, was provided to the individual RHS by the MOH.
Recently, the MOH prioritized PHM and chartered the decentralized RHSs to develop PHM initiatives for better patient care within their health systems. Information technology (IT) has been identified worldwide as a critical element for supporting new models of patient care and for reducing errors and improving quality. 10 IT, and in particular a health information exchange (HIE), has the capacity to enhance the management of the health of populations by promoting the sharing of health information across independent healthcare organizations. 6 Studies worldwide have shown the usefulness of big data analytics for effective and efficient PHM.11–13 The NHG partnered with NUHS and Alexandra Hospital (under JHS) to harness the power of big data in healthcare by building a database (known as the RHS database) that links healthcare utilization data from these three health systems for better patient care and health outcomes. This manuscript describes the development of an RHS database which would facilitate big data analytics for proactive PHM and health services research.
Methods
The RHS database, commissioned for use in 2014, was developed by the Health Services and Outcomes Research (HSOR) department of NHG with the support of the Integrated Health Information Systems (IHIS) department to facilitate PHM by studying the chronic disease distribution, health service utilization and cross-utilization of health services across the RHSs. The database was built on Microsoft Structured Query Language (SQL) Server version 8.0 and loaded onto a data server (four CPUs, 500 GB storage space and 16 GB memory), which resides in IHIS. Client access for research purposes was granted to HSOR. The database cost S$100,000 to develop and requires an additional S$25,000 annually for maintenance, inclusive of the hardware and software. The RHS database is scheduled for annual updates with current data to support continued research and to keep longitudinal health services research studies viable.
RHS database
The RHS database is a relational data warehouse, which is a conglomeration of data marts from four sources: (a) hospital operational data source (ODS), (b) National Healthcare Group Polyclinic (NHGP) operational data source, (c) chronic disease management system (CDMS), 14 and (d) death registry (Figure 1). The primary care ODS was from NHGP, the acute care ODS from National University Hospital (NUH), Tan Tock Seng Hospital (TTSH) and Alexandra Hospital (AH), CDMS data from NHG and death data from a local disease registry (Table 1). The database was approved by the respective institutions’ management, custodians and the ethics review board, and permission was granted to the NHG HSOR team to use the administrative data for population health research after de-identifying patient information. The RHS database includes all patients who had sought care at the three regional hospitals (RH), NUH, TTSH and AH, or the nine NHGPs in the period 2008–2013. Initial analyses showed that the data subset would contain less than 10 GB of data. A patient is included in the database if he or she consulted a physician, had a clinical visit without a physician consultation, sought care at the NHGPs, or had a day surgery (DS), specialist outpatient clinic (SOC) visit, emergency department (ED) visit or inpatient (IP) admission at any of the three hospitals. Patients included citizens, permanent residents and foreigners.

Figure 1. RHS data architecture.
Table 1. Characteristics of the three core data marts included in the RHS database.
Data linking and anonymization protocol
All patient data across the healthcare providers mentioned above were stored in separate tables and were linked using patients’ National Registration Identity Card (NRIC) numbers. Since the database contains sensitive patient information, we de-identified all patient information in accordance with Singapore’s Personal Data Protection Act (PDPA). A separate ‘patient key’ was generated for each patient for anonymization. Subsequently, NRICs were removed from the dataset before it was made available to HSOR for research purposes. Other patient data, such as detailed residential addresses, were masked and available to the principal investigator (PI) only. Any update to the individual patient data was done by linking back to NRICs at the back end after approval from the PI, with IHIS facilitating the process. Once the update was complete, patient data would be de-identified again before being made available to the research team for analysis purposes.
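The key-generation step described above can be illustrated with a minimal Python sketch. It is not the IHIS implementation: the function names, the 16-character hexadecimal key format, the field names and the sample identifiers are all assumptions made for illustration. The idea is that a lookup table from NRIC to patient key stays with the party allowed to re-link records, while the research copy carries only the key.

```python
import secrets

def build_key_map(nrics, existing=None):
    """Assign a random patient key to every identifier that lacks one.

    `existing` lets an annual refresh reuse keys issued earlier, so the
    same patient keeps the same key across updates (an assumption).
    """
    key_map = dict(existing or {})
    used = set(key_map.values())
    for nric in nrics:
        if nric in key_map:
            continue
        key = secrets.token_hex(8)  # hypothetical key format
        while key in used:          # regenerate on the unlikely collision
            key = secrets.token_hex(8)
        key_map[nric] = key
        used.add(key)
    return key_map

def deidentify(records, key_map):
    """Return copies of the records with 'nric' replaced by 'patient_key'."""
    cleaned = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k != "nric"}
        row["patient_key"] = key_map[rec["nric"]]
        cleaned.append(row)
    return cleaned

# Made-up identifiers and visits, for illustration only.
visits = [
    {"nric": "S0000001A", "site": "TTSH", "visit_type": "IP"},
    {"nric": "S0000002B", "site": "NHGP", "visit_type": "POLY"},
    {"nric": "S0000001A", "site": "NUH", "visit_type": "ED"},
]
key_map = build_key_map(r["nric"] for r in visits)  # stays at the back end
research_rows = deidentify(visits, key_map)         # released for research
```

Because the keys are random rather than derived from the NRIC, they cannot be reversed without the lookup table, which is what makes controlled re-linking at the back end possible.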
Data governance and retrieval
HSOR is a custodian of the data, which, at source, is owned by institutions (NUH, TTSH, AH, HPB, NHGP) and their representatives who are research collaborators. Even within HSOR, access to the database is limited to investigators who work on specific projects approved by the data owners (NUH, TTSH, AH, HPB, NHGP) and who are on the ethics board-approved list (based on the names mentioned during the Institutional Review Board (IRB) application). The PI ensures that data is accessed for the purpose intended. HSOR analysts who answer PHM questions are given access to non-identified data.
Cross-functional development, analytics and maintenance
A cross-functional team of IT experts from IHIS as well as HSOR’s multi-disciplinary team, which comprises operations research specialists, epidemiologists, physicians and statisticians, was involved in the various stages of the RHS database development. The project involved meetings with key research stakeholders, data owners and system users to get the necessary approvals. IT experts from IHIS designed and developed the data architecture for the database (Figure 1) and were also involved in the testing and maintenance of the database system. HSOR data analysts carried out analyses using mathematical and statistical algorithms and visualization methods.
The RHS database is refreshed annually with IT support; funding for IT maintenance is provided by the RHS office, NHG. Intermediate tables are created and maintained as and when needed by a team of about five staff with skill sets in SQL, data management and statistics. The PI directs both the ongoing development and the maintenance work. Health services researchers with expertise in healthcare analytics and public health are needed to drive this initiative forward, working closely with clinicians and healthcare administrators.
Results
The data extraction for the RHS data mart was completed in 2013. The extracted data, stored in nine tables with 297 columns, included general patient information, patient demographics, chronic disease data, diagnosis, ED attendance, inpatient and day surgery data, polyclinic attendance and SOC attendance. The RHS database provided an enumeration of the patient population who had utilized the services of the RHS institutions. Over 2.8 million unique patients had contact with the NNJ institutions (NHG, NUH, JGH) over a 6-year period from 2008 to 2013. There was a higher proportion of females than males in the 45–54 (16.8% vs. 14.8%), 55–64 (14.8% vs. 13.4%), 65–74 (8.8% vs. 7.4%), 75–84 (4.2% vs. 2.9%) and 85+ (1.0% vs. 0.4%) age groups. NHG had a higher proportion of patients aged 45 years and over, and the average number of chronic diseases per patient was higher in NHG than in NUHS and AH (Table 2). The number of patients utilizing the NUH, TTSH and NHGP services increased over the years (Figure 2). In 2013 alone, about 300,000–400,000 patients had contact with the respective RHSs.
Table 2. Patient demographics for the year 2013.
*Includes TTSH, AMK, HOU, TP. **Includes NUH, BBK, CCK, CLM, JUR. ***Includes all three hospitals and nine NHG polyclinics.

Figure 2. Patient growth rate from 2008 to 2013.
Healthcare utilization
By linking the primary and acute care utilization of the patients, we could study the patients who used both services for related or unrelated conditions. A detailed analysis of the research questions is under consideration for publication elsewhere.
Cross-utilization
Cross-utilization was defined as a patient’s utilization of healthcare services outside of the MOH boundary for the cluster. The RHS database aids in understanding the cross-utilization of healthcare services across the RHSs and the challenges of setting up distinct geographical boundaries for the RHSs, as patients from one region utilize the services of an RHS outside their residential region. This showed that the physical boundaries had simplified counting, but complicated attribution of care to an RHS. Close to 3.8% of the patients cross-utilized inpatient services across the three hospitals from the year 2011 to 2013 (Figure 3(a)), while approximately 4.9% of the patients visited at least two hospital emergency departments during 2011–2013 (Figure 3(b)). Among the frequent admitters (FAs), 17.9% of the patients cross-utilized inpatient services from 2011 to 2013. These FAs (N=10,920) accounted for 15,788 cross-utilization episodes from 2011 to 2013. The top diagnosis-related group (DRG) for the FA cross-utilization was chronic obstructive airways disease, accounting for 838 inpatient episodes (Table 3).

Figure 3(a). Hospital inpatient cross-utilization from 2011 to 2013.

Figure 3(b). Hospital emergency department cross-utilization from 2011 to 2013.
Table 3. Top primary diagnoses of frequent admitters who cross-utilized inpatient services, 2011–2013.
No. of frequent admitters: 10,920.
No. of FAs who cross-utilized: 1,952.
No. of cross-inpatient episodes: 15,788.
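The cross-utilization percentages reported in this section can be reproduced in principle with a short Python sketch. It assumes the simpler operational reading used for the emergency department figure, namely a patient with episodes at two or more distinct hospitals; the boundary-based definition would additionally need each patient's residential region. The function name, record layout and sample episodes are hypothetical.

```python
from collections import defaultdict

def cross_utilization_rate(episodes):
    """Share of patients with episodes at two or more distinct hospitals.

    `episodes` is an iterable of (patient_key, hospital) pairs.
    """
    hospitals_by_patient = defaultdict(set)
    for patient_key, hospital in episodes:
        hospitals_by_patient[patient_key].add(hospital)
    if not hospitals_by_patient:
        return 0.0
    crossers = sum(1 for seen in hospitals_by_patient.values() if len(seen) >= 2)
    return crossers / len(hospitals_by_patient)

# Made-up inpatient episodes for four patients.
episodes = [
    ("p1", "TTSH"), ("p1", "NUH"),   # p1 used two hospitals
    ("p2", "TTSH"), ("p2", "TTSH"),  # repeat admissions, same hospital
    ("p3", "AH"),
    ("p4", "NUH"),
]
rate = cross_utilization_rate(episodes)

# Consistency check on the reported frequent-admitter counts:
# 1,952 cross-utilizing FAs out of 10,920 FAs.
fa_share_percent = round(1952 / 10920 * 100, 1)
```

The last line confirms that the counts given under Table 3 agree with the 17.9% quoted in the text.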
Risk stratification
Although the database had only 6 years of data, it allowed for some retrospective longitudinal observation of patients’ healthcare utilization and mortality post-hospital discharge. The RHS database facilitated risk stratification of patients based on their past healthcare utilization and chronic disease distribution, and the implementation of targeted interventions for the various risk groups. This risk stratification will aid in developing integrated care services, which is one of the main intentions of PHM initiatives in Singapore.
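As a rough illustration of how past utilization and chronic disease counts could feed a stratification rule, the Python sketch below assigns patients to three tiers. The tier names, input fields and every threshold are hypothetical placeholders chosen for the example; they are not the criteria used with the RHS database.

```python
from collections import Counter

def risk_tier(admissions_past_year, ed_visits_past_year, chronic_conditions):
    """Assign a coarse risk tier from utilization and disease burden.

    All cut-offs below are illustrative assumptions, not published rules.
    """
    if admissions_past_year >= 3 or chronic_conditions >= 4:
        return "high"
    if (admissions_past_year >= 1
            or ed_visits_past_year >= 2
            or chronic_conditions >= 2):
        return "medium"
    return "low"

# Made-up patient summaries keyed by de-identified patient key.
patients = {
    "K1": {"admissions": 4, "ed_visits": 5, "chronic": 3},
    "K2": {"admissions": 1, "ed_visits": 0, "chronic": 1},
    "K3": {"admissions": 0, "ed_visits": 0, "chronic": 0},
    "K4": {"admissions": 0, "ed_visits": 1, "chronic": 4},
}
tiers = {
    key: risk_tier(p["admissions"], p["ed_visits"], p["chronic"])
    for key, p in patients.items()
}
tier_counts = Counter(tiers.values())
```

In practice the cut-offs would be derived from the data, for example from observed re-admission or mortality rates within each candidate group.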
The section below documents the data extraction methods and maps the process.
Extraction, transformation and loading mapping
Extraction, transformation and loading (ETL) mapping refers to the process of extracting data from source systems and bringing them into the data warehouse. The operational data store (ODS) server obtains data from the Systems, Applications and Products (SAP) system in the form of delimited text files. The SAP system sends the data files every day via a file transfer protocol (FTP). ETL jobs in ODS fetch these files and load them into the ODS data store. The data flow from SAP to ODS and then to the respective downstream data marts. Polyclinics store data in Oracle tables, and the required data are fetched directly from their databases. The SAP Business Objects Data Services XIR4 (BODS) tool was used for ETL.
The raw data originate from the various hospitals and polyclinics. The ODS is an Oracle database that collects these raw data and stores them in a single storage place. The hospitals have an SAP system that sends IHIS the required data in the form of delimited text files. Transfer of files from the SAP server to the IHIS ETL server is done using the Secure File Transfer Protocol (SFTP). Polyclinics, on the other hand, have a mirror database of their daily transactional data. The IHIS ETL server connects to this mirror database to extract the required data.
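The load step for a delimited extract can be sketched in a few lines of Python. This is only a stand-in for the BODS jobs described above: it uses the standard-library `csv` and `sqlite3` modules with an in-memory database in place of the Oracle ODS, and the pipe delimiter, table name, column names and sample rows are assumptions.

```python
import csv
import io
import sqlite3

# A made-up daily extract; the real file layout is not described in the text.
RAW_EXTRACT = """patient_key|hospital|visit_type|visit_date
K000000001|TTSH|IP|2013-01-05
K000000002|nuh|ED|2013-01-06
K000000001|AH|SOC|2013-02-10
"""

def load_extract(conn, text, delimiter="|"):
    """Parse a delimited extract, normalize codes and load it into staging."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ods_visit ("
        "patient_key TEXT, hospital TEXT, visit_type TEXT, visit_date TEXT)"
    )
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = [
        (
            r["patient_key"].strip(),
            r["hospital"].strip().upper(),    # transformation: uniform codes
            r["visit_type"].strip().upper(),
            r["visit_date"].strip(),
        )
        for r in reader
    ]
    conn.executemany("INSERT INTO ods_visit VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = load_extract(conn, RAW_EXTRACT)
ip_count = conn.execute(
    "SELECT COUNT(*) FROM ods_visit WHERE visit_type = 'IP'"
).fetchone()[0]
```

Downstream data marts would then be populated from the staging table with further SQL, which is the ODS-to-data-mart flow the section describes.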
Issues in data merging
We encountered a few issues during the data-merging exercise. The Singapore national identification number is tagged to the patient’s visa or residential status, so a patient receives a different identification number if their residential status changes, for example from holding an employment pass to being a permanent resident. The ETL process was designed to identify and merge such cases. Records were matched on first name, last name and date of birth. After such cases were identified, a global patient key (a 10-character alphanumeric identifier) was assigned to each patient; this key was used for linking records with other tables in the data mart.
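The merging rule can be sketched in Python as follows. It is a simplified illustration, assuming exact matching on normalized first name, last name and date of birth; the function names, field names and sample records are hypothetical, and the actual ETL rules may have been more elaborate.

```python
import secrets
import string
from collections import defaultdict

ALPHABET = string.ascii_uppercase + string.digits

def new_global_key(used):
    """Draw an unused 10-character alphanumeric key."""
    while True:
        key = "".join(secrets.choice(ALPHABET) for _ in range(10))
        if key not in used:
            used.add(key)
            return key

def assign_global_keys(patients):
    """Map every national ID to a global key shared by matching records."""
    groups = defaultdict(list)
    for p in patients:
        match_on = (
            p["first_name"].strip().lower(),
            p["last_name"].strip().lower(),
            p["dob"],
        )
        groups[match_on].append(p["national_id"])
    used, id_to_key = set(), {}
    for ids in groups.values():
        key = new_global_key(used)
        for national_id in ids:
            id_to_key[national_id] = key
    return id_to_key

# Made-up records: the first two are one person whose ID changed.
patients = [
    {"national_id": "G0000001X", "first_name": "Mei", "last_name": "Tan", "dob": "1980-04-02"},
    {"national_id": "S0000009Z", "first_name": "mei ", "last_name": "TAN", "dob": "1980-04-02"},
    {"national_id": "S0000003C", "first_name": "Ravi", "last_name": "Kumar", "dob": "1975-11-30"},
]
id_to_key = assign_global_keys(patients)
```

Exact matching on name and date of birth can both miss true matches (spelling variants) and merge distinct people who share these attributes, which is one reason the residual double counting noted under Limitations cannot be ruled out.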
User acceptance testing
Extensive validation and verification testing was performed by IHIS on the RHS data mart, and each data column was tested for consistency. Extensive data profiling was done to come up with rules for merging patient data. A random sample of patient data (n=1000) was identified and tested for data accuracy by looking up their individual data in the raw files.
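The sample-based accuracy check can be sketched in Python. The sketch assumes that the warehouse and raw records are both available as dictionaries keyed by patient key; the function names, the fixed seed and the toy data are assumptions for illustration, not details of the IHIS test procedure.

```python
import random

def sample_for_validation(patient_keys, n=1000, seed=2013):
    """Draw a reproducible random sample of distinct patient keys."""
    keys = sorted(set(patient_keys))
    rng = random.Random(seed)  # fixed seed so the audit can be repeated
    return rng.sample(keys, min(n, len(keys)))

def mismatch_rate(sample, warehouse, raw):
    """Compare sampled warehouse records against the raw source records."""
    if not sample:
        return 0.0, []
    mismatches = [k for k in sample if warehouse.get(k) != raw.get(k)]
    return len(mismatches) / len(sample), mismatches

# Toy data: 5,000 patients whose warehouse rows match the raw files exactly.
warehouse = {f"K{i:09d}": {"sex": "F" if i % 2 else "M"} for i in range(5000)}
raw = {key: dict(row) for key, row in warehouse.items()}

sample = sample_for_validation(warehouse)
rate, bad_keys = mismatch_rate(sample, warehouse, raw)
```

Any key returned in `bad_keys` would be traced back to the raw files to decide whether the fault lies in the source data or in the ETL rules.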
Discussion
The growing use of administrative data has increased the capacity of health services researchers in Singapore to provide evidence to inform healthcare administrators and policy makers on PHM strategies. However, the current health care data architecture in Singapore does not permit the study of healthcare utilization and healthcare services across the six RHSs in Singapore as a whole. This was the motivation to develop the RHS database, which merged all healthcare data from three RHSs to get a holistic profile of patients’ health and healthcare utilization, to understand the hospitals’ role within the RHSs and to facilitate PHM initiatives.
In addition, the following are some of the health services research questions that could be answered using the database:
Profile of patients in TTSH, JHS, NUH, NHGP by: (a) demographics, (b) existing chronic diseases and comorbidities (primary and secondary diagnoses), (c) geographical distribution, (d) health service utilization in primary care, ED, SOC, DS and inpatient, and (e) cost from the providers’ perspective.
The extent of cross-utilization by patients between the three RHSs (NHG, NUHS, JHS), especially among patients with chronic conditions.
Hospital–polyclinic affinity and its impact on patient flow and referrals.
Predictors of risk of hospital re-admission/death following discharge (re-admissions and death are competing risks).
Factors associated with the likelihood of becoming a frequent admitter.
What proportion of the frequent admitters return as frequent admitters in the following year and what are the primary reasons for their return?
Having a centralized database, though relatively expensive, is superior to standalone linking and analysis due to reduced errors and variations in analyses by researchers. The researchers from the three RHSs moved from silo databases to system-wide databases to capture all elements of health and healthcare utilization of the population in their respective catchments. A cross-functional team with considerable experience in using administrative data developed the information system architecture for the RHS database. The database has standardized data definitions and analysis processes. It provides a platform for researchers to extract data using SQL to conduct their own independent studies, post ethics approval. The database also allows interfacing capability for various statistical and visualization tools. However, linking data from various sources had technical, ethical and privacy issues, and the time and resources needed to gain approval from data custodians and a research ethics board to access the data should not be underestimated.
Studies worldwide2,15–18 and in Singapore19,20 have documented the usefulness of administrative data for applied health research, for studying health services utilization, program evaluation and social services research. Data collected for administrative purposes at the population level generally contain large numbers of records, include all events for a specific purpose and are more likely to be cost-effective relative to primary data collection.15,16 Their use in health and social services research has enhanced the ability of administrators and policymakers to obtain information that can be used to evaluate the impact of services and programs on their populations.21–24 The RHS database derives data from various administrative data marts; it would be a useful information source for studying the population characteristics of the RHSs and healthcare utilization among them.
Singapore has one of the fastest aging populations in the world, with close to 20% of the resident population being older than 65 years by 2030. Elderly patients with an increasing number of chronic conditions and comorbidities often see a number of different providers of both social and health care services. Efforts to reduce unnecessary healthcare utilization by one provider may shift the resource use to another provider. The RHS database in its current form would aid in studying healthcare utilization and cross-utilization comprehensively along the care continuum within the RHS. The knowledge thus gained would be useful in coordinating care across the health services and for risk stratifying and planning effective health interventions, health prevention and promotion programs that target both sick and healthy patients. Data integration across RHSs would help get a holistic perspective of health service utilization in Singapore. Such a perspective would help project and plan health services and resources for the future health needs of Singapore.
Limitations
We acknowledge that this relational database has its limitations. Firstly, the RHS database is a conglomeration of data from various administrative databases, and the primary purpose of their collection is to support service delivery or programs. Hence the RHS database would have some of the inherent limitations of administrative data, such as a lack of clinical information, test results, procedures conducted, detailed treatment regimens and socio-demographic information. Secondly, the RHS database in its current form includes only 10% of the primary care providers and 40% of acute care providers and covers only three out of the six regional health clusters; this may limit its power to analyse the cross-utilization of healthcare services across the RHSs. Hence, coverage is not nationwide. Nevertheless, over the 6-year period, this database has captured information for over 50% of Singapore’s population. Thirdly, administrative data in general are subject to potential misclassification and under-reporting. Also, with the conversion of foreign patients to permanent residents, there is a possibility of double counting a few patients, although this was addressed as far as possible by the IT team handling the patient data merger.
Conclusion
Technology offers the opportunity to link the various healthcare administrative databases from silos to integrated systems, and to provide a systems-wide perspective of healthcare. This RHS database initiative is a step closer toward the national architecture for distributed healthcare utilization data across diverse clinical systems; it offers an opportunity for proactive PHM of the RHSs. The database development required a cross-functional team, and the database provided new insights in understanding chronic disease distribution, healthcare utilization and cross-utilization of healthcare services across the RHSs.
Footnotes
Conflict of interest
None declared.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
