Abstract
Keywords
Introduction
An estimated 233,900 new cancer cases and 85,100 cancer deaths are expected in Canada in 2022. 1 In addition to the impact of the disease on the health of the population, cancer brings a significant financial burden, with cancer-related costs in Canada reaching an estimated CAD 26.2 billion in 2021. The province of Newfoundland and Labrador (NL) faces key demographic challenges, such as a rapidly aging population dispersed over a large geographic area.2,3 NL has some of the country’s highest rates of chronic disease, with more than 60% of individuals having at least one chronic illness. 4 These challenges contribute to the burden of cancer in the province, with NL having the highest cancer incidence and cancer-related mortality rates in Canada. 5 The province also spends more per capita on health care than any other province; an estimated CAD 3.5 billion will be spent in 2022, representing approximately 38% of the entire provincial budget. 6
To provide optimal care for all these individuals and cope with the financial and population health burden, collected data must be used to observe how interventions and treatments impact patient outcomes, assess health system performance, and translate information in a timely and reliable manner. As individuals working in the health system, and specifically in the Provincial Cancer Care Program (PCCP), we have recognized that a health system capable of such assessments requires integrated information systems that are valid, reliable, comprehensive, and accessible. Currently, cancer care data sources in NL are dispersed among multiple data systems in varying formats, structures, and data schemas. Data are collected through various means, from manual input to fully automated processes. Many of the PCCP data systems connect and communicate with one another through application programming interfaces (APIs); however, this architecture is somewhat complex, cumbersome, and, in extreme cases, prone to errors. As individuals who rely on these data, we recognized that coordinating the data sources into a single, scalable, reliable, and secure system was essential to improving patient care in NL.
The objectives of this article are to highlight and describe the implementation of a new health information technology system that is harmonizing cancer care data in NL. The Health Connect system we describe was identified and built within the PCCP to directly impact care in NL. In doing so, we outline mid-level technical details of this technology, provide concrete examples of how it is helping to improve cancer care in the province, and discuss its future expansion and implications. The processes we outline may be beneficial to other jurisdictions that currently deal with their own data challenges, both in cancer care and in the health system more broadly.
Background
NL, the easternmost province in Canada, is home to a population of roughly 525,000 people 3 distributed over 405,000 square kilometers. NL has two distinct geographical regions: the island of Newfoundland, where approximately 95% of the population resides, and the larger, more remote region of Labrador, where the remaining 5% live. NL is sparsely populated and has a significant rural and remote population. Approximately 50% of its population resides within 100 km of its capital and largest city, St. John’s. The remainder lives in smaller communities, many with a population of 1000 people or fewer, scattered primarily around the province’s coastline. Labrador is the most remote region of the province; communities along its Northern coast and part of its Southern coast are accessible only by boat or plane, and access can be limited during the winter due to inclement conditions.
Similar to the rest of Canada, the NL healthcare system is a publicly funded system designed to ensure all residents have access to medical services and is overseen by the provincial Department of Health and Community Services (DHCS). The delivery of healthcare services in NL is currently managed by a single autonomous health authority, Newfoundland and Labrador Health Services (NLHS). NLHS is further divided into five regional zones: Eastern Urban (EUZ), Eastern Rural (ERZ), Central (CZ), Western (WZ) and Labrador Grenfell (LGZ; Figure 1). EUZ is the largest zone and manages healthcare services for approximately 50% of the population. 2 NLHS is also responsible for managing and delivering cancer care services through the PCCP.
Figure 1. Regional zone boundaries.
The PCCP provides systemic therapy, radiation therapy, population-based cancer screening, clinical trials, and patient outreach services. 7 Four clinics are located throughout the province: one in EUZ, two in CZ, and one in WZ. All clinics provide systemic therapy services to patients, while the EUZ clinic is currently the only location to deliver radiation therapy services; however, a radiation therapy clinic is currently under construction at the WZ site. The PCCP offers population-based breast, colon, and cervical screening programs. Additional services the PCCP provides to its patients include nutritional services, social work, pharmacy, and consultative care services. 7 The variety of services provided by the PCCP contributes to a rich patient information environment.
NL cancer care registry
Newfoundland and Labrador cancer care registry and related data sources.
Some repositories within the registry are disaggregated and stored in varying formats and locations within the information ecosystem, posing challenges to data linkage and to the assessment of data quality, completeness, accessibility, and timeliness. These ‘data silos’ create inefficiencies in how information is used for reporting and evidence-based decision-making. 11 This impacts patient care directly, as information from separate repositories must be gathered individually, and time-consuming chart reviews must be performed for health care providers to consolidate unstructured patient information for care delivery. 12 It also impacts our health system by placing an unnecessary workload on providers and preventing the linkages and information required for health system surveillance. 13
Unstructured data, which are not arranged according to a pre-set data model or schema, make up a significant proportion of the data within the NLCCR, in forms such as scanned documents and free text. Information stored in clinical findings reports, physician notes, and screening results is challenging to analyze without resource-intensive manual data entry and coding, which prevents robust and timely health system surveillance. This issue is not unique to the PCCP and poses a challenge to many organizations within and outside the healthcare sector.14,15 It is estimated that as much as 80% of global healthcare data remain stored in an unstructured format 15 and, therefore, cannot be included in analyses aimed at improving patient outcomes, assessing operational performance, or contributing to academic literature.
Solution
In 2019, as a first step towards integrating its data repositories, the PCCP partnered with the consulting firm Deloitte™ to modernize the information technology systems that enabled the province’s three population-based cancer screening programs. Deloitte’s proprietary information technology solution, ConvergeHEALTH™ Connect, was used to create the PCCP’s Health Connect system (HC). HC is designed to empower healthcare providers to transform health systems that provide multi-channel access to care across a continuum and to provide real-time insights into healthcare programs that can be used to proactively address population health problems. HC became operational in 2020 and has integrated the three population-based screening data repositories (PBSD) of the NLCCR (BSD, CSD and CvSD) with the PCR, a system of record for population demographics. The implementation of HC has vastly improved the integration and reporting of cancer screening outcomes. The system is also integrated into the provincial Electronic Health Record (EHR), which allows for the flow of screening test results into HC, and it continues to grow and build on its current structure.
With HC in place, the harmonization of both structured and unstructured data has begun. The harmonization of data is of paramount importance,12,16 and has allowed for efficient, automated processes that monitor progress and set the best course of action in a timely and meaningful manner. Integrating data sources into a centralized repository (i.e., a Data Lake [DL])17,18 will allow for improved accessibility. Structuring data in this manner also allows for flexibility to use cutting-edge, comprehensive data modeling strategies, such as natural language processing (NLP) and other artificial intelligence (AI) algorithms, to glean insights that otherwise would be restricted given traditional data systems.19,20 Over the past decade, NLP and AI methods have become increasingly important in the healthcare sector.21–23
Methods
Health connect architecture
As HC is an operational database, it uses several components and integration patterns to consolidate data securely. HC comprises several cloud-based technology components, including but not limited to: (1) the Salesforce Health Cloud (SFHC) Patient Relationship Management platform and the Deloitte Digital ConvergeHEALTH for Public Sector (CHC) managed package. SFHC provides the core case management functionality that can be applied to various healthcare services, and CHC layers on solution components specific to disease screening programs. Together, SFHC and the CHC managed package support the delivery of screening services and the collection of cancer data for registry purposes; (2) an array of computing, storage, and networking services provided by Amazon Web Services (AWS); and (3) Mirth Connect (MC) by NextGen Healthcare for interoperability with the PCR and EHR.
Figure 2. Solution architecture for Health Connect.
Integration of the data sources within HC
Demographic information in HC is currently managed by a daily batch extract, transform, and load process. This process identifies records in SFHC that need to be created, modified, or deactivated. A comma-separated values (CSV) file containing the single-best view of demographic information available within the PCR is dropped daily onto a server on AWS via secure file transfer protocol (SFTP), with Advanced Encryption Standard (AES) encryption using a 512-bit key length. A set of Python scripts, hosted on an AWS virtual machine running a Linux operating system, compares the file with that from the previous day. Based on this daily delta, records are upserted into the Salesforce platform.
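The daily delta described above can be illustrated with a minimal sketch. It assumes each extract carries a unique identifier column, here hypothetically named `person_id`; the column names, sample rows, and function names are illustrative and are not taken from the PCCP scripts.

```python
import csv
import io


def load_by_key(csv_text, key="person_id"):
    """Index the rows of a CSV extract by a unique identifier column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def daily_delta(previous, current):
    """Return the records to create, modify, or deactivate between two extracts."""
    create = [current[k] for k in current if k not in previous]
    modify = [current[k] for k in current
              if k in previous and current[k] != previous[k]]
    deactivate = [previous[k] for k in previous if k not in current]
    return {"create": create, "modify": modify, "deactivate": deactivate}


# Hypothetical extracts from two consecutive days.
yesterday = (
    "person_id,last_name,postal_code\n"
    "1,Smith,A1A1A1\n"
    "2,Jones,A1B2C3\n"
    "3,Lee,A0K1B0\n"
)
today = (
    "person_id,last_name,postal_code\n"
    "1,Smith,A1A1A1\n"
    "2,Jones,A1C5H7\n"
    "4,Brown,A2H6J7\n"
)

delta = daily_delta(load_by_key(yesterday), load_by_key(today))
```

In this sketch, only the records in `delta` would be passed on to the upsert step, so unchanged records are never rewritten.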
Building an integrated view of the patient history was aided by defining a standard data schema within SFHC. The SFHC data model contains numerous database objects and record types, but most critically, for integrated screening program delivery, it includes:
1. Person account: represents patients and health care providers.
2. Participant screening episode (PSE): represents the complete view of a participant’s interaction with a screening program and the outcome. The PSE is created when screening eligibility is determined and a screening pathway is opened. The screening pathway monitors a screening client’s state at different points through the screening episode, and the pathway states offer a snapshot of the complete view of the episode at any point in time. Individual clinical transactions are captured through diagnostic screening reports (DSRs), which are linked to either screening or diagnostic pathways. Since screening episodes are linked to clinical pathways and pathways are linked to individual DSRs, the screening episode captures the complete view. Screening episodes with pathway states and DSRs can be reported upon in the aggregate to capture key performance indicators such as abnormal call rates, cancer detection rates, and positive predictive values.
3. Clinical pathways: represent discrete workflows and steps associated with the delivery of screening and diagnostic services.
4. Diagnostic/screening results: represent laboratory, diagnostic imaging, and endoscopy data pertinent to each screening program.
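As an illustration of how episode-level records can be aggregated into the key performance indicators named above, the following sketch uses a simplified, hypothetical episode structure. The field names and the indicator definitions (abnormal call rate as abnormal screens per screen, cancer detection rate per 1000 screens, and positive predictive value as cancers per abnormal screen) are assumptions made for this example and may differ from those used within HC.

```python
from dataclasses import dataclass


@dataclass
class ScreeningEpisode:
    """Simplified, hypothetical stand-in for a participant screening episode."""
    participant_id: str
    abnormal_screen: bool   # screening result flagged for diagnostic follow-up
    cancer_detected: bool   # cancer confirmed through the diagnostic pathway


def screening_kpis(episodes):
    """Aggregate episodes into three commonly reported screening indicators."""
    screened = len(episodes)
    abnormal = sum(e.abnormal_screen for e in episodes)
    cancers = sum(e.abnormal_screen and e.cancer_detected for e in episodes)
    return {
        "abnormal_call_rate": abnormal / screened if screened else 0.0,
        "cancer_detection_rate_per_1000": 1000 * cancers / screened if screened else 0.0,
        "positive_predictive_value": cancers / abnormal if abnormal else 0.0,
    }


# 100 synthetic episodes: 10 abnormal screens, 2 of which are confirmed cancers.
episodes = [ScreeningEpisode(str(i), i < 10, i < 2) for i in range(100)]
kpis = screening_kpis(episodes)
```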
Once the standard data schema was established in SFHC, the migration of data from legacy systems involved the following steps: 1. Analysis of the legacy data schemas in relation to the new standard data model; 2. Mapping of data fields between legacy systems and SFHC, including defining a methodology for accurately matching records across systems and managing any exceptions; 3. Extraction and cleansing of data from legacy systems; 4. Upsert of cleansed records into SFHC via its built-in data loading tools; and 5. Conformance and validation testing, review, and resolution of any exceptions.
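The mapping, cleansing, and exception-handling steps above can be sketched as follows. The legacy field names, target field names, and matching rule (a non-empty, non-duplicated health card number) are hypothetical and serve only to illustrate the pattern; the actual upsert into SFHC is not shown.

```python
# Hypothetical legacy-to-target field mapping; every name here is illustrative.
FIELD_MAP = {
    "PT_LNAME": "last_name",
    "PT_DOB": "birth_date",
    "HCN": "health_card_number",
}
MATCH_KEY = "health_card_number"


def map_and_cleanse(legacy_record):
    """Rename legacy fields to the target schema and trim stray whitespace."""
    return {
        target: (legacy_record.get(source) or "").strip()
        for source, target in FIELD_MAP.items()
    }


def prepare_migration(legacy_records):
    """Split records into those ready for upsert and those needing manual review."""
    ready, exceptions, seen = [], [], set()
    for record in legacy_records:
        mapped = map_and_cleanse(record)
        key = mapped[MATCH_KEY]
        if not key or key in seen:
            exceptions.append(record)  # missing or duplicate match key
        else:
            seen.add(key)
            ready.append(mapped)
    return ready, exceptions


legacy = [
    {"PT_LNAME": " Smith ", "PT_DOB": "1960-01-01", "HCN": "100000000001"},
    {"PT_LNAME": "Jones", "PT_DOB": "1955-05-05", "HCN": ""},
    {"PT_LNAME": "Smith", "PT_DOB": "1960-01-01", "HCN": "100000000001"},
]
ready, exceptions = prepare_migration(legacy)
```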
Diagnostic/screening results are created via integration with the provincial EHR using the HL7v2 format, a common health interoperability standard, to support ongoing screening operations. MC receives these raw messages, transforms them by extracting the data relevant to the PCCP, and maps those data to the corresponding fields in SFHC. MC then delivers an outbound message to SFHC via a REST API, creating a clinical record on the corresponding patient account record.
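To make the transformation step concrete, the sketch below parses a synthetic HL7v2 result message in plain Python and maps a few fields to a flat record. It is not Mirth Connect code: in HC this work is performed by MC channels, and the sample message, the target field names, and the choice of fields (PID-3, PID-5, OBX-3, and OBX-5) are assumptions made for illustration. Delivery of the record to SFHC via REST is omitted.

```python
def parse_hl7v2(message):
    """Split an HL7v2 message into {segment_id: [field lists]}."""
    segments = {}
    for line in message.replace("\n", "\r").split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")  # "|" is the conventional field separator
        segments.setdefault(fields[0], []).append(fields)
    return segments


def to_clinical_record(message):
    """Extract a handful of fields into a flat, hypothetical target record."""
    segments = parse_hl7v2(message)
    pid = segments["PID"][0]
    obx = segments["OBX"][0]
    family, given = pid[5].split("^")[:2]          # PID-5: patient name
    return {
        "patient_identifier": pid[3].split("^")[0],  # PID-3: identifier list
        "last_name": family,
        "first_name": given,
        "test_code": obx[3].split("^")[0],           # OBX-3: observation identifier
        "result_value": obx[5],                      # OBX-5: observation value
    }


# Synthetic message; all identifiers and names are fictitious.
sample = (
    "MSH|^~\\&|LAB|HOSP|HC|PCCP|202201011200||ORU^R01|MSG0001|P|2.5\r"
    "PID|1||123456^^^NL^HC||DOE^JANE||19600101|F\r"
    "OBR|1|||FIT^Fecal immunochemical test\r"
    "OBX|1|ST|FIT^Fecal immunochemical test||NEGATIVE||||||F\r"
)
record = to_clinical_record(sample)
```

A production interface would also need to handle repeating fields, escape sequences, and multiple OBX segments, all of which this sketch ignores.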
Analytics, NLP and ML algorithms
Amazon Comprehend Medical (ACM) is an NLP service that uses machine learning (ML) to extract relevant outcomes from unstructured medical text. 24 ACM also delivers robust mapping to standard medical ontologies such as ICD-10-CM or RxNorm. The PCCP utilizes ACM to assist in the secondary processing of pathology results to identify those associated with a cancer diagnosis. HC includes components specifically designed for advanced analytics, including (1) Tableau, an end-to-end analytics and data visualization platform, hosted either natively within SFHC or on an AWS-based virtual machine; (2) integrations to pre-built NLP and ML algorithms available on AWS; and (3) use of various AWS tools for networking, data ingestion, database, security, and identity and access management (Figure 3).
Figure 3. Mammography diagnostic/screening results: machine learning proof-of-concept solution architecture.
Amazon AppFlow is used to extract records from SFHC, creating staging databases. Batch analysis jobs are created from the ACM service to identify medical entities and infer ICD-10-CM codes from within the source input data. For each detected potential medical entity, the ML algorithm lists the matching ICD-10-CM codes and descriptions in descending order of confidence, along with the confidence scores. The scores indicate confidence in the accuracy of the entities matched to the concepts. The analysis output is sent to a separate output staging database, which can be queried and analyzed in Tableau. Within Tableau, the ML interpretation of the unstructured clinical data is compared to a human interpretation to determine the sensitivity and specificity of the algorithm at various confidence thresholds and any requirements for secondary processing. Analysis results are then reviewed with program and clinical leadership to refine the methodology and determine whether the algorithm can and should be incorporated into PCCP service delivery or population health surveillance workflows.
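The threshold comparison described above is performed in Tableau within HC; the short sketch below reproduces only the underlying arithmetic. It assumes each record has been reduced to a pair of values, the algorithm's confidence score and the human reviewer's judgment, and the sample scores are invented for illustration.

```python
def sensitivity_specificity(scored, threshold):
    """Compare thresholded ML confidence scores against human interpretation.

    `scored` is a list of (confidence, human_positive) pairs.
    """
    tp = fp = tn = fn = 0
    for confidence, human_positive in scored:
        predicted_positive = confidence >= threshold
        if predicted_positive and human_positive:
            tp += 1
        elif predicted_positive and not human_positive:
            fp += 1
        elif not predicted_positive and human_positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Invented (confidence, human_positive) pairs.
scored = [
    (0.95, True), (0.80, True), (0.60, True), (0.40, True),
    (0.70, False), (0.30, False), (0.20, False), (0.10, False),
]
results = {t: sensitivity_specificity(scored, t) for t in (0.5, 0.75)}
```

Raising the threshold from 0.5 to 0.75 in this toy data trades sensitivity for specificity, which is the kind of trade-off the review with program and clinical leadership would weigh.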
The specifics of incorporating an ML algorithm into PCCP workflows depend on the requirements of each use case and involve: (1) creating an API endpoint within the PCCP Virtual Private Cloud on AWS using the AWS PrivateLink service, which allows SFHC to securely query the ACM service in real time; (2) creating event-based outbound API calls from within SFHC (e.g., when a new record that requires classification is created); and (3) developing and testing workflow automation within SFHC to action the results of the ML algorithm (e.g., modifying records, flagging exceptions requiring human review).
Results
Achievements of HC systems
To date, HC has been implemented for approximately 2 years. During this time, several examples and applications have demonstrated its ability to transform cancer data, harmonize data systems, and improve cancer care services and the efficiency of operations. We highlight two of the many benefits HC has brought forth.
Colon screening dashboard
In 2020, the colon screening program worked with the Canadian Partnership Against Cancer (CPAC) on a project that focused on developing and implementing strategies to increase population-based colon cancer screening participation. The project was divided into two phases; HC played an essential role in Phase 1. HC was used to build a Tableau dashboard highlighting colon screening uptake in geographical areas within NL. It allowed screening programs to pinpoint the exact areas where screening uptake was low; analyze the information using dashboard analytical tools, such as summary statistics and heat maps, to better understand why uptake was lower in those regions; set forth best practices through education; and provide the interventions needed to remedy the problem. The dashboard has been used to monitor colon screening trends, set benchmarks, and implement continuous quality assurance and improvement procedures for the PCCP. Figure 4 highlights some of the specific measures and methods used for the dashboard. One such improvement process involved addressing a test kit non-compliance rate in a specific area of the province that was significantly higher than the provincial average. For the first time, the analytic dashboard allowed the colon screening program to analyze at the micro-level and identify barriers and challenges to screening participation. The analytic work led to community engagement and a change in process, such as the development of picture-based, wordless instructions, which have reduced non-compliance from a pre-intervention observed rate of 13% to a post-intervention rate of 6% (results unpublished).
Figure 4. Newfoundland and Labrador colon screening program analytics dashboard example.
Natural language processing – breast mammography reports
HC’s AI and ML engines have been used to automate workflows and provide AI-assisted decision support for breast mammography services in the province. An NLP model was trained whereby text from approximately 7000 templated reports was analyzed to predict an appropriate follow-up action. Program clerks, who had received internal training from radiologists to read and codify reports, reviewed each report to assess clinical findings and assign it to one of three distinct categories: (1) Normal 2 years: the result returned no suspicious findings, and the patient is recommended to re-screen within 2 years; (2) Normal 1 year: the result returned a possible suspicion or the patient was deemed higher risk, and the patient is recommended to re-screen within 1 year; and (3) Abnormal: the result returned an abnormal finding, and further diagnostic testing is required. Based on the expert panel review, each report was tagged with the appropriate label. The labels were considered the ‘gold-standard’ outcome on which the NLP algorithm was trained. Breast mammography reports were then structured and standardized into a dataset containing two fields: (1) the unstructured summary section (i.e., the feature) and (2) the labelled outcome. The mammography reports were randomly split into a training dataset of approximately 90% (6466 records), with the remaining 10% reserved for validation (808 records). ACM was used to build a supervised NLP algorithm on the training data to ‘learn’ how to correctly classify each report into one of the three categories listed above, based primarily on information entered into the unstructured clinical findings report. The algorithm was then tested against the remaining 10% to assess its overall ability to predict the correct classification groups using standard metrics, such as accuracy and recall. Further testing was completed by running the algorithm on a novel dataset containing 5185 records. Similar metrics were run on this set to evaluate the robustness of the NLP algorithm.
Preliminary results showed that the overall accuracy in predicting the proper classification was 99.48%, and the average recall was approximately 94%, ranging from 92% for the Normal 1-year group to 98% for the abnormal group (results currently unpublished).
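For readers less familiar with these metrics, the following sketch shows one way a random training/validation split and the accuracy and per-class recall calculations can be expressed. It is illustrative only: the label names, the fixed random seed, and the toy predictions are assumptions, and the model training itself (performed with ACM in this work) is not shown.

```python
import random
from collections import Counter


def split_records(records, validation_fraction=0.1, seed=0):
    """Randomly split records into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy_and_recall(truth, predicted):
    """Return overall accuracy and recall for each gold-standard label."""
    total = Counter(truth)
    correct = Counter(t for t, p in zip(truth, predicted) if t == p)
    accuracy = sum(correct.values()) / len(truth)
    recall = {label: correct[label] / total[label] for label in total}
    return accuracy, recall


training, validation = split_records(range(100))

# Toy gold-standard labels and predictions for ten validation reports.
truth = ["normal_2_years"] * 6 + ["normal_1_year"] * 2 + ["abnormal"] * 2
predicted = (["normal_2_years"] * 6
             + ["normal_1_year", "normal_2_years"]
             + ["abnormal"] * 2)
accuracy, recall = accuracy_and_recall(truth, predicted)
average_recall = sum(recall.values()) / len(recall)
```

Because the classes are imbalanced, overall accuracy can remain high even when recall for a smaller class is lower, which is why both metrics are reported.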
Discussion
HC is a constantly evolving health information system that aims to connect all cancer care data sources and pull key data elements from each to create a holistic view of patient care throughout the cancer care spectrum, from screening and prevention to treatment, follow-up, and survivorship. Although HC was initially set up to collect and analyze population-based screening data, work is ongoing to add other pertinent cancer data to the system. The PCCP is currently working to migrate electronic pathology reporting data into the HC ecosystem. Having this information pipelined into HC will allow for up-to-date pathology reports on patients and timely access to this information and, in conjunction with other cancer-related information, will help form a comprehensive electronic health record of a patient’s cancer health information, increasing the productivity of workflows and offering safer ways to care for patients. 25
The long-term vision of HC is to integrate all cancer care data holdings, from screening and registry to treatment data, to build a comprehensive electronic health record for patients who use cancer care services and to aid evidence-based reporting in the province. Further data integration will involve migrating solid tumour registry data into HC, such as case ascertainment, tumour characteristics data (e.g., morphology, histology, and stage), vital status, and other information such as date of diagnosis and age at diagnosis. Additional integration will allow for up-to-date, standardized cancer surveillance reporting and continue to build on existing analytics within the HC environment. Recently, the Canadian Partnership Against Cancer (CPAC) and the Canadian Cancer Society (CCS) released a pan-Canadian Cancer Data Strategy (pCCDS). 26 This strategy highlighted the importance of ensuring high-quality, integrated data to better support decision making in cancer care nationwide. A critical gap identified in the pCCDS was the timeliness of case ascertainment and staging, and efforts are needed to improve in these areas. HC is a technology that can be used to contribute positively to the goals set forth by the pCCDS.
Cyberattacks are severe in nature and can have damaging effects on the healthcare system.27,28 The health sector has become a target for cyberattacks, 29 and the number of attacks has been increasing in recent years, with a broader impact on the overall system. 30 On October 30, 2021, EH experienced the most significant cyberattack in Canadian history.31,32 This was a ransomware attack whereby an attacker(s) infiltrated the network security protocols and blocked access to the system unless a ransom was paid. The outcome was crippling, as many IT systems where personal health information (PHI) resided were compromised. Many systems were taken offline for an extended period so that a comprehensive investigation could be undertaken to assess the degree of the attack. 33 The attack profoundly affected the healthcare system in NL; many patient services, such as clinic visits, surgeries, and treatments, were postponed for an extended period. Other services (including the PCCP) had to resort to paper-based charts to continue treating patients without further delay. Data sources within the NLCCR, such as TSR, STDB and RTDB, were offline and could not be used until it was deemed safe to do so. However, due to its cloud-based system and layers of data security protocols, HC was one of the only data systems unaffected by the attack, and no information was compromised. This highlights that health assessment technology tools such as HC can provide a means to safely and securely store PHI, with appropriate mechanisms in place to ensure the security of PHI into the future. 33
HC is a unique health technology assessment tool, and, to the best of our knowledge, few solutions exist in the Canadian context that are similar in scope and scale. Alberta Health Services (AHS) is a fully integrated health system serving the province of Alberta and is in the mid-stages of implementing a single province-wide electronic medical record (EMR). 34 As of November 2022, the implementation had already occurred in cancer treatment facilities (Cancer Care Alberta) and most large acute care facilities. This EMR was the impetus for developing a central cancer data warehouse, the Data Environment for Cancer Inquiries and Decisions (DECIDe). 34 DECIDe integrates the provincial EMR with the previous cancer-specific EMR and the Cancer Registry, with regular expansion to other data systems planned for the future, ensuring efficient access, continuity of data, and timeliness so that researchers and stakeholders can improve cancer outcomes and the experiences of people facing cancer. DECIDe will connect multiple cancer data systems, such as treatment data, cancer surgeries, laboratory data, the tumour registry, and patient-reported data, along with relevant treatment data from outside cancer care (e.g. surgery). DECIDe will allow cancer analysts to fulfill data requests and support planning, performance measurement, quality improvement, and research. 34 DECIDe’s current phases will not include cancer screening information and do not contain AI and ML packages to perform advanced analytics for reporting and research purposes. Since Alberta’s new EMR is automated and provincial, AI/ML is of minimal relevance for obtaining clinical information.
Processes established through HC will increase documentation and access efficiency and enable knowledge translation due to the reporting tools built into HC. Some process changes have been discussed above, and with further implementation of HC, the PCCP will be able to continue creating new efficient processes. For example, with the integration of the key solid tumour registry data fields, PCCP can link patient screening information with registry data to establish a history of a cancer diagnosis. This information will build on developing a comprehensive health profile for patients using cancer care services, better understanding potential risks at the patient level, and developing targeted screening strategies to improve patient outcomes.35–37
This project has natural scalability and extensibility, meaning that the groundwork completed for the current project has the potential to benefit other data repositories within our system; this can extend to further applications such as assessing mental health based on unstructured records, detecting and predicting adverse medical events 38 and predicting cancer prognosis. 39 NLP algorithms developed under HC are built within an agile framework and can be directly applied to other unstructured data, such as provincial treatment data, to further enhance the integration, harmonization and sustainability of cancer care data in the province.
Study limitations
Currently, HC contains PBSD and electronic pathology data only. HC does not store other critical cancer data such as chemotherapy and radiation treatment data, tumour registry, pharmacy, and patient-reported outcomes. In addition, data sources such as cancer-related surgeries and primary health information fall outside the purview of the PCCP, and processes to obtain these data for storage inside HC may be either convoluted or not feasible due to data governance and custodianship limitations. Access to such data sources would be essential to building a comprehensive cancer care data repository for reporting and quality assurance purposes.40–42
The inability to capture gender and other vital socio-economic factors (e.g., income, ethnicity, education level, etc.) is a limiting factor. This technology can provide an opportunity to be proactive in using external data sources, such as census data, to link and capture socio-economic factors. Future plans include HC algorithms that could capture gender-based characteristics to help address gender-based differences that may exist in cancer care pathways. 43
While HC provides world-class, user-friendly solutions for ML and AI algorithms, the methods employed are somewhat of a ‘black box’, 44 meaning the user does not have complete control over, or understanding of, the mechanisms used by the algorithms to ‘learn’ from the data. Interest in better explaining and understanding AI’s inner workings has increased recently.44–47 This improved understanding can translate to the practical usage of AI models for more comprehensive analysis.
This research demonstrated how HC has harmonized data and provided operational benefits at a Canadian provincial cancer agency; however, the benefit was not quantified in this work. Future empirical studies are warranted to comprehensively test its effectiveness.
Conclusion
In this manuscript, we have outlined the building and application of HC within the NL PCCP. In our experience, HC’s ability to consolidate three isolated screening application systems into a single, functional system has streamlined operations; improved the efficiency of the workflow; allowed for real-time reporting and benchmarking; and created a standardized process for data collection, harmonization, and ensuring the quality and accuracy of cancer screening data holdings. HC is a world-class, integrated health systems solution for collecting, harmonizing, and analyzing cancer screening data for practical decision-making. Our purpose was not to formally test or assess a research question (i.e., a hypothesis), but rather to describe a novel technology implemented in operational use at a provincial cancer agency and to demonstrate the improved effectiveness it offers.
Footnotes
Author contributions
(1) JJD contributed to the overall drafting, review and editing of the manuscript; (2) RWP, JMS, and MD assisted in drafting the manuscript, reviewing, editing and providing feedback; (3) GD, SA, and DB assisted in editing the manuscript and providing feedback.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was funded by the Canadian Cancer Society Data Transformation Grant #707597.
