Abstract
Cancer is responsible for approximately 7.6 million deaths per year worldwide. A 2012 survey in the United Kingdom found dramatic improvement in survival rates for childhood cancer because of increased participation in clinical trials. Unfortunately, overall patient participation in cancer clinical studies is low. A key logistical barrier to patient and physician participation is the time required to identify appropriate clinical trials for individual patients. We introduce the Trial Prospector tool, which supports end-to-end management of the cancer clinical trial recruitment workflow with (a) structured entry of trial eligibility criteria, (b) automated extraction of patient data from multiple sources, (c) a scalable matching algorithm, and (d) an interactive user interface (UI) that presents physicians with both matching results and a detailed explanation of the causes of ineligibility for available trials. We report the results from deployment of Trial Prospector at the National Cancer Institute (NCI)-designated Case Comprehensive Cancer Center (Case CCC), with 1,367 clinical trial eligibility evaluations performed with 100% accuracy.
Introduction
Cancer is one of the leading causes of death in the world with 7.6 million cancer deaths reported in 2008 by a GLOBOCAN survey. 1 Furthermore, the World Health Organization reports that the incidence of cancer cases is increasing.2,3 Clinical trials are essential to make advances in cancer treatment and improve patient outcomes. Leading national organizations, such as the American Cancer Society and National Comprehensive Cancer Network, recognize clinical trials as a key component of high-quality care.4,5 Clinical trials must enroll an adequate number of patients in order to evaluate the effectiveness of new treatments. 6 A recent report by Stiller et al. 7 found that survival rates for childhood cancer at the population level have dramatically improved in Britain from 1978 to 2005 because of increased participation in clinical trials. Unfortunately, the participation of adult cancer patients in clinical trials is very low; in the United States, estimates suggest that less than 5% of patients take part.8,9
Multiple barriers to clinical trial enrollment have been described, one major barrier being the time and effort required by physicians to identify active trials, review eligibility criteria, and then match them with specific patient data.6,10,11 Many studies have characterized patient and provider barriers to clinical trial participation and recommended measures to improve patient awareness, access to trial information, and attitudes toward participation in clinical studies.8,9,11 In addition, physicians' difficulty in accessing trial information and protocols can impede patient enrollment.12 Although National Cancer Institute (NCI)-designated comprehensive cancer centers have high levels of provider motivation and study access, trial enrollment in these centers rarely exceeds 15% for racial and ethnic minorities.13 An important practical barrier increasingly reported by physicians is a lack of time available for patient ascertainment and consent for clinical trials.14
A common clinical practice is for a physician to manually screen potential clinical trials for a patient, with additional time spent by research staff to collect patient data from multiple sources and coordinate enrollment.12,15,16 This extremely labor-intensive process (a) increases the overall cost of clinical trials, whether conducted in community settings or academic medical centers; (b) fails to leverage electronic health record (EHR) systems to automate different phases of patient screening; and (c) introduces errors or leads to missed enrollment opportunities as patient load and the number of trials increase.
Informatics-based approaches that incorporate formal representation of eligibility criteria, data from EHR systems, and accurate trial matching algorithms are important for automating many phases of the trial recruitment workflow and efficiently matching trials with patients.15,17 In addition, default matching of all visiting patients to available trials by an automated matching tool can help improve participant diversity across economic and social categories.12,18 Automated matching tools can also be integrated with existing data management systems used in health care institutions (eg, patient scheduling systems) to leverage available resources while streamlining the trial recruitment process.16,19
In this paper, we introduce a scalable clinical trial eligibility-screening tool called Trial Prospector that has been deployed at the University Hospitals Seidman Cancer Center (UHSCC) of the NCI-designated Case Comprehensive Cancer Center (Case CCC) at Case Western Reserve University (CWRU), Cleveland, OH, USA.
Related Work: Computational Approaches to Match Clinical Trial Eligibility Criteria with Patient Data
Automated tools for clinical trial recruitment have been used in many medical specialties, including cardiology, 16 family medicine, 20 and Parkinson disease. 21 Several trial matching systems have been implemented in the cancer domain, such as caMatch, 22 the Agreement on Standardized Protocol Inclusion Requirements for Eligibility (ASPIRE), 23 and the Virginia Commonwealth University clinical trial system based on the OnCore tool (Forte Research Systems, Madison, WI, USA). 18 Many of these tools have been developed for accessing the institutional data warehouse to screen patients for clinical trials 24 or for creating an alert system for physicians.16,17 The OncoLink and Cancer.gov software tools have used the Web infrastructure to make trial information more accessible, with additional features to connect potential recruits with the study investigators. 25 OncoLink was based on the National Colorectal Cancer Research Alliance (NCCRA) database and matched patients based on their demographic and related information, 25 but it required significant effort for manual entry of patient characteristics. The ASPIRE system used a set of disease codes, such as breast cancer and diabetes, to augment trial eligibility data and facilitate “keyword”-based search for matching trials at the City of Hope National Medical Center. 23
However, existing systems have limited access to complete patient information, such as the latest laboratory test results, and are not integrated with the clinical systems used in routine patient care. 26 Further, existing tools have limited support for structured entry of trial information and for interactive user interfaces (UIs) that allow clinicians to review the matching results and re-execute the matching process when patient records are updated. In addition to the development of trial matching tools, there has been extensive research on creating formal, computable representations of trial eligibility specifications that can be used together with electronic representations of patient data in EHR systems. 15 A review by Weng et al. 15 considers three aspects of formal representation of trial information: (a) expression languages for trial criteria 27 (eg, Arden Syntax 28 and GELLO 29), (b) encoding of diagnosis information in trials (eg, Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) 30 and the Logical Observation Identifiers Names and Codes (LOINC) vocabulary 31), and (c) representation of patient information (eg, the Health Level 7 Reference Information Model 32). Recent work comparing the representation of eligibility criteria in ClinicalTrials.gov with existing industry standards found that developing a template set of eligibility criteria is feasible. 33 An important challenge for computational representation of trial information is the lack of suitable interfaces for entering eligibility criteria. The Trial Prospector system, described in the next section, includes the “Trial Builder” visual interface to facilitate structured entry of trial information.
Methods
Trial Prospector has been designed for easy integration with the existing clinical workflow and to facilitate patient recruitment in ongoing clinical trials using (a) structured entry of clinical trial eligibility criteria through an intuitive visual interface; (b) automatic integration of patient data from multiple existing data sources (eg, laboratory test results, diagnosis information, and demographic data); (c) a scalable matching algorithm for near real-time evaluation of patient data against active clinical trials; and (d) an interactive visual interface to review match results, with a detailed description of the exclusion conditions (both satisfied exclusion criteria and unsatisfied inclusion criteria) that disqualify a patient from specific trials.
Trial Prospector was developed in close collaboration with medical oncologists using an agile software engineering approach for rapid prototyping and iterative implementation of features based on user feedback. Using the Ruby-on-Rails technology stack, Trial Prospector was developed as a Web-based integrated environment (Fig. 1 illustrates the Trial Prospector architecture and its integration with external data sources). The Trial Prospector tool was reviewed by the UHSCC information technology support (ITS) team for data management practices to ensure compliance with the appropriate UHSCC policies on patient data access, privacy, and security. The following sections describe the four modules of Trial Prospector that have been developed to support the patient recruitment workflow.

Figure 1. Architecture of Trial Prospector, illustrating its interfaces with three external data sources.
Trial Builder: Intuitive Interface for Composing Clinical Trial Eligibility.
A key component of Trial Prospector is the creation and maintenance of a clinical trial eligibility database. Formal representation of clinical trial eligibility criteria is an active area of research and has led to the development of various approaches, including the use of clinical trial authoring tools.34–36 However, there are no standard data entry tools for composing executable eligibility criteria that include simple expressions as well as complex nested eligibility expressions. A complex eligibility criterion may include conditional statements, for example, “aspartate aminotransferase (AST/SGOT) less than five times the upper limit of normal if the diagnosis is hepatocellular carcinoma (HCC).” Such expressions are composed of simple constructs connected by different types of connectives, such as “if, then” or “if, then, else.” The Trial Builder module has been developed by extending an existing visual UI called the Visual Aggregator and Explorer (VISAGE) 37 to support structured entry of eligibility criteria.
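One way to make such nested criteria executable is to store them as a small expression tree whose leaves are simple comparisons and whose inner nodes are the "if, then" and "if, then, else" connectives. The Python sketch below illustrates that idea under stated assumptions: the class and field names are hypothetical and are not taken from Trial Builder or VISAGE, the AST value is assumed to be pre-expressed as a multiple of the upper limit of normal, and treating an "if, then" criterion as satisfied when its condition is false is an assumption about the intended semantics.

```python
import operator
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Union

# Hypothetical node types for a nested eligibility expression.
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}


@dataclass(frozen=True)
class Compare:
    """Simple construct: <variable> <op> <value>."""
    variable: str
    op: str
    value: Any


@dataclass(frozen=True)
class IfThenElse:
    """Conditional construct: 'if, then' (else_branch=None) or 'if, then, else'."""
    condition: "Expr"
    then_branch: "Expr"
    else_branch: Optional["Expr"] = None


Expr = Union[Compare, IfThenElse]


def evaluate(expr: Expr, patient: Mapping[str, Any]) -> bool:
    """Recursively evaluate an eligibility expression against patient values."""
    if isinstance(expr, Compare):
        return _OPS[expr.op](patient[expr.variable], expr.value)
    if evaluate(expr.condition, patient):
        return evaluate(expr.then_branch, patient)
    # Assumption: an 'if, then' criterion imposes no constraint when its
    # condition is false.
    return True if expr.else_branch is None else evaluate(expr.else_branch, patient)


# AST < 5 x ULN if the diagnosis is HCC; the 2.5 x ULN else-branch is a
# hypothetical default added for illustration.
ast_rule = IfThenElse(
    condition=Compare("primary_diagnosis", "==", "HCC"),
    then_branch=Compare("ast_times_uln", "<", 5.0),
    else_branch=Compare("ast_times_uln", "<", 2.5),
)
```

With this rule, an HCC patient with AST at 4 times the upper limit of normal passes, while a patient with a different diagnosis and the same AST value fails the assumed 2.5 times threshold.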
A Library of Eligibility Criteria Variables.
The Trial Builder module is prepopulated with a library of eligibility variables to represent various kinds of values used in defining an eligibility criterion. The current version of the library consists of 32 data elements that are classified into five categories:
Each variable in the eligibility library includes the variable identifier, the set of valid values, the minimum and maximum values, the unit of measurement, a description of the subject of an eligibility criterion, and the “variable type.” Trial Builder uses four variable types, namely, boolean (eg, gender), categorical (eg, race), dynamic categorical (eg, primary diagnosis), and continuous (eg, white blood cell (WBC) count). Figure 2A illustrates an example eligibility variable called “WBC count,” defined as a continuous variable with a minimum of 0 and a maximum of 100,000 units per microliter.
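The attributes listed above map naturally onto a small record type with a validity check. The following Python sketch is illustrative only: the names `EligibilityVariable`, `VariableType`, and `accepts` are hypothetical, and the open-ended check for dynamic categorical variables is an assumption about their intent rather than documented Trial Builder behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, FrozenSet, Optional


class VariableType(Enum):
    BOOLEAN = "boolean"              # eg, gender
    CATEGORICAL = "categorical"      # eg, race
    DYNAMIC_CATEGORICAL = "dynamic"  # eg, primary diagnosis
    CONTINUOUS = "continuous"        # eg, WBC count


@dataclass(frozen=True)
class EligibilityVariable:
    identifier: str
    variable_type: VariableType
    description: str = ""
    valid_values: FrozenSet[str] = frozenset()
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    unit: Optional[str] = None

    def accepts(self, value: Any) -> bool:
        """Return True if `value` is admissible for this variable."""
        if self.variable_type is VariableType.BOOLEAN:
            return isinstance(value, bool)
        if self.variable_type is VariableType.CONTINUOUS:
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
            return ((self.minimum is None or value >= self.minimum)
                    and (self.maximum is None or value <= self.maximum))
        if self.variable_type is VariableType.DYNAMIC_CATEGORICAL:
            # Assumption: the value set is open-ended (new diagnosis terms can
            # appear), so any non-empty string is accepted here.
            return isinstance(value, str) and bool(value.strip())
        # Categorical: value must come from the fixed set of valid values.
        return value in self.valid_values


# The "WBC count" example from Figure 2A; the identifier string is hypothetical.
WBC_COUNT = EligibilityVariable(
    identifier="wbc_count",
    variable_type=VariableType.CONTINUOUS,
    description="White blood cell (WBC) count",
    minimum=0,
    maximum=100_000,
    unit="per microliter",
)
```

A threshold such as 3,000 per microliter is accepted for `WBC_COUNT`, whereas 150,000 falls outside the declared range and is rejected at entry time.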

Figure 2. The Trial Builder module.
Trial Builder UI.
A user (clinical trial investigator) can compose an eligibility checklist for each clinical trial using a multistep process that is supported by the subcomponents of Trial Builder UI (Fig. 2):
The output of Trial Builder is stored in a structured format to facilitate easier matching with patient data.
Real-Time Extraction and Integration of Multisource Patient Data.
A key requirement for an effective trial matching tool is access to the most recent patient data through effective interfaces with existing hospital data management systems. The Trial Prospector clinical data extractor (CDE) module retrieves patient demographic information, primary diagnosis, TNM classification, metastasis status, stage group, and the three most recent laboratory reports in real time from the following existing systems in UHSCC:
The CDE module ignores invalid data elements, such as unreported or cancelled tests, and retrieves the three most recent test results for each patient together with their metadata, such as test date, test code, test order number, ordering physician, test result, test range, and test unit. The CDE module includes a dashboard interface that displays both new requests and the status of all past requests (Fig. 3). The extracted patient data and trial protocol information are used as inputs to the matching module.
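A minimal Python sketch of this filtering step, assuming each laboratory report arrives as a record with a status field: the `LabReport` type, its fields, the `most_recent_valid` helper, and the status strings in `INVALID_STATUSES` are all hypothetical, since the interfaces of the UHSCC source systems are not described here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List

# Hypothetical status values; the text only says that unreported or
# cancelled tests are ignored.
INVALID_STATUSES = {"cancelled", "not reported"}


@dataclass
class LabReport:
    order_number: str
    test_date: date
    ordering_physician: str
    status: str
    results: Dict[str, float] = field(default_factory=dict)  # test code -> value


def most_recent_valid(reports: Iterable[LabReport], n: int = 3) -> List[LabReport]:
    """Drop invalid or empty reports, then keep the n newest, newest first."""
    valid = [r for r in reports
             if r.status.strip().lower() not in INVALID_STATUSES and r.results]
    return sorted(valid, key=lambda r: r.test_date, reverse=True)[:n]
```

Filtering before truncating matters: if a cancelled order were counted among the "three most recent," the patient would silently lose one usable report for the matching step.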

Figure 3. The CDE module interface, implemented by extending the Caisis tool interface.
Scalable Matching of Patients with Research Studies.
Trial Prospector uses a scalable matching algorithm in the patient study matching (PSM) module to evaluate the eligibility of a patient for a given trial (Fig. 4 illustrates the matching steps implemented in the algorithm). The algorithm converts laboratory test values to units that allow comparison of patient data with trial eligibility values. For example, the LIM system may store the WBC count per liter (L), while the trial protocols may specify the WBC count per microliter (μL). After data conversion, the algorithm parses the trial constraints and stores them as inverted indices indexed on trial identifiers, including nested eligibility criteria such as “conditional trial constraints.” Conditional trial constraints define an acceptable value for a laboratory test in terms of the patient's diagnosis or demographic information. For example, in one phase II trial, the alkaline phosphatase value must be less than 2.5 times the upper limit of normal, or less than 5 times the upper limit of normal if the patient's primary diagnosis is HCC. The use of customized data structures enables the PSM algorithm to perform fast lookup of multiple trial constraints and supports both lexical and numerical comparisons with patient data.
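The unit conversion and constraint lookup can be sketched in a few lines of Python. The text does not give the exact layout of the PSM indices, so this sketch assumes one plausible reading in which constraints are grouped by laboratory test code and each entry carries its trial identifier; all names (`DIVISORS`, `convert`, `build_index`, `failing_trials`) and the unit strings are hypothetical. Only the WBC conversion from the text is included: because 1 L equals 1,000,000 μL, a count per liter divided by 10^6 gives a count per microliter.

```python
import operator
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Divisor that converts a value in the first unit to the second unit.
DIVISORS: Dict[Tuple[str, str], float] = {("/L", "/uL"): 1e6}

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

Constraint = Tuple[str, str, float, str]  # (trial_id, op, threshold, unit)


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a laboratory value between units; fail loudly if unknown."""
    if from_unit == to_unit:
        return value
    try:
        return value / DIVISORS[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(f"no conversion from {from_unit} to {to_unit}") from None


def build_index(trials: Dict[str, List[Tuple[str, str, float, str]]]) -> Dict[str, List[Constraint]]:
    """Regroup {trial_id: [(test_code, op, threshold, unit), ...]} by test code,
    so one lookup per laboratory result reaches every trial that constrains it."""
    index: Dict[str, List[Constraint]] = defaultdict(list)
    for trial_id, constraints in trials.items():
        for test_code, op, threshold, unit in constraints:
            index[test_code].append((trial_id, op, threshold, unit))
    return dict(index)


def failing_trials(index: Dict[str, List[Constraint]], test_code: str,
                   value: float, unit: str) -> Set[str]:
    """Return the trials whose constraint on `test_code` this value violates."""
    failed = set()
    for trial_id, op, threshold, trial_unit in index.get(test_code, []):
        if not OPS[op](convert(value, unit, trial_unit), threshold):
            failed.add(trial_id)
    return failed
```

For example, a WBC count stored as 7.5 × 10^9 per liter becomes 7,500 per microliter, which satisfies a hypothetical trial requiring at least 3,000/μL but fails one requiring at least 10,000/μL.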

Figure 4. The scalable matching algorithm for identifying clinical trials for a patient.
The algorithm identifies a patient as eligible for a clinical trial if the following two conditions are satisfied: (1) the patient's demography, primary diagnosis, TNM, metastasis, and stage group information match the eligibility criteria; and (2) at least one of the patient's three most recent laboratory test reports has values that satisfy all the laboratory eligibility criteria of the trial.
Thus, the algorithm ensures that the maximum possible set of eligible clinical trials is identified for a given patient. The physician makes the final decision whether to offer trial enrollment to the patient. The algorithm keeps track of the laboratory test results that do not match the trial criteria and makes them available for easy review by the physician. This feature ensures that patients are not excluded from participating in a trial based on one specific test result, which may reflect a transient event. To ensure scalability of the PSM module with an increasing number of trials and patients, the algorithm uses a “greedy” approach to prune the search space, removing trials that fail to match a patient record from all subsequent comparisons with that patient's data.
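The two eligibility conditions and the greedy pruning can be combined in a short Python sketch. It is an illustration under stated assumptions, not the PSM implementation: the function names (`screen`, `report_satisfies`), the trial layout (a pair of clinical criteria and per-test laboratory checks), and the clinical field names in `CLINICAL_FIELDS` are hypothetical. Each clinical field is treated as one pruning phase; a trial that fails a phase is dropped from all later comparisons, and the failing phase is recorded as its exclusion reason.

```python
from typing import Callable, Dict, List, Mapping, Sequence, Set, Tuple

LabCheck = Callable[[float], bool]
# Hypothetical trial layout: (clinical criteria, laboratory criteria).
Trial = Tuple[Mapping[str, Set[str]], Mapping[str, LabCheck]]

# Hypothetical names for the clinical fields in condition 1.
CLINICAL_FIELDS = ("gender", "primary_diagnosis", "tnm", "metastasis", "stage_group")


def report_satisfies(report: Mapping[str, float],
                     lab_criteria: Mapping[str, LabCheck]) -> bool:
    """True if this single report meets every laboratory criterion of the trial.
    A test missing from the report counts as not satisfied (an assumption)."""
    return all(code in report and check(report[code])
               for code, check in lab_criteria.items())


def screen(patient: Mapping[str, str],
           recent_reports: Sequence[Mapping[str, float]],
           trials: Mapping[str, Trial]) -> Tuple[List[str], Dict[str, str]]:
    """Return (eligible trial ids, {excluded trial id: first failing phase})."""
    remaining = list(trials)
    excluded: Dict[str, str] = {}

    # Condition 1: one pruning phase per clinical field.
    for field_name in CLINICAL_FIELDS:
        survivors = []
        for trial_id in remaining:
            allowed = trials[trial_id][0].get(field_name)
            if allowed is None or patient.get(field_name) in allowed:
                survivors.append(trial_id)
            else:
                excluded[trial_id] = field_name  # pruned: never compared again
        remaining = survivors

    # Condition 2: at least one of the three most recent reports must
    # satisfy all laboratory criteria of the trial.
    eligible = []
    for trial_id in remaining:
        lab_criteria = trials[trial_id][1]
        if not lab_criteria or any(report_satisfies(r, lab_criteria)
                                   for r in recent_reports[:3]):
            eligible.append(trial_id)
        else:
            excluded[trial_id] = "laboratory values"
    return eligible, excluded
```

Because a pruned trial is never evaluated again, this sketch records only the first failing phase for each trial. Listing every unmet criterion for a trial would require evaluating the remaining phases for excluded trials as well, trading some of the savings from pruning for a fuller explanation.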
Interactive UI for Match Results.
A Web browser-based UI module was developed in close consultation with the end-user physicians. The physicians can search for patients by name (first name, last name) or Medical Record Number (MRN) and select the appropriate patient record (Fig. 5A shows a screenshot of the initial search feature). The results of the matching algorithm for the selected patient record, with links to trial protocols, are displayed in the UI module using four visual facets (Fig. 5B displays screenshots of the four facets):

Figure 5. The Trial Prospector UI.
Explanation of Ineligibility of a Patient for Specific Trials.
Many decision support systems, reasoning tools, and question answering systems include a feature for representing explanations for a given decision or result, also called “proof metadata.” 41 Unlike many existing matching tools, the PSM module of Trial Prospector not only enumerates the eligible and ineligible trials for a patient but also describes the specific reasons for exclusion from each trial. This allows physicians to efficiently review the match results and identify transient reasons for trial ineligibility. For example, a previously healthy patient may have been recently hospitalized for dehydration and acute kidney injury with an elevated creatinine of 2 mg/dL, which would lead to the patient's exclusion from a trial based on the elevated creatinine. However, when the physician reviews the three most recent laboratory test results reported in Trial Prospector, they would recognize that the elevated creatinine is an “outlier value” compared to the patient's baseline value. Thus, the “exclusion reasons” section of the UI module allows the physician to still consider the patient for the trial (Fig. 6).
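To show how a physician-facing explanation can expose a transient value, the following Python sketch formats the three most recent results for the laboratory test that caused an exclusion. It is illustrative only: `lab_exclusion_lines` is a hypothetical helper, and the 1.5 mg/dL creatinine limit is an assumed value, since the trial's actual threshold is not stated. The sketch deliberately makes no outlier judgment; as in the scenario above, that decision is left to the physician.

```python
from typing import List, Sequence


def lab_exclusion_lines(test_name: str, upper_limit: float,
                        recent_values: Sequence[float], unit: str) -> List[str]:
    """Format up to three recent results (newest first) for a laboratory test
    that caused an exclusion, marking which ones exceed the trial limit."""
    lines = [f"{test_name} (trial limit <= {upper_limit} {unit}), newest first:"]
    for value in recent_values[:3]:
        status = "ABOVE LIMIT" if value > upper_limit else "ok"
        lines.append(f"  {value} {unit}  {status}")
    return lines


# The creatinine scenario from the text: one recent elevated value against a
# normal baseline (older values are hypothetical).
report = lab_exclusion_lines("Creatinine", 1.5, [2.0, 0.9, 0.8], "mg/dL")
print("\n".join(report))
```

Seeing the single 2.0 mg/dL result next to two values within the limit is what lets the reviewer recognize the exclusion as likely transient.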

Figure 6. The detailed explanation for the exclusion of a patient from a specific trial.
Results
In order to assess the clinical utility of Trial Prospector, it was deployed at UHSCC and used by physicians in gastrointestinal oncology subspecialty clinics. We report the results of a pilot deployment of Trial Prospector between December 2012 and January 2013. During this time period, a Trial Prospector report was generated for each new patient visit and integrated into the clinician workflow. The UH Institutional Review Board (IRB) approval was obtained prior to performing the pilot deployment, and all data were managed in a HIPAA-compliant manner. Prior to the deployment of Trial Prospector, a control survey was administered to physicians to assess baseline characteristics. Once the control group was completed, physicians were provided training on the use of Trial Prospector. They were then given web-based access to Trial Prospector and asked to utilize it during new patient evaluations. Physicians were asked to complete a survey after each patient encounter and a summary-experience survey at the end of the pilot program. The surveys were created and administered using the research electronic data capture (REDCap) application. 42
Patient Eligibility.
During the development and pilot phases, a total of 85 Trial Prospector reports were generated. Fifteen GI/Phase I clinical trials were included for matching purposes. A total of 1,367 clinical trial matching computations were performed by Trial Prospector, with a mean of 7.19 ± 3.41 eligible trials and a mean of 8.89 ± 3.87 ineligible trials identified per patient. Figure 7A displays the list of conditions that resulted in a patient being excluded from a clinical trial. The most common reasons for ineligibility were primary diagnosis and laboratory values. A physician manually reviewed each report and found that the matching algorithm was 100% accurate for the eligibility criteria included in this pilot study.

Figure 7. The results from deployment of Trial Prospector at the Seidman Cancer Center.
User Evaluation Survey.
A total of 11 medical oncologists (6 attending physicians and 5 oncology fellows) participated in the pilot program. Trial Prospector was deployed at the point of care for 60 new patient visits in the gastrointestinal oncology subspecialty clinics. During the study, the physicians and oncology fellows completed a total of 14 control surveys, 60 patient-specific surveys, and 11 summary-experience surveys. The survey results are shown in Tables 1 and 2 (these data were presented in part as a poster at the Annual Meeting of the American Society of Clinical Oncology in June 2013).
Table 1. Physician survey results for the control and Trial Prospector groups.
Table 2. A summary of the user experience survey.
Oncologists reviewed the Trial Prospector report at the point of care during 95% of the new patient visits. Trial Prospector complemented the existing workflow for assessing patient eligibility: 70% of participating oncologists spent 0–5 minutes, about 20% spent 6–10 minutes, and 10% spent 11–15 minutes. Furthermore, physicians reported that it saved time in identifying trials during 57.1% of the patient visits. Trial Prospector was a useful tool in the clinic, with 72.7% of participating oncologists stating that it made it easier to find clinical trials for their patients. The majority of physicians found Trial Prospector to be accurate (75.9%), visually pleasing (90.9%), and easy to use (100%). Clinical trial enrollment was similar between the two groups, with 12.5% of the control group and 10.5% of the Trial Prospector group enrolling in a trial, although meaningful comparison is limited by the small sample sizes. Overall, 81.8% of the participating physicians would recommend Trial Prospector to other physicians for clinical trial eligibility screening.
Free text comments were solicited from users. The physicians reported that they liked the ease of use, auto-population of data, and detailed report of why a patient was ineligible for certain trials. 43 Several areas for improvement were also suggested, including the use of more eligibility criteria and stricter matching criteria by the matching algorithm. It was also suggested that phase I trials be automatically excluded from the report when the patient is being considered for adjuvant therapy. Overall, these results indicate that Trial Prospector is a feasible, accurate, and effective means to identify clinical trials for individual patients in a busy outpatient oncology clinic.
Performance Scalability Evaluation.
Trial Prospector is a scalable tool that efficiently computes matching reports as both the number of clinical trials and the number of patients increase. Figure 7B shows the performance of Trial Prospector as the number of patients increases from 20 to 80, with 5, 10, and 15 trials. To support larger patient populations and more trials, we propose to implement the CDE and PSM modules on parallel computing infrastructure to speed up both data extraction and matching.
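The parallel design is not specified further. Because each patient's report is computed independently, one simple way to parallelize it is to map a per-patient matching function over patients with a worker pool. This Python sketch uses the standard `concurrent.futures.ThreadPoolExecutor`; `match_all` and `match_one` are hypothetical names, and `match_one` stands in for whatever extraction-plus-matching routine is used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

MatchFn = Callable[[str, Sequence[str]], List[str]]


def match_all(patient_ids: Sequence[str], trial_ids: Sequence[str],
              match_one: MatchFn, max_workers: int = 8) -> Dict[str, List[str]]:
    """Run match_one(patient_id, trial_ids) for every patient concurrently.
    Patients are independent, so tasks need no coordination; results are
    returned keyed by patient id in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda pid: match_one(pid, trial_ids), patient_ids)
        return dict(zip(patient_ids, results))
```

Threads suit the I/O-bound calls a CDE-style extractor makes to external systems; for CPU-bound matching in CPython, a `ProcessPoolExecutor` or a cluster framework would likely be the better fit.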
Discussion
Trial Prospector is a practical, end-to-end automated tool that can be seamlessly incorporated into the clinical workflow and, thus, facilitate the enrollment of patients on clinical trials. We demonstrated that Trial Prospector increases accessibility to protocol details for physicians and, thus, reduces the workload of research staff through automated data extraction, integration, and matching processes. The feedback from physicians strongly encouraged further deployment of Trial Prospector. To facilitate wider deployment of Trial Prospector with support for multiple types of cancer, some additional features will be implemented during the next phase of development.
Integration with Patient Scheduling System and Trial Data Entry.
At present, a list of patients is manually entered into Trial Prospector using the CDE module interface by the research staff. This is a potential bottleneck in supporting patients across all oncology clinics, which can be addressed through integration with the patient scheduling system. The next phase of Trial Prospector development will include integration with the hospital scheduling system (eg, Athena). 44 Further, we propose to incorporate terms from the Eligibility Rule Grammar and Ontology (ERGO) project 45 to facilitate representation of trial information in a machine-readable format. We are also upgrading the clinical trial eligibility interface to improve ease of use for entering clinical trials information as new studies are activated or study amendments occur.
Use of NCI Thesaurus for Matching Primary Diagnosis.
There is significant variability in the terminology used to report cancer diagnoses, which makes it difficult to accurately compare a patient's diagnosis term with the trial specification. For example, a trial may specify the primary diagnosis constraint as “pancreatic cancer,” while the patient record uses the term “adenocarcinoma of the pancreas.” A standard lexical comparison of the two terms would therefore incorrectly mark the patient as ineligible for the trial. This issue can be addressed with an ontology, such as the NCI Thesaurus, which classifies “adenocarcinoma of the pancreas” as a specific type of “pancreatic cancer.” We propose to use the Web Ontology Language (OWL2) 46 version of the NCI Thesaurus, available from the NCI Enterprise Vocabulary Services (EVS), in the PSM module for semantic matching.
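The difference between lexical and semantic matching can be illustrated with a toy hierarchy. The authors plan to use an OWL reasoner; this Python sketch swaps in a much simpler technique, a transitive walk over asserted is-a links, which finds only subsumptions that are explicitly stated, not ones a reasoner would infer. The labels paraphrase the example above and are not NCI Thesaurus identifiers; `IS_A`, `is_subsumed_by`, and `diagnosis_matches` are hypothetical names.

```python
from typing import Dict, Iterable, Set

# Toy asserted hierarchy: child -> set of direct parents.
IS_A: Dict[str, Set[str]] = {
    "adenocarcinoma of the pancreas": {"pancreatic cancer"},
    "colon adenocarcinoma": {"colon cancer"},
}


def is_subsumed_by(term: str, ancestor: str, is_a: Dict[str, Set[str]] = IS_A) -> bool:
    """True if `ancestor` is `term` itself or is reachable through is-a links."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return False


def diagnosis_matches(patient_dx: str, trial_dx_terms: Iterable[str]) -> bool:
    """A patient diagnosis matches if any trial term subsumes it."""
    return any(is_subsumed_by(patient_dx, t) for t in trial_dx_terms)
```

Here "adenocarcinoma of the pancreas" matches a trial that lists "pancreatic cancer," even though a string comparison of the two terms fails.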
The complete NCI Thesaurus is a large ontology with 60,000 concepts, most of which are not required for the current scope of Trial Prospector in the UHSCC gastrointestinal cancer clinic. Using an automated script, we will extract a segment of the NCI Thesaurus class hierarchy rooted at “ncit: GastroInstestinal_Carcinoma,” which includes 193 classes representing various subtypes of gastrointestinal cancer. Using the open-source, Java-based OWLAPI framework 47 together with open-source OWL reasoners (eg, Pellet 48 or HermiT 49), the PSM module will integrate the NCI Thesaurus into the primary diagnosis phase of the matching algorithm. The subsumption reasoning algorithm used in the PSM module will be based on the standard OWL2 semantics. 50
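The extraction script is not described further. A minimal Python sketch, assuming the class hierarchy has already been loaded into a child-to-parents dictionary (the OWL parsing step is omitted), collects the segment rooted at a given class with a breadth-first traversal over subclass links. `extract_segment` and the class names in the usage example are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Set


def extract_segment(is_a: Dict[str, Set[str]], root: str) -> Dict[str, Set[str]]:
    """Return the sub-hierarchy rooted at `root`: the root and all of its
    descendants, each mapped to its direct parents inside the segment."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    segment, queue = {root}, deque([root])
    while queue:  # breadth-first walk down the subclass links
        for child in children[queue.popleft()]:
            if child not in segment:
                segment.add(child)
                queue.append(child)
    return {cls: {p for p in is_a.get(cls, set()) if p in segment} for cls in segment}


# Toy hierarchy with hypothetical class names.
toy = {
    "Gastrointestinal_Carcinoma": {"Carcinoma"},
    "Colon_Carcinoma": {"Gastrointestinal_Carcinoma"},
    "Colon_Adenocarcinoma": {"Colon_Carcinoma"},
    "Breast_Carcinoma": {"Carcinoma"},
}
gi_segment = extract_segment(toy, "Gastrointestinal_Carcinoma")
```

Links that leave the segment (here, the root's parent "Carcinoma") are dropped, so the result is a self-contained hierarchy that a reasoner or the traversal above can load on its own.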
Limitations.
The user survey may not accurately capture clinical trial enrollment because some patients may have enrolled in a trial during a subsequent visit. Additional studies with larger patient numbers are required to accurately measure the effect of Trial Prospector on clinical trial enrollment and analyze other factors, such as patient characteristics and clinical trial availability, that may bear on treatment decision making. We plan to conduct a more comprehensive survey about the role of Trial Prospector after its deployment in all oncology clinics in UHSCC. Finally, we did not identify patients recruited through Trial Prospector who would not have been recruited otherwise; a prospective randomized clinical trial of Trial Prospector vs. standard management will be necessary to evaluate the effectiveness of Trial Prospector in affecting patient enrollment.
Conclusion
We described an automated tool that matches patients with ongoing clinical trials at the point of care. Physicians used this tool to facilitate patient enrollment in active clinical trials without disrupting existing clinical workflows. Trial Prospector features four loosely coupled modules to support end-to-end data management: (a) structured entry of trial eligibility criteria, (b) automated extraction of patient information from the CDM and LIM systems, (c) a scalable algorithm for matching patient data with eligibility criteria, and (d) an interactive UI for matching results that supports updates to patient information for real-time rematching with trial data. In a pilot study of Trial Prospector, 1,367 clinical trial matching tests were performed. The feedback from the users, consisting of attending physicians and clinical fellows, has been very positive, with strong recommendations for wider deployment of Trial Prospector.
Author Contributions
Conceived and designed the experiments: NJM, GQZ, SSS, ST, PM. Analyzed the data: AP, ST, PM, GQZ, NJM, SSS. Wrote the first draft of the manuscript: SSS, GQZ. Contributed to the writing of the manuscript: NJM, AP, JSBS, PM, ZL, LC, RL. Agree with manuscript results and conclusions: SSS, ST, AP, ZL, LC, PM, RL, JSBS, NJM, GQZ. Jointly developed the structure and arguments for the paper: SSS, GQZ, NJM, AP, PM. Made critical revisions and approved the final version: NJM, JSBS, GQZ, SSS. All authors reviewed and approved the final manuscript.
Acknowledgments
We acknowledge the contributions of Joseph Teagno, James Warfe, Christopher Opper, and Dawn Miller in the development of Trial Prospector.
