Abstract
This study aims to develop SuperOrder, an order recommendation system for outpatient clinics. Using the electronic health record data available at midnight, SuperOrder predicts the order contents of each upcoming appointment on a daily basis. A two-level prediction framework is proposed. At the base-level, predictions are produced by aggregating three machine learning methods. At the meta-level, predictions are generated by integrating the base-level predictions with an order co-occurrence network. To test feasibility, we used retrospective data collected between 1 April 2014 and 31 March 2015 in pulmonary clinics at five hospital sites within a large rural health care facility in Pennsylvania. At the cost of a 6 per cent decrease in precision, recall at the meta-level improves by approximately 20 per cent over the base-level. This demonstrates that the proposed order co-occurrence network helps to increase the performance of order predictions. Implementation would offer a more effective and efficient way to place outpatient orders.
Introduction
Orders placed by healthcare providers for the care, treatment and diagnostic testing of patients are termed clinical orders. Providers make these orders using their clinical knowledge, considering patients’ demographics, medical history and current symptoms and diagnoses. To place a single order, providers must manually type the order name or keywords and then find the order in the search results. If providers need multiple orders, this process must be repeated until all the desired orders are entered.1,2 This is a time-consuming process, and the resulting cognitive burden may lead to wrong, missed and duplicated orders. A number of studies have been dedicated to preventing order errors by setting up alarms at order entry to avoid drug–drug interactions and overdose.3–7 Other researchers have identified areas of improvement for clinical ordering by investigating how provider and patient characteristics affect order patterns.8,9
To enhance clinical order efficiency and consistency, previous studies have focused on designing order sets, which are collections of orders grouped by specific clinical purposes such as a certain type of diagnosis or procedure.1,2,10,11 The literature shows that well-developed order sets can save time in searching for necessary and sufficient orders for patients and lower the variability in medical care.1,11,12 However, order sets are costly to develop, maintain and update.2,12–14 In addition, most order sets do not cover diverse patient conditions and are not consistent with current practice, and some studies therefore report a low usage frequency of order sets.13–15 An alternative approach to improving order effectiveness is to build an automated recommendation system for clinical orders, analogous to Amazon.com’s recommendation system in which customers who bought A also bought B.13,15–20 Frequent itemset mining and association rule mining have been used to identify which clinical orders are often placed together.15,17,19 However, only the associations between the orders themselves were considered, and other factors such as patients’ medical history and current symptoms were ignored. To address this issue, some order recommendation systems were designed to incorporate non-order clinical items such as diagnoses and lab results using Bayesian networks.16,18,21 These schemes were limited to inpatient settings, in which a suggested list was generated after the initial diagnoses and orders were made. This concept is not applicable to outpatient clinics, where a suggested order list is needed before the provider sees the patient because there is not enough time during the appointment.
This study aims to establish SuperOrder, an automated order recommendation system for outpatient clinics, to predict the order contents of each appointment before a provider sees a patient. The predictions are generated from Electronic Health Record (EHR) data available at midnight before the corresponding appointments occur. By providing a list of potential orders, SuperOrder is intended to ease order documentation and increase order efficacy. With SuperOrder, providers should have more time for patients, which may improve care quality and patient satisfaction. To test the feasibility of the system, we use 1 year of EHR data from pulmonary-related outpatient clinics at five hospital sites of Geisinger Health System (Geisinger), which serves rural areas across northeastern and central Pennsylvania. We chose pulmonary-related clinics because the order contents of pulmonary appointments are relatively simple compared with those in other specialties. Once the performance is validated, SuperOrder can be generalized to other specialties to benefit providers hospital-wide.
Methods
Study design
SuperOrder was developed using retrospective data from outpatient pulmonary clinics from 1 April 2014 to 31 March 2015, covering a total of 9815 patients and 18,348 appointments. Pulmonary-related outpatient clinics consisted of three specialties: pulmonary, thoracic medicine and thoracic surgery. In all, 42 providers were involved, including 25 physicians, 3 physician assistants, 8 certified registered nurse practitioners, 5 fellows and 1 resident. There were six order types: laboratory tests, imaging tests, pulmonary function tests (PFTS), other tests, referrals and medications. Medication orders involve drug dose and drug–drug interactions, which are complicated in their own right and should be treated as a separate problem from the other order types, so we excluded medication orders in this article. The total number of orders excluding medication orders was 34,816. Because the recommendation was made from the data available at midnight before appointments, we only considered predicting the orders made at the end of or after appointments. Orders might be cancelled or left incomplete after being placed for a variety of reasons. Since providers had genuinely considered these cancelled and incomplete orders, they were treated as target orders as well. The same order might be repeated within one appointment. Our objective was to predict whether an order was placed, not the number of times it was ordered. During the study period, 724 different kinds of orders were collected. Most orders had small volumes; omitting them lost little information while improving computational efficiency. In consultation with physicians, we focused on the top 34 (4.5%) common orders (Table 1). These 34 orders were selected as follows:
Table 1. Top 34 common orders in pulmonary clinics.
We included all the orders with the order volume higher than 1 per cent.
Except for sleep medicine referral, the order volume of every referral was less than 1 per cent (Table 2). To cover more referral orders, the four most common referrals were included.
Electrocardiogram (EKG) was one of the common orders for pulmonary patients and was included in this study although the order volume was less than 1 per cent.
Table 2. Appointments in pulmonary-related clinics.
CBC: complete blood count test; BMP: basic metabolic panel; AFB: acid-fast bacilli test; IgE: allergen-specific immunoglobulin E; PT/INR: prothrombin time and international normalized ratio test; NERAST: northeast regional allergy panel; APTT: activated partial thromboplastin time test; ABG: arterial blood gas test; ABO: blood group and type test; ANA: antinuclear antibody test; 25OHD: 25-hydroxy vitamin D test; RF: rheumatoid factor test; COMPR: comprehensive metabolic panel; GRAM: Gram stain test; AAT: alpha-1 antitrypsin test; HEPFA: hepatic function panel; CHEST CT: chest computerized axial tomography scan; CXR: chest X-ray; ECHO: echocardiography; PET CT FULL: positron emission tomography – computed tomography (whole body); PFTS: pulmonary function tests; SMW: six-minute walk test; NHO: nocturnal home oximetry; PULSE OX: pulse oximeter test; DLCO: test of diffusing capacity of the lungs for carbon monoxide; ST: stress test; RAW: plethysmography – airway resistance; TCC: tobacco cessation counsel; RESP: respiratory test; EKG: electrocardiogram; SLEEP MED REF: sleep medicine referral; OT REF: occupational therapy referral; CARD REF: cardiology referral; PULM REHAB REF: pulmonary rehabilitation referral. Bold-faced signifies names and numbers for higher-level categories.
The order volume of these 34 orders is 25,128 (72.2%) during the study period. Other detailed information regarding appointments and orders is shown in Table 2.
Model framework
Figure 1 shows the framework of SuperOrder, in which the order predictions are divided into two levels. At the base-level, the occurrences of different orders are assumed to be independent of each other, so that the predictions can be achieved by aggregating the outputs of a pool of individual classifiers. Each classifier predicts whether a given appointment contains a specific order. At the meta-level, the predictions are produced by integrating an order co-occurrence network with the base-level predictions. In the co-occurrence network, nodes relate to orders, and if two orders co-occur in one appointment, the nodes are connected by a link. Repeated co-occurrences strengthen the link between the particular nodes. The predicted orders at the base-level are matched with the nodes in the co-occurrence network, and the links are followed to extract the sub-graph. In the sub-graph, we can identify the new nodes (orders) that are missing from the base-level. We hypothesize that the meta-level predictions we have formulated can obtain a more complete and accurate order recommendation list than the base-level predictions.

Figure 1. Framework of SuperOrder.
The whole data set was split into two parts. The first 11 months of data (01 April 2014–28 February 2015) were used for training each binary classifier, testing the classifier performance and deciding a cut-off threshold for each classifier. The same 11 months of data were also used for constructing the order co-occurrence network. At the base-level, for each classifier, 80 per cent of the first 11 months of data were randomly selected for training, and the remaining 20 per cent were used for testing the classifier performance and deciding the cut-off threshold. To prevent small-volume orders from having too few positive samples in a test set, for each classifier we forced 80 per cent of the positive outcomes into the training set and 20 per cent into the test set during the random selection. As a result of this per-classifier sampling, different orders might have different training and test sets. The last month of data (01 March 2015–31 March 2015) was used to test the performance of the aggregated predictions at the base-level and the meta-level.
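The per-classifier split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the fixed seed and the toy labels are assumptions, and the sketch also stratifies the negative outcomes, which the text does not state explicitly.

```python
import random

def stratified_split(labels, train_frac=0.8, seed=0):
    """Split indices so that train_frac of each class goes to training.

    labels: list of 0/1 outcomes for one order (1 = order was placed).
    Returns (train_idx, test_idx) as sorted lists of indices.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

# Hypothetical low-volume order: 10 positive appointments out of 100.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
```

Because the split is drawn separately for each order, two classifiers generally see different training and test appointments, which matches the remark in the text.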
Base-level predictions
At the base-level, each binary classifier was built with three methods: logistic regression (LR), random forest (RF) and gradient boosting method (GBM).22–24 For each order, we chose the method that produced the best performance on the corresponding test set in terms of the area under the precision–recall curve (AUPRC).25 This approach is denoted as Mixed method in the remainder of this article, to distinguish it from using LR, RF or GBM alone. The next step was to determine a cut-off point for each classifier. SuperOrder was expected to have a precision of no less than 0.3, so for each classifier we selected the cut-off point that gave the maximum recall on the test set subject to a precision of at least 0.3. Recall and F-measure were the metrics used to evaluate model performance.26 Each binary classifier used the same pool of predictors (Table 3). The total number of predictors was 152.
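The cut-off rule above (maximise recall while keeping precision at or above 0.3) can be illustrated with a short sketch. The function name `choose_cutoff` and the toy scores are assumptions for illustration; the model selection by AUPRC is not shown.

```python
def choose_cutoff(scores, labels, min_precision=0.3):
    """Pick the cut-off with the highest recall among those meeting min_precision.

    scores: predicted probabilities for one order on its test set.
    labels: matching 0/1 outcomes.
    Returns (cutoff, precision, recall), or None if no cut-off qualifies.
    """
    total_pos = sum(labels)
    best = None
    for cutoff in sorted(set(scores)):
        predicted = [s >= cutoff for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        precision = tp / (tp + fp)  # at least one score reaches the cut-off
        recall = tp / total_pos if total_pos else 0.0
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (cutoff, precision, recall)
    return best

# Hypothetical test-set scores for one order.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 0, 0, 1, 0, 0, 0]
cutoff, precision, recall = choose_cutoff(scores, labels)
```

In this toy case the lowest cut-off (0.3) flags all seven appointments and falls below the precision floor, so the next cut-off (0.4) is chosen.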
Table 3. Predictors of SuperOrder.
PFTS: pulmonary function tests, COPD: Chronic Obstructive Pulmonary Disease.
Meta-level predictions
At the meta-level, a directed and weighted order co-occurrence network was built. To derive the edge weights in this network, we traced back to the bipartite network of orders and appointments. The order-appointment bipartite network was denoted as $G = (U, V, E)$, where $U = \{u_1, u_2, \ldots, u_n\}$ was a set of orders and $V = \{v_1, v_2, \ldots, v_m\}$ was a set of appointments. $E$ was a set of undirected and unweighted edges, which connected nodes in $U$ with nodes in $V$. $a(u_i, v_j)$ was an adjacency matrix of network $G$, where
The order co-occurrence network,
In the weighting system,

Figure 2. Development of order co-occurrence network from bipartite order-appointment network.
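To make the projection from the order-appointment bipartite network to a directed, weighted co-occurrence network concrete, the sketch below builds one from a list of appointments. The weighting used here is an assumption, since the article's own weight definition is not reproduced in this text: the edge from order a to order b is weighted by the fraction of appointments containing a that also contain b. The function name and the toy appointments are hypothetical.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(appointments):
    """Project appointments onto a directed, weighted order network.

    appointments: iterable of sets of order names (one set per appointment).
    Returns {(a, b): weight}, with weight = share of appointments containing
    a that also contain b (an assumed weighting, not the article's formula).
    """
    order_count = Counter()
    pair_count = Counter()
    for orders in appointments:
        unique = set(orders)  # repeated orders within a visit count once
        order_count.update(unique)
        pair_count.update(permutations(sorted(unique), 2))
    return {(a, b): n / order_count[a] for (a, b), n in pair_count.items()}

# Hypothetical appointments.
appointments = [{"ABG", "PFTS"}, {"ABG", "PFTS", "CXR"}, {"PFTS"}, {"CXR"}]
weights = build_cooccurrence(appointments)
```

With this weighting the network is asymmetric: every appointment with ABG also has PFTS, but only two of the three PFTS appointments have ABG.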
This order co-occurrence network was used for identifying the potential orders missing in the base-level predictions due to the assumption of order independence. Suppose that appointment x was predicted to contain a set of predicted orders
By deciding a proper threshold for

Figure 3. Illustration of predicted score calculation at the meta-level.
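The idea of following links from the base-level predictions to recover missing orders can be sketched as below. The scoring rule is an assumption made for illustration only (a candidate order is scored by its heaviest incoming edge from any base-level order); the article's actual score calculation is not reproduced in this text. The weights 0.38 and 0.36 are the two heaviest edges reported in the Results; the 0.10 edge is hypothetical.

```python
def meta_level(base_orders, weights, threshold):
    """Extend base-level predictions with strongly linked orders.

    base_orders: orders predicted at the base-level for one appointment.
    weights: {(source, target): weight} of the co-occurrence network.
    threshold: minimum score for a linked order to be added.
    Returns (meta_orders, scores); scores maps each candidate to its
    heaviest incoming edge from a base-level order (assumed scoring).
    """
    base = set(base_orders)
    scores = {}
    for (src, dst), w in weights.items():
        if src in base and dst not in base:
            scores[dst] = max(scores.get(dst, 0.0), w)
    added = {order for order, s in scores.items() if s >= threshold}
    return base | added, scores

weights = {
    ("ABG", "PFTS"): 0.38,
    ("SLEEP MED REF", "PFTS"): 0.36,
    ("ABG", "CXR"): 0.10,  # hypothetical edge
}
meta_orders, scores = meta_level({"ABG"}, weights, threshold=0.3)
```

Raising the threshold adds fewer orders (protecting precision), while lowering it adds more (raising recall), which is the trade-off explored in the Results.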
Results
SuperOrder modelling was developed using R 2.3.2. The performances of LR, RF and GBM were compared across the 34 orders in terms of AUPRC. The best method for each order and the corresponding performance on the test set and the last-month data set are shown in Table 4. In general, RF performs the best. Although the performance of an individual classifier may vary from the test set to the last-month data set, the average F-measure over all the classifiers is 0.35 on the test sets, which is very close to the average F-measure of 0.36 on the last-month data set.
Table 4. Cut-off point of each order and the corresponding prediction performance.
AUPRC: area under precision–recall (PRC) curve; 25OHD: 25-hydroxy vitamin D test; RF: random forest; CBC: complete blood count test; BMP: basic metabolic panel; AFB: acid-fast bacilli test; IgE: allergen-specific immunoglobulin E; LR: logistic regression; PT/INR: prothrombin time and international normalized ratio test; NERAST: northeast regional allergy panel; APTT: activated partial thromboplastin time test; ABG: arterial blood gas test; ABO: blood group and type test; ANA: antinuclear antibody test; RF: rheumatoid factor test; COMPR: comprehensive metabolic panel; GRAM: Gram stain test; AAT: alpha-1 antitrypsin test; HEPFA: hepatic function panel; CHEST CT: chest computerized axial tomography scan; CXR: chest X-ray; ECHO: echocardiography; PET CT FULL: positron emission tomography – computed tomography (whole body); PFTS: pulmonary function tests; SMW: six-minute walk test; GBM: gradient boosting method; RESP: respiratory test; EKG: electrocardiogram; NHO: nocturnal home oximetry; PULSE OX: pulse oximeter test; DLCO: test of diffusing capacity of the lungs for carbon monoxide; ST: stress test; RAW: plethysmography – airway resistance; TCC: tobacco cessation counsel; CARD REF: cardiology referral; OT REF: occupational therapy referral; PULM REHAB REF: pulmonary rehabilitation referral; SLEEP MED REF: sleep medicine referral.
Table 5 reports the performance of the aggregated predictions over all the classifiers in terms of micro-average precision, recall and F-measure on the last-month data set. It compares the predictions using Mixed method against those using LR, RF or GBM alone. The results show that all the methods have a precision higher than 0.3, and Mixed method outperforms the other three methods in terms of F-measure. Besides the 34 orders, there are 690 types of orders excluded from the recommendation system. When these are included as false negatives and the recall and F-measure are recalculated, Mixed method achieves a recall of 0.5232. This means that, on average, a provider can find more than half of the items he or she wants to place for a patient on the recommendation list.
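The micro-average metrics, and the effect of counting excluded orders as false negatives, can be illustrated with a short sketch. The function name `micro_metrics` and all counts below are hypothetical and are not taken from Table 5.

```python
def micro_metrics(counts, extra_false_negatives=0):
    """Micro-averaged precision, recall and F-measure over several orders.

    counts: {order: (tp, fp, fn)} per-order confusion counts.
    extra_false_negatives: placed orders outside the predicted set
    (e.g. the excluded low-volume orders), counted as misses.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values()) + extra_false_negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

# Hypothetical confusion counts for two orders.
counts = {"PFTS": (40, 60, 10), "CXR": (20, 40, 20)}
p, r, f = micro_metrics(counts)
p_all, r_all, f_all = micro_metrics(counts, extra_false_negatives=30)
```

Adding the excluded orders as false negatives leaves precision unchanged and lowers recall, which is why the recall over all 724 order types is lower than the recall over the 34 modelled orders.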
Table 5. Performance at the base-level by different methods.
LR: logistic regression; RF: random forest; GBM: gradient boosting method.
To take the order dependence into account, the order co-occurrence network is built as shown in Figure 4, which contains 34 nodes and 1000 edges. The network was visualized using Gephi 9.2.0. The size of a node corresponds to its in-degree. Nodes with high in-degree, such as order ‘PFTS’ and ‘CHEST CT’, represent the orders that are usually placed with other orders by providers. The arrow size represents the edge weight from one node to another. The heaviest weight is 0.38, which occurs on the edge from node ‘ABG’ to ‘PFTS’, followed by the edge from node ‘SLEEP MED REF’ to ‘PFTS’ with a weight of 0.36.

Figure 4. Pulmonary order co-occurrence network.
Using the order co-occurrence network with a desired weighting threshold

Figure 5. Comparison of the meta-level against the base-level predictions in terms of precision and recall: (a) Mixed method, (b) logistic regression, (c) random forest and (d) gradient boosting method.
Figure 6 compares the F-measure of meta-level predictions against that of base-level predictions across all the four methods. Note that the results consider all 724 medical orders in pulmonary clinics. It shows that when Mixed method, LR and RF are adopted, the order co-occurrence network helps in improving the prediction performance. However, when we use GBM, the order co-occurrence network cannot enhance the prediction power.

Figure 6. F-measures for meta-level predictions and base-level predictions.
Discussion
SuperOrder, analogous to commercial recommendation systems, has been developed to predict provider orders for upcoming outpatient appointments by applying statistical and graph mining techniques to EHR data. By embedding this prediction model in the EHR system, we aim to ease provider order documentation and enhance order effectiveness in order to improve care quality and patient satisfaction. Among the 724 different orders in pulmonary clinics, we focused on predicting 34 common orders, which accounted for 72 per cent of the total order volume in pulmonary-related clinics.
For these 34 common orders in pulmonary-related clinics, SuperOrder was designed as a two-level prediction model. At the base-level, the predictions were produced by aggregating the outputs of a pool of individual classifiers, where each classifier predicted whether a given appointment contained a specific order. For each classifier, LR, RF and GBM were used to train the model, and the one producing the best performance on the corresponding test set was selected as the method for that classifier. This framework rested on the hypothesis that using different methods for different orders could provide the best prediction power. Table 5 shows that this aggregated mechanism (Mixed method) did outperform the models that used a single prediction method. With the precision of SuperOrder held at no less than 0.3, the base-level predictions achieved a recall of 0.5232, which means that the base-level predictions correctly identified more than half of the orders that providers wanted to give to patients. For about half of the orders, therefore, instead of searching by keyword and finding the exact order names one by one, providers can simply select the orders shown on the recommendation list and submit them. This could reduce the time providers previously spent on order placement by roughly half.
After the base-level predictions were generated, the order co-occurrence network was developed to incorporate order dependence and produce the meta-level predictions. The results in Figure 5 show that when Mixed method, LR and RF are used for the base-level predictions, the order co-occurrence network increases the recall considerably with only a small sacrifice in precision. This improvement is reaffirmed by comparing the F-measure of the meta-level predictions with that of the base-level predictions, as shown in Figure 6. It supports the hypothesis that the meta-level predictions generate a more complete and accurate order recommendation list than the base-level predictions when Mixed method, LR and RF are used. However, when GBM is applied to produce the base-level predictions, the order co-occurrence network does not produce more accurate meta-level predictions, which contradicts our hypothesis. We speculate that this is caused by the insufficient accuracy of the base-level predictions by GBM. Of note, the meta-level predictions are generated by integrating the corresponding base-level predictions with the order co-occurrence network. Therefore, poor base-level predictions are very likely to lead to poor meta-level predictions even if the order co-occurrence network is well developed. We conclude that sufficiently accurate base-level predictions are required for the order co-occurrence network to produce better meta-level predictions.
As displayed in Figure 6, by using Mixed method, the meta-level predictions with
Limitations
This study has some limitations. First, the order volume of some procedures fluctuates considerably over time, and this trend sometimes cannot be captured by the training and test sets. Figure 7 shows four procedures with a steep increase in order volume after January 2015. Since the training and testing period covers the first 11 months, it is insufficient for detecting a sudden increase in order volume in the following month. Therefore, when we test our prediction model on the last-month data (the 12th month), these four procedures have higher precision than expected but much lower recall. In addition, with only 1 year of patient data, we cannot distinguish whether this increase is due to seasonality or to the introduction of a new treatment. To address this problem, we suggest that a longer study period (at least 2–3 years) is needed to improve the prediction performance. Furthermore, SuperOrder has to be updated frequently in response to new treatments or policy changes.

Figure 7. Order volumes of the four common procedures by month.
Second, although there are 724 different orders in pulmonary-related clinics during the study period, we only consider 34 common orders. Even if all our predictions were correct, the maximum recall we could achieve is 72 per cent. To improve the recall of SuperOrder without losing precision, it is necessary to predict the occurrence of the uncommon orders as well. How to predict these uncommon orders remains a challenge for future work. Third, our collected data contained a field named ‘order status’ indicating whether an order was cancelled, sent, resulted or completed. However, many of the records had a missing value in this field or showed that the orders were sent or resulted, which indicated that the order status was not updated after completion or cancellation. Therefore, we were unable to perform further research on cancelled orders. In future, if the order status information becomes more reliable, SuperOrder could incorporate a model to predict the likelihood of orders being cancelled as well.
Conclusion
The abundant information in EHR data provides an opportunity to develop better clinical decision support systems. This study has shown the viability of using EHR data to develop an order recommendation system for outpatient clinics that automatically predicts the orders that providers would like to place for patients. This order recommendation system, developed by integrating machine learning methods including LR, RF and GBM with an order co-occurrence network, shows promise in predicting non-medication orders in pulmonary-related clinics. Implementing this system would offer a more effective and efficient way to place medical orders for outpatients, so that providers have more time to see patients, which can improve care quality and patient satisfaction. This work takes one more step towards real-time clinical decision support tools based on EHR data, using machine learning methods as well as graph-based techniques, which will help in providing better care for patients.
Footnotes
Acknowledgements
We would like to acknowledge Christopher Stromblad, Eric Reich and Rebecca Maff, who gave valuable feedback on the study.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: Soundar Kumara is funded by Geisinger Health System to undertake Health Analytics Research. Yi-Shan Sung was employed by Geisinger Health System as a graduate student intern to conduct the study at Geisinger Health System. Ronald Dravenstott, Jonathan Darer and Priyantha Devapriya were also employed by Geisinger Health System when the study was being conducted.
