Machine-Learning-Based Prediction of a Missed Scheduled Clinical Appointment by Patients With Diabetes

Abstract

Background:

About 10% of patients with diabetes discontinue treatment, resulting in the progression of diabetes-related complications and reduced quality of life.

Objective:

The objective was to predict a missed clinical appointment (MA), which can lead to discontinued treatment for diabetes patients.

Methods:

A machine-learning algorithm was used to build a logistic regression model for MA predictions, with L2-norm regularization used to avoid over-fitting and 10-fold cross validation used to evaluate prediction performance. Data associated with patient MAs were extracted from electronic medical records and classified into two groups: one related to patients’ clinical condition (X1) and the other related to previous findings (X2). The records used were those of the University of Tokyo Hospital, and they included the history of 16 026 clinical appointments scheduled by 879 patients whose initial clinical visit had been made after January 1, 2004, who had diagnostic codes indicating diabetes, and whose HbA1c had been tested within 3 months after their initial visit. Records between April 1, 2011, and June 30, 2014, were inspected for a history of MAs.

Results:

The best predictor of MAs proved to be X1 + X2 (AUC = 0.958); precision and recall rates were, respectively, 0.757 and 0.659. Among all the appointment data, the day of the week when an appointment was made was most strongly associated with MA predictions (weight = 2.22).

Conclusions:

Our findings may provide information to help clinicians make timely interventions to avoid MAs.

Keywords

logistic regression model, L2-norm regularization, machine learning, missed clinic appointment

There are now about 9 500 000 diabetic patients in Japan and 380 000 000 worldwide, and the numbers are increasing.¹ It is vital for diabetic patients to start medical treatment early and keep regular hospital appointments to control their blood glucose level.^2-5 In Japan, about 10% of diabetic patients discontinue treatment, a major problem in diabetic care.⁵

Researchers have studied the factors related to treatment discontinuation,^2-5 classifying diabetic patients as those continuing and those discontinuing treatment. Using multifactorial data, differences between the two groups were identified as risk factors by testing their statistical significance. Factors like being “men” and “a person with a job” proved to be high risks for treatment discontinuation.⁵

This last set of researchers also found that an important risk factor, identified in a previous study, applied to patients in their own study who had discontinued treatment. This risk factor became operative when the amount of time after patients missed a scheduled clinical appointment (MA) crossed a particular threshold—between one day and six months depending on the purpose of each study. However, it’s not easy to set appropriate thresholds for outpatients whose hospital visits are at quite different intervals. In addition, external factors such as the amount of oral diabetic medications a patient had left (because patients feel no need to visit the clinic when they still have medications on hand) may affect a patient’s MA. Therefore, study of all factors that may impact MAs, leading to treatment discontinuation, must be as comprehensive as possible. Furthermore, even if patients at high risk of discontinuing treatment can be identified, clinicians need to identify the MAs that are most likely to lead to treatment discontinuation so they can know when they should intervene to encourage patients at high risk not to miss a predicted MA.

Accordingly, our study focused on diabetic patients’ scheduled clinical appointments to predict MAs most likely to trigger treatment discontinuation. From electronic health records (EHR), we built a prediction model of MAs by using a machine-learning algorithm. Analysis of MAs in the EHR allows for them to be predicted independent of how treatment discontinuation is defined. We included factors from previous studies related to treatment discontinuation and all external factors possibly related to discontinuation. The Institutional Review Board approved the study.

Methods

Nomenclature

“Factors,” which denotes patient characteristics (age, sex, etc.) and phenomena that can affect an MA (how and when an appointment was made, etc.), are called “features” when part of our mathematical models.

Model Design

We designed a logistic regression model that predicts a class $y \in \{+1, -1\}$ from a feature vector $x$ of a patient’s target appointment, with $y$ representing a clinical appointment missed ($y = +1$) or kept ($y = -1$) and $x$ generated from the patient’s EHR, representing the time from the initial visit to just before the target appointment. We modeled the probability of an appointment with $x$ attributed to $y$ with a logistic regression, $P(y \mid x; w) = \frac{1}{1 + \exp(-y \, w \cdot x)}$, where $w$ is a weight vector. The $w \cdot x$ denotes the scalar product of two vectors, $w$ and $x$. We computed the $w$ estimate by using training data $\{(x_{1}, y_{1}), \ldots, (x_{n}, y_{n}), \ldots, (x_{N}, y_{N})\}$.
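The probability model above can be written as a few lines of code. This is a minimal sketch, not the authors' implementation: the function name `prob_label` and the two-feature example values are assumptions made for illustration.

```python
import math

def prob_label(y, x, w):
    """P(y | x; w) = 1 / (1 + exp(-y * (w . x))) for y in {+1, -1}."""
    score = sum(wi * xi for wi, xi in zip(w, x))  # scalar product w . x
    return 1.0 / (1.0 + math.exp(-y * score))

# Hypothetical weights and features (illustrative values only).
w = [0.5, -1.0]
x = [2.0, 0.5]
p_missed = prob_label(+1, x, w)  # probability that the appointment is missed
p_kept = prob_label(-1, x, w)    # probability that the appointment is kept
```

Because the same formula is used for both labels, the two probabilities always sum to 1.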

In “Feature design” below, observed values of each quantitative variable (for example, blood test results) were linearly transformed (normalized) so that the variance of each variable equaled 1 and were then assigned to vector $x$.
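One way to carry out such a normalization is sketched below. The text does not say whether values were also mean-centered or whether the population or sample variance was used, so dividing by the population standard deviation is an assumption, and the function name and HbA1c values are hypothetical.

```python
import statistics

def scale_to_unit_variance(values):
    """Linearly rescale values so that their population variance equals 1."""
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)  # a constant column cannot be rescaled
    return [v / sd for v in values]

# Hypothetical HbA1c readings (%), for illustration only.
hba1c = [6.5, 7.0, 8.5, 9.0]
scaled = scale_to_unit_variance(hba1c)
```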

Modeling With L2-Norm Regularization

When the training data size, $N$, is smaller than the dimension of feature vectors, or the sampling of training data is biased, maximum-likelihood estimation often overfits a logistic regression model to the training data, leading the model to classify many new appointments inaccurately.⁶ We used the machine-learning software Classias,⁷ which employs an L2-norm regularization method to mitigate overfitting and improve the model’s generalizability. With the regularization, we estimated $w$ as $\hat{w} = \arg\max_{w} \left\{ \sum_{n = 1}^{N} \log P(y_{n} \mid x_{n}; w) - λ \| w \|_{2}^{2} \right\}$, where the squared L2-norm of $w$, $\| w \|_{2}^{2}$, is an L2-norm regularizer, acting as a penalty that allows large absolute weight values only for features occurring frequently in training data. The symbol $λ$ is a hyperparameter for regularization and needs to be tuned. Note that L2-norm regularization reduces to the maximum-likelihood estimation when $λ = 0$.
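To make the regularized objective concrete, the sketch below maximizes it with plain gradient ascent on a tiny made-up data set. Classias uses its own optimizer, so the training loop, step size, number of steps, and toy data are all assumptions chosen for illustration.

```python
import math

def log_prob(y, x, w):
    """log P(y | x; w) for the logistic model."""
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    # log(1 / (1 + exp(-z))) = -log(1 + exp(-z)); for very negative z this is about z
    return -math.log1p(math.exp(-z)) if z > -30 else z

def objective(w, data, lam):
    """Regularized log-likelihood: sum_n log P(y_n | x_n; w) - lam * ||w||_2^2."""
    return sum(log_prob(y, x, w) for x, y in data) - lam * sum(wi * wi for wi in w)

def fit(data, lam, steps=500, lr=0.1):
    """Gradient ascent on the objective (an illustrative optimizer, not Classias)."""
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        grad = [-2.0 * lam * wi for wi in w]  # gradient of the penalty term
        for x, y in data:
            z = y * sum(wi * xi for wi, xi in zip(w, x))
            coeff = y / (1.0 + math.exp(z))   # y * (1 - P(y | x; w))
            for j, xj in enumerate(x):
                grad[j] += coeff * xj
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w

# Toy, linearly separable data: (feature vector, label).
data = [([1.0, 0.0], +1), ([1.0, 1.0], +1), ([0.0, 1.0], -1), ([0.0, 0.5], -1)]
w_mle = fit(data, lam=0.0)  # maximum-likelihood estimation
w_reg = fit(data, lam=0.5)  # L2-norm regularization
```

On separable data like this, the unregularized weights keep growing with more steps, while the penalty keeps the regularized weights small.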

Evaluation Method

The receiver operating characteristic (ROC) curve, area under the curve (AUC), precision, recall, and F-measure were used to measure prediction performance.
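These measures can be computed from labels and model scores as sketched below. The function names, the 0.5 decision threshold, and the example labels and scores are assumptions for illustration. The AUC is computed through its rank interpretation: the probability that a randomly chosen missed appointment receives a higher score than a randomly chosen kept one.

```python
def precision_recall_f(y_true, y_pred):
    """Precision, recall, and F-measure with +1 (missed) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities of a miss.
y_true = [1, 1, -1, -1, -1]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else -1 for s in scores]
precision, recall, f_measure = precision_recall_f(y_true, y_pred)
area = auc(y_true, scores)
```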

We used 10-fold cross-validation to assess how accurately the model performed for unknown data. Cross-validation partitions the whole data into 10 subsets and repeats the performance measure 10 times. Each time, one of the 10 subsets is used as the test data, the other subsets as the training data. The model was trained by using the training data and then was tested by using the test data. The final score was the average of all 10 performance measures. For each test, we varied the hyperparameter $λ$ of the regularization, as explained above, from 0.001 to 100, calculated the model’s measures for each $λ$ by using another 10-fold cross-validation on only the training data, and set $λ$ to the best-performing value.
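The nested structure of this procedure (an outer loop for evaluation and an inner loop for choosing $λ$) is sketched below. To keep the example self-contained, a toy "shrunk mean" model and squared-error score stand in for the logistic regression and AUC. The interleaved fold assignment, the six grid points between 0.001 and 100, and all function names are likewise assumptions.

```python
def k_folds(n_items, k):
    """Split indices 0..n_items-1 into k interleaved folds."""
    return [list(range(i, n_items, k)) for i in range(k)]

def split(data, fold):
    """Return (training data, test data) for one held-out fold."""
    held_out = set(fold)
    train = [d for i, d in enumerate(data) if i not in held_out]
    return train, [data[i] for i in fold]

def cross_val_score(data, k, fit, score, lam):
    """Average held-out score over k folds for one fixed lam."""
    folds = k_folds(len(data), k)
    return sum(score(fit(tr, lam), te) for tr, te in (split(data, f) for f in folds)) / k

def nested_cv(data, k, fit, score, lam_grid):
    """Outer k-fold for evaluation; inner k-fold on the training part to pick lam."""
    outer_scores = []
    for fold in k_folds(len(data), k):
        train, test = split(data, fold)
        best_lam = max(lam_grid, key=lambda lam: cross_val_score(train, k, fit, score, lam))
        outer_scores.append(score(fit(train, best_lam), test))
    return sum(outer_scores) / len(outer_scores)

# Toy stand-ins for the real model and metric (illustration only).
def fit_shrunk_mean(train, lam):
    return sum(train) / (len(train) + lam)  # larger lam shrinks the estimate toward 0

def neg_squared_error(model, test):
    return -sum((v - model) ** 2 for v in test) / len(test)

data = [float(i % 7) for i in range(100)]
lam_grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
final_score = nested_cv(data, 10, fit_shrunk_mean, neg_squared_error, lam_grid)
```

The test fold of the outer loop is never seen while $λ$ is being chosen, which is what keeps the final score an estimate of performance on unknown data.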

To determine each feature’s contribution to prediction in the model, we examined the absolute values of each feature’s weight learned by using the whole data. Using 10-fold cross-validation on the whole data, we set $λ$ at 0.5, which gave the highest AUC value. Note that each feature’s contribution was measured by its weight in the regularized model, which does not necessarily represent its importance as a risk factor; such a weight differs in character from the weights and odds obtained by maximum-likelihood estimation.

Feature Design

Features from a patient’s EHR were designed to fully cover factors related to treatment discontinuation, one set related to the patient’s clinical condition (X1, n = 29 025), the other to previous findings (X2, n = 97).

Table 1 shows examples of X1. To classify clinical conditions chronologically, we generated two features for each, one fulfilled within 90 days before the target appointment and the other without limit. The X1 group has numerous features but most with a zero value, hence a very sparse construction.

Table 1.

Examples of X1.

Category	Feature
Consultation	Past/recent consulted medical departments
Disease	Past/recent diagnosed diseases, recovered from/under treatment
Medicine	Past/recent prescribed medicines
	Number of doses taken per day
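The "past/recent" pairing described above might be generated as follows. This is a sketch under assumptions: the event names, the feature-name prefixes, and treating exactly 90 days as "recent" are illustrative choices, not details given in the text. Storing only the non-zero entries in a dictionary reflects the sparse construction of X1.

```python
from datetime import date

def condition_features(events, target_date, recent_days=90):
    """Build sparse binary features from (name, date) events before the target appointment."""
    features = {}
    for name, when in events:
        if when >= target_date:
            continue  # use only information available before the target appointment
        features["past:" + name] = 1  # no time limit
        if (target_date - when).days <= recent_days:
            features["recent:" + name] = 1  # within 90 days before the target
    return features

# Hypothetical EHR events, for illustration only.
events = [
    ("medicine:metformin", date(2013, 1, 10)),
    ("department:ophthalmology", date(2013, 5, 20)),
]
x1 = condition_features(events, target_date=date(2013, 6, 15))
```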

Table 2 lists examples of X2, most of whose features are quantitative variables. In designing X2, we referred to previous research such as examinations of discontinued treatments, interviews with specialists and patients, and human behavior analysis. Since previous research had identified sex, age, distance from house to hospital and travel time, and laboratory results as factors related to treatment discontinuation,^2-5 we used them, too. As an example of our design technique, because a diabetes specialist told us that for patients who often forget their medication, the specialist monitored compliance by comparing the intervals between scheduled clinical appointments, length of prescription term, and amount of medication the patient still had, we incorporated this calculation as a check on compliance. Because a diabetic patient told us that the very accumulation of experiences with medical treatment in general stimulated his motivation to continue diabetes treatments, we used the history of clinic visits and number of diagnosed diseases. Because human behavior analysis uses behavior patterns and external factors, we used the day of the week an appointment was scheduled. To classify behavior patterns by medical department, one feature included all departments, the other only the diabetes unit.

Table 2.

Examples of X2.

Category	Feature
Property	Sex, age
	Distance and time-distance from the house to the hospital by public transport
Consultation	Frequency of clinic visits
	Probability of visit on a given day of the week
Appointment	Interval between the date on which a clinical appointment was made and scheduled appointment date
	Day of the week when an appointment was made
	Whether the patient kept an appointment on the day it was made or missed the appointment without notice
	Weather on the appointment day
Medicine	Length of the prescription term of the latest medicine
	Total amount of medication a day
	How many times a day medication is taken
	Maximum size of prescribed tablets
Laboratory result	HbA1c, HDL-C, LDL-C, TG, and T-Cho
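As an illustration of one X2 feature, "Probability of visit on a given day of the week" could be estimated from a patient's visit history as the share of past visits falling on each weekday. The function name and the visit dates below are hypothetical, and the exact definition used in the study may differ.

```python
from collections import Counter
from datetime import date

def weekday_visit_probability(visit_dates):
    """Share of past visits on each weekday (0 = Monday ... 6 = Sunday)."""
    total = len(visit_dates)
    if total == 0:
        return {day: 0.0 for day in range(7)}
    counts = Counter(d.weekday() for d in visit_dates)
    return {day: counts.get(day, 0) / total for day in range(7)}

# Hypothetical visit history: three Fridays and one Monday.
visits = [date(2013, 6, 7), date(2013, 6, 14), date(2013, 6, 21), date(2013, 6, 24)]
probs = weekday_visit_probability(visits)
```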

Class Design

Scheduled clinical diabetes appointments were classified as to whether they were kept or missed. The missed appointments (MAs) had been made but were missed on the scheduled day. The kept appointments (KAs) were kept on the scheduled day. Appointments—not patients—were analyzed because a patient who had missed an appointment at some point might have kept others, so if we had analyzed patients, we would have been unable to distinguish the status of “kept” or “missed” for that patient.

Appointments the patient changed to another day (and kept) were obviously eliminated from the MAs. Also, if a patient missed an appointment but had an appointment in the diabetes department for a later day—and kept it—the first appointment was removed from the MAs.
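The labeling rules above can be summarized in a small function. The argument names are hypothetical, and returning `None` for appointments removed from the MAs (so that they are excluded rather than relabeled as kept) is an assumption, since the text does not say how removed appointments were treated.

```python
def label_appointment(attended_on_scheduled_day,
                      rescheduled_and_kept=False,
                      kept_later_diabetes_appointment=False):
    """Return -1 for a kept appointment (KA), +1 for a missed one (MA), None if removed."""
    if attended_on_scheduled_day:
        return -1  # KA: kept on the scheduled day
    if rescheduled_and_kept or kept_later_diabetes_appointment:
        return None  # removed from the MAs
    return +1  # MA: made but missed on the scheduled day

labels = [
    label_appointment(True),
    label_appointment(False),
    label_appointment(False, rescheduled_and_kept=True),
    label_appointment(False, kept_later_diabetes_appointment=True),
]
```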

A few inaccuracies unavoidably remained in the MAs. For example, patients may have changed clinics without notice but continued treatment, or severe weather or traffic accidents may have prevented appointments from being kept. Such cases do not involve “treatment discontinuation” and should be removed from the MAs, but they are so hard to detect in the EHR that they ended up included.

Results

Data Set

All prediction experiments were performed by using records from the University of Tokyo Hospital that included the history of 16 026 clinical appointments scheduled by 879 patients whose initial clinical visit had been made after January 1, 2004, who had diagnostic codes indicative of diabetes, and whose HbA1c had been tested within 3 months after their initial visit. The records were dated between April 1, 2011 and June 30, 2014. The 879 patients kept 15 104 of the 16 026 appointments scheduled, and the average number of appointments per patient was 17.4 with a standard deviation of 11.0. Hence, 922 appointments were missed by 285 patients, the average number per patient was 3.2, and the standard deviation was 3.9. Of the 285 patients, 207 had fewer MAs than the average (3.2), and their MAs accounted for about 40% of the total.
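The counts in this paragraph can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those reported above.

```python
total_appointments = 16026
kept_appointments = 15104
patients_with_ma = 285

missed_appointments = total_appointments - kept_appointments  # 922
ma_rate = missed_appointments / total_appointments             # about 5.75% of appointments
mean_ma_per_patient = missed_appointments / patients_with_ma   # about 3.2
```

With only about 1 appointment in 17 missed, the two classes are strongly imbalanced, which is one reason precision and recall are reported alongside AUC.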

Prediction Performance

We examined the difference in prediction performance between models by using the X1, X2, and X1+X2 groups. Figure 1 and Tables 3 and 4 show the prediction performance. The X2-group model (AUC = 0.943) predicted the MAs with higher accuracy than did the X1 group (AUC = 0.912), and using both the X1 and X2 groups (AUC = 0.958) yielded even greater accuracy. The precision, recall, and F-measure of the model when using both X1 and X2 groups were, respectively, 0.757, 0.659, and 0.704. For reference, Tables 3 and 4 show, respectively, the AUC value and F-measure of the models learned by maximum-likelihood estimation ( $λ = 0$ ). For the model using both the X1 and X2 groups, L2-norm regularization improved both the AUC value and the F-measure.

Figure 1.

ROC curve that measures the prediction performance of the logistic regression model with L2-norm regularization by using the X1, X2, and X1+X2 groups. The axes of the ROC curve are 1 − specificity (specificity = TN/(FP+TN)) and sensitivity (TP/(TP+FN)), where TN = true negative, FP = false positive, TP = true positive, and FN = false negative.

Table 3.

AUC Value.

Feature	L2-norm regularization	Maximum-likelihood estimation
X1	0.912	0.902
X2	0.943	0.945
X1+X2	0.958	0.917

Table 4.

Precision, Recall, and F-measure.

	L2-norm regularization			Maximum-likelihood estimation
Features	Precision	Recall	F-measure	F-measure
X1	0.466	0.438	0.449	0.441
X2	0.750	0.577	0.652	0.650
X1+X2	0.757	0.659	0.704	0.587

Figures 2 and 3 show AUC and F-measure scores obtained with 10-fold cross-validation of the whole data when $λ$ was fixed to a value between 0.001 and 100 for training models. The models using the X1 group and both X1 and X2 groups reached the largest AUC and F-measure scores with $λ$ between 0.1 and 1. This result indicates that L2-norm regularization can improve prediction performance when an appropriate $λ$ is selected.

Figure 2.

AUC scores obtained with 10-fold cross-validation of the whole data when $λ$ was fixed to a value between 0.001 and 100 for training models. The AUC is the area under the ROC curve. The models reached the largest AUC score with $λ$ between 0.1 and 1.

Figure 3.

F-measure scores obtained with 10-fold cross-validation of the whole data when $λ$ was fixed to a value between 0.001 and 100 for training models. F-measure is the harmonic mean of precision and recall, where precision = (TP/(TP+FP)) and recall = (TP/(TP+FN)). The models reached the largest F-measure score with $λ$ between 0.1 and 1.
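As a worked check of the harmonic-mean definition, the precision and recall reported in Table 4 for the X1+X2 model can be substituted directly. This is only an arithmetic illustration. Small differences from the tabulated F-measures could arise if the reported values were averaged over folds, which is an assumption here rather than something the text states.

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{2 \times 0.757 \times 0.659}{0.757 + 0.659} = \frac{0.9977}{1.416} \approx 0.7046$$

This is within 0.001 of the tabulated value of 0.704.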

Discussion

Ours is the first study to use machine learning to build a model that can predict MAs most likely to result in discontinuation of treatment for diabetes by using EHR, although some groups have used machine learning for various other aspects of diabetes research.^8-10 The accuracy of prediction was validated.

Prediction Model’s Performance and Future

Our model performed well in predicting MAs, with an AUC at 0.958—an AUC value over 0.9 is considered to be an excellent prediction score.¹¹ This promises to make the model a powerful tool for showing clinicians the suitable timing for intervening to keep patients continuing hospital visits.

Of course, earlier prediction of MAs is required to prevent discontinuation of treatment. We will need to improve our current model by adding new features such as changes of diabetes treatment over time and progression of diabetes. Furthermore, the quality of interaction perceived by the patient may be another factor determining whether the patient returned; we plan to evaluate this in a future model. If we can predict MAs early enough for clinicians to encourage diabetic patients to continue treatment, the number of MAs will decrease significantly, and diabetes care will be also improved.

One of our authors found that “DialBetics,” the telemedicine diabetes self-management system that can automatically give advice on the basis of a diabetic patient’s input and registry data, led patients to improve their dietary habits.^12,13 We will combine DialBetics with our model and investigate whether coordination of clinical appointments with our model more effectively motivates patients to keep clinical appointments.

General Applicability

Our model was evaluated with outpatient EHR from one hospital in Japan. To assess its general applicability, we must evaluate the model with patients at other hospitals in several different countries that have different demographics and regional characteristics. We will also evaluate our model with patients who suffer from other diseases, like asthma and other chronic diseases, that require similarly continual outpatient treatment, and we will investigate the differences in features.

Design of Features in the Model

The features we deemed possibly relevant to MAs (X2 group) yielded a higher prediction accuracy than those related to patients’ clinical condition (X1 group). This suggests that features derived from clinicians’ knowledge of and experience with diabetes treatment are more effective for improving accuracy in predicting MAs. Nevertheless, X1 is also valuable, since use of both X1 and X2 resulted in a higher prediction accuracy. Therefore, in a future study, we will identify the most contributive features in X1 to construct a simpler model with fewer features.

Machine Learning of Features’ Weight

For the X1 group alone and for the combined X1 and X2 groups, the model trained with L2 regularization achieved higher prediction accuracy than the model trained with maximum-likelihood estimation. Since the number of features in these groups exceeds the number of training examples, there is a high risk of over-fitting to part of the training data. We confirmed that L2 regularization reduced over-fitting well. The accuracy of the model that learned with L2 regularization was almost the same as that of the model constructed with only the X2 group using maximum-likelihood estimation because overfitting was effectively suppressed by regularization reducing the number of features to under 1% of the whole training data.

The appointment records we used were possibly uneven because of records obtained from patients who made many appointments. We will consider the relationship between prediction accuracy and the uniformity of appointment records.

Machine learning is generally applied to “bigger” data than we used in this study (16 026).¹⁴ The prediction accuracy will be improved by using more data because machine-learning generalization increases as the number of training data increases.¹⁵ SS-MIX2, which is currently being developed to standardize medical data stored in the EHR systems of hospitals, will enable us to use “big data” and improve the prediction accuracy of our model in the near future.^16,17

How Features Contribute to MA Predictions

Tables 5 and 6 show the top five qualitative and quantitative features, respectively. These features were given the largest absolute weights in our trained model and thus contributed strongly to MA prediction. We found that features related to when and how appointments were made, rather than to patients’ clinical condition, influenced the accuracy in predicting MAs. It would seem that irregularly scheduled appointments are more apt to be missed. We examined the largest qualitative/quantitative features in detail.

Table 5.

Qualitative Features That Contribute to the Prediction.

Qualitative features	Weight
Appointment was made on a Sunday	2.22
Appointment was scheduled for a Friday	1.91
Graves’ disease was diagnosed and recovered from	−1.36
Diabetic ketoacidosis with type 2 diabetes diagnosed and under treatment	1.32
Rilmazafone Hydrochloride Hydrate recently prescribed	1.29

Table 6.

Quantitative Features That Contribute to the Prediction.

Quantitative features	Weight
Frequency of visiting the clinic’s diabetes department in the past	−1.20
Number of medical departments consulted in the past	−0.999
Probability that the previous kept appointment was on a Friday	0.965
Interval between when a clinical appointment was made and the date for which it was scheduled	−0.912
Probability that the previous appointment was made on a Monday	0.897
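To see how the qualitative and quantitative features compare with one another, the weights from Tables 5 and 6 can be merged and sorted by absolute value. The paper lists the two groups separately, so the combined ranking below is only an illustration, with feature names abbreviated from the tables. Under the model's coding ($y = +1$ for a missed appointment), a positive weight pushes the prediction toward a miss and a negative weight toward a kept appointment.

```python
# Weights copied from Tables 5 and 6 (feature names abbreviated).
weights = {
    "Appointment was made on a Sunday": 2.22,
    "Appointment was scheduled for a Friday": 1.91,
    "Graves' disease diagnosed and recovered from": -1.36,
    "Diabetic ketoacidosis with type 2 diabetes, under treatment": 1.32,
    "Rilmazafone Hydrochloride Hydrate recently prescribed": 1.29,
    "Frequency of past visits to the diabetes department": -1.20,
    "Number of medical departments consulted in the past": -0.999,
    "Probability that the previous kept appointment was on a Friday": 0.965,
    "Interval between making the appointment and its scheduled date": -0.912,
    "Probability that the previous appointment was made on a Monday": 0.897,
}

# Rank by absolute weight, largest contribution first.
ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
toward_miss = [name for name, weight in ranked if weight > 0]
toward_kept = [name for name, weight in ranked if weight < 0]
```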

The largest qualitative feature was “Appointment made on a Sunday.” In all, 164 (about 1%) of total appointments were made on Sundays; 73 (44.5%) of those appointments were MAs. The percentage of MAs among the appointments made on Sundays was much higher than that among all appointments (5.75%). Physicians do not usually make appointments on Sundays, so consultations on Sunday are very unlikely except for visits to the emergency room. This means that those 164 Sunday appointments were made for the patients who needed to have a follow-up visit and be examined by the physicians who examined them in the emergency room. It seems that some patients came on a Sunday because they had used up their medications: for 39 of the 73 MAs, the prescription for the patient’s latest medicine had expired. Another possibility is that appointments made on Sundays, and appointments made only 8 days after a previous one, suggest a sicker patient or more complex condition. The reason the rest of the appointments were missed is not clear. Further investigation is needed to clarify why the appointments made on Sundays are more likely to be missed.
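The contrast between Sunday-made appointments and all appointments can be quantified from the reported counts. The variable names are illustrative, and the ratio is a simple descriptive comparison rather than a statistic reported in the study.

```python
sunday_made, sunday_missed = 164, 73
all_appointments, all_missed = 16026, 922
expired_prescription = 39

sunday_share = sunday_made / all_appointments      # about 1% of all appointments
sunday_ma_rate = sunday_missed / sunday_made       # about 44.5%
overall_ma_rate = all_missed / all_appointments    # about 5.75%
rate_ratio = sunday_ma_rate / overall_ma_rate      # Sunday-made appointments vs. overall
expired_share = expired_prescription / sunday_missed  # share of Sunday MAs with an expired prescription
```

By these counts, appointments made on a Sunday were missed at between seven and eight times the overall rate, and slightly more than half of those misses involved an expired prescription.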

The qualitative feature “Appointment scheduled for a Friday” was second largest: 493 (17.4%) of 2831 appointments scheduled for a Friday were MAs. We examined the previous appointments of the KAs scheduled for a Friday and compared them with the previous appointments of MAs. Generally, the previous appointments of the KAs were likewise scheduled on Fridays, suggesting that KA patients incorporated visits to the hospital for diabetes treatment into their life as a matter of routine. In contrast, the previous appointments of MAs were less likely to have been scheduled on a Friday, suggesting that Friday was not their regular hospital-visit day. This suggests that the same day of the week as the patient’s previous hospital visits should be considered when making an appointment.

Last, we examined appointment intervals, which is the fourth largest quantitative feature (the third largest involved diabetes history). The average interval in the case of KAs was 45 days, while that in the case of MAs was only 8 days. Next appointments in the case of KAs are usually made on the day of patients’ hospital visits. Of course, the interval between appointments is dictated by a patient’s health status; but, in practice, it generally falls between one and two months. As expected, most MAs were made on a day when a patient did not visit the hospital according to our examination. This suggests that appointments should be made on days when patients have a hospital visit.

We will also analyze the other highly weighted features related to clinical condition. For example, the feature “Rilmazafone Hydrochloride Hydrate recently prescribed” may indicate that those patients had a sleep disorder, which might suggest a useful hypothesis about why appointments are missed.

Conclusion

We designed a model that can predict with high accuracy when a diabetic patient is likely to miss a scheduled hospital appointment, possibly leading to discontinuation of treatment. This is the first study utilizing machine learning to design this kind of model. Our model should be a powerful tool for letting clinicians know when they must intervene to prevent treatment discontinuation. We plan to conduct a prospective predictive study based on our findings to establish the usefulness of the current findings.

Footnotes

Abbreviations

AUC, area under the ROC curve; EHR, electronic health record; KA, kept appointment; MA, missed clinical appointment; ROC, receiver operating characteristic.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Kayo Waki, one of the authors, belongs to the Department of Ubiquitous Health Informatics, supported by the University of Tokyo and NTT DOCOMO, Inc.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the University of Tokyo and Nippon Telegraph and Telephone Corporation (NTT) as a joint research in the University of Tokyo COI (Center of Innovation), Sustainable Life Care, Ageless Society dedicated to Self-managing Healthcare in Aging Society of Japan.

References

1. Ministry of Health, Labour and Welfare. The National Health and Nutrition Survey in Japan [in Japanese]. 2012.

2. Karter, Parker, Moffet. Missed appointments and poor glycemic control: an opportunity to identify high-risk diabetic patients. Med Care. 2004;42(2):110-115.

3. Rhee, Slocum, Ziemer. Patient adherence improves glycemic control. Diabetes Educ. 2005;31(2):240-250.

4. Currie, Peyrot, Morgan. The impact of treatment noncompliance on mortality in people with type 2 diabetes. Diabetes Care. 2012;35(6):1279-1284.

5. Noda, Yamazaki, Hayashino, Izumi, Goto. Guideline against diabetes discontinue treatment [in Japanese]. 2014.

6. Bishop CM. Pattern Recognition and Machine Learning. New York, NY: Springer; 2006.

7. Okazaki. Classias: a collection of machine-learning algorithms for classification. 2009. Available at: http://www.chokkan.org/software/classias/. Accessed March 25, 2015.

8. Wang, Adrian, Chen, Wang KM. An improved electromagnetism-like mechanism algorithm and its application to the prediction of diabetes mellitus. J Biomed Inform. 2015;54:220-229.

9. Sudharsan, Peeples, Shomali. Hypoglycemia prediction using machine learning models for patients with type 2 diabetes. J Diabetes Sci Technol. 2015;9(1):86-90.

10. Hazlehurst, Lawrence, Donahoo. Automating assessment of lifestyle counseling in electronic health records. Am J Prev Med. 2014;46(5):457-464.

11. Kleinbaum, Klein. Logistic Regression: A Self-Learning Text. New York: Springer; 2010.

12. Waki, Fujita, Uchimura. DialBetics: a novel smartphone-based self-management support system for type 2 diabetes patients. J Diabetes Sci Technol. 2014;8(2):209-215.

13. Waki, Aizawa, Kato. DialBetics with a multimedia food recording tool, FoodLog: smartphone-based self-management for type 2 diabetes. J Diabetes Sci Technol. 2015;9(3):534-540.

14. Kaggle. Available at: http://www.kaggle.com/. Accessed March 25, 2015.

15. Duda, Hart, Stork DG. Pattern Classification. New York, NY: John Wiley; 2000.

16. Pathak, Bailey, Beebe. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium. J Am Med Inform Assoc. 2013;20(e2):e341-e348.

17. Consortium for SS-MIX Dissemination and Promotion. Available at: http://www.ss-mix.org/. Accessed March 25, 2015.