Abstract
Optimal medical decision making is challenged by uncertainty regarding both treatment effects and patient risks in the absence of treatment.1,2 Such uncertainties create even greater problems when decisions must be made sequentially over time, such as making medication recommendations for chronic conditions. Markov decision processes (MDPs) are well-established tools for aiding such sequential decision making.3-7 MDP models are frequently used in operations engineering and have potential applications in medicine, where they can enable the decision maker to develop personalized treatment plans that optimize a patient’s health outcome based on the best available information, with due consideration for parameter uncertainty. However, using MDPs in real-world clinical practice can be difficult. They often require specialized software, such as MATLAB or CPLEX,8 that is unavailable to most clinicians and hospitals. Using an MDP model often requires long computational times, especially when the medical problem studied has high-dimensional state spaces (many clinical variables) and action spaces (the possible decisions that could be made), which make look-up tables infeasible.9 Moreover, the decision rules obtained with an MDP may be complex and not easily interpretable. Together, these limitations make the MDP a black box for clinicians, limiting acceptance and making it difficult to communicate the results of the MDP to patients. Therefore, a tool that harnesses the improved decision-making capabilities of an MDP model (without the burden of specialized software, long computation time, and complexity) would prove useful in the management of patients with chronic disease requiring serial decision making.
One such chronic disease is hypertension, or high blood pressure (BP), which is a major risk factor for cardiovascular disease (CVD; consisting of heart attack, angina, stroke, and other heart and blood vessel diseases), the leading cause of death in the United States and worldwide.10-12 We have created and previously reported on hypertension treatment policies using an MDP model with the goal of maximizing a patient’s expected discounted quality-adjusted life years (QALYs), and we found that it saved significantly more QALYs than the Seventh Joint National Committee’s (JNC7’s) proposed treatment policy. 13 This was accomplished by combining the power of MDP operations research modeling with a benefit-based tailored treatment strategy,14-16 which simultaneously considers all known factors that affect an individual’s untreated risk and the treatment’s relative effects on those risks.
In this article, we develop a novel framework that approximates the optimal hypertension dosage decision from the MDP model using Poisson regression and generalized linear mixed effect modeling. We then compare the performance and accuracy of our approximation to the original MDP model.
Methods
All methods are described in more depth in the supplemental appendix.
Data Source and Study Design
We sampled 100,000 simulated patients from the Third National Health and Nutrition Examination Survey (NHANES III),17 a survey of U.S. patients, to generate a representative sample of the U.S. population. NHANES III was chosen because BP was treated much less aggressively in the early 1990s, allowing for easier estimation of untreated BPs. Our data set included adults between the ages of 40 and 84 years who were examined over a 10-year horizon. The QALY MDP for CVD was solved for all 100,000 patients over a 10-year planning horizon (assuming a 0.001 disutility per prescribed medication) to obtain the MDP treatment policy for each patient. We also applied the current hypertension treatment guidelines in the United States, known as JNC7,18 to each patient in the sample.
Markov Decision Process Formulation
We modeled the process of sequentially determining hypertension treatment medications over a planning horizon as a discrete-time, finite horizon MDP formulation. 19 The objective of the MDP was to determine the optimal treatment strategy π* for a single patient that maximizes his/her expected discounted QALYs over the planning horizon, t = 1, . . ., T. We considered a 10-year planning horizon with annual treatment decisions. The MDP formulation was characterized by four features: state space, action space, state transition probabilities, and rewards. Table A1 summarizes the inputs and data sources for the model.
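To illustrate how a discrete-time, finite-horizon MDP of this kind can be solved, the following is a minimal sketch of backward induction on a toy two-state problem. It is not the model used in this study: the states, transition probabilities, and the 0.97 discount factor are hypothetical, and only the 0.001 disutility per medication is taken from the text.

```python
def backward_induction(P, r, T, discount):
    """Solve a finite-horizon MDP by backward induction.

    P[a][s][s2] is the probability of moving from state s to s2 under action a.
    r[s][a] is the one-period reward (here, QALYs) in state s under action a.
    Returns V[t][s] (optimal value-to-go) and policy[t][s] (optimal action).
    """
    n_states = len(r)
    n_actions = len(r[0])
    V = [[0.0] * n_states for _ in range(T + 1)]  # assumption: zero terminal value
    policy = [[0] * n_states for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for s in range(n_states):
            # q[a] = immediate reward + discounted expected value of next period
            q = [
                r[s][a] + discount * sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            policy[t][s] = best
            V[t][s] = q[best]
    return V, policy


# Toy inputs (hypothetical): states 0 = alive, 1 = dead;
# actions 0 = no medication, 1 = one medication.
P = [
    [[0.90, 0.10], [0.0, 1.0]],  # no medication
    [[0.95, 0.05], [0.0, 1.0]],  # one medication lowers the chance of death
]
r = [
    [1.0, 1.0 - 0.001],  # alive: one life-year, minus 0.001 disutility per medication
    [0.0, 0.0],          # dead: no further QALYs
]
V, policy = backward_induction(P, r, T=10, discount=0.97)
alive_actions = [policy[t][0] for t in range(10)]
print(alive_actions)
```

In this toy problem the recommended action changes in the final period, when no future benefit remains to offset the medication disutility, which shows why finite-horizon policies are time-dependent.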
State Space
The MDP utilizes a state-space representation of a single patient to fully describe his/her characteristics at each time period t = 1, . . ., T in the planning horizon of length T. The state s_t consists of demographic information, clinical observations, and the patient’s health state. The demographic information includes the patient’s age, sex, smoking status, and diabetes status. The clinical observations are measurements of the patient’s untreated systolic blood pressure (SBP), high-density lipoprotein (HDL), and total cholesterol (TC), together with whether the patient has left ventricular hypertrophy as determined by electrocardiogram. Last, there are 10 mutually exclusive patient health states: 1) healthy (no history of coronary heart disease (CHD) or stroke); 2) history of CHD but no CHD event this period; 3) history of stroke but no stroke this period; 4) history of CHD and stroke but no adverse event this period; 5) survived a CHD event this period; 6) survived a stroke this period; 7) death from a non–CVD-related cause; 8) death from CHD event this period; 9) death from stroke this period; and 10) dead.
Action Space
The physician makes annual treatment decisions
Transition Probabilities
Before treatment decisions are made at the beginning of each decision period, the patient’s state s_t is mapped to a pretreatment, one-period likelihood of CHD,
We further utilize the expected change in SBP to compute the relative risk reduction (RRR) for CHD,
In addition to computing the health state transition probabilities, we model the dynamics of the other elements of the patient’s state s_t over the planning horizon. To forecast the patient’s untreated SBP, HDL, and TC, we utilized linear regression on the NHANES III data. We considered the following candidate predictors for each linear regression model: intercept term, age, squared age, sex, smoking status, diabetes status, race (white, black, Hispanic), history of CHD, and history of stroke. The best models for SBP, HDL, and TC were selected using the Akaike information criterion (AIC). Table A4 presents the coefficients for each model. We applied the linear regression model to each patient to forecast that patient’s SBP, HDL, and TC over the next 10 years. We adjusted the forecasts by applying the difference between the observed value in the NHANES data and the linear regression fitted value to all forecasted values (effectively changing the intercept term for each risk factor regression model).
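The intercept adjustment described above can be sketched as follows. This is an illustration only: it assumes a model with a single age term, and the coefficients (intercept 100, slope 0.5 mmHg per year) are hypothetical rather than the values reported in Table A4.

```python
def forecast_risk_factor(intercept, age_coef, age_now, observed_now, years):
    """Forecast a risk factor with a linear model in age, shifted so that
    the trajectory passes through the patient's observed value today."""
    fitted_now = intercept + age_coef * age_now
    offset = observed_now - fitted_now  # patient-specific intercept shift
    return [
        intercept + age_coef * (age_now + k) + offset
        for k in range(1, years + 1)
    ]


# Hypothetical patient: age 50, observed untreated SBP of 140 mmHg.
# The population model predicts 125 mmHg at age 50, so every forecast
# is shifted upward by 15 mmHg.
sbp_forecast = forecast_risk_factor(
    intercept=100.0, age_coef=0.5, age_now=50, observed_now=140.0, years=10
)
print(sbp_forecast)
```

The shift preserves the population-level slope while anchoring each patient's trajectory to his/her own measurement.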
The health state transition probabilities and linear regression models jointly determine the state transition probabilities
Rewards
Our objective is to maximize the patient’s expected discounted QALYs, that is, find the policy
where
Poisson Model Development
We split the full sample into a training set of 20,000 patients, a threshold set of 20,000 patients, and a testing set of 60,000 patients. Using the training set, we parameterized a Poisson regression model using linear mixed-effects modeling to account for the correlation of the sequential measurements taken from each patient during their annual examination, such as BP and cholesterol readings. The optimal medication count at each decision period from the MDP policy was used as the response variable, while demographic information and risk factor data were used as predictor variables. Poisson regression is a regression model in which the outcome variable is a count variable (e.g., number of medications). By expressing the expected value of the outcome variable as an exponential function of a linear combination of predictor variables, Poisson regression guarantees that the predicted outcome is nonnegative, in contrast to traditional linear regression, which would allow predictions of a negative number of medications. Linear mixed-effects modeling combines fixed effects (e.g., SBP, age, race) with random effects (e.g., patient). In biostatistics, the fixed effects represent population averages while the random effects represent patient-specific effects. Since our data contain multiple readings per patient, we require a random effect to control for the within-patient correlation of outcome variables.
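As a sketch of the model form described above, the expected medication count can be written as the exponential of the fixed-effect linear predictor plus a patient-level random intercept. The coefficient values and predictor names below are hypothetical and are not the fitted values from this study; model fitting itself is not shown.

```python
import math


def expected_med_count(fixed_coefs, predictors, random_intercept=0.0):
    """Poisson mean with a log link: exp(fixed effects + patient random intercept)."""
    linear = fixed_coefs["intercept"] + random_intercept
    linear += sum(fixed_coefs[name] * value for name, value in predictors.items())
    return math.exp(linear)


# Hypothetical coefficients, for illustration only.
coefs = {"intercept": -2.0, "sbp": 0.015, "age": 0.01}
patient = {"sbp": 150.0, "age": 60.0}

population_mean = expected_med_count(coefs, patient)
patient_mean = expected_med_count(coefs, patient, random_intercept=0.1)

# Even an extreme negative linear predictor yields a positive expected count.
low_mean = expected_med_count(coefs, {"sbp": 0.0, "age": 0.0}, random_intercept=-5.0)
print(population_mean, patient_mean, low_mean)
```

On the log scale, a random intercept multiplies the population-average count by a patient-specific factor, here exp(0.1).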
We considered two Poisson regression models: 1) full model and 2) risk-only. The full Poisson model used an initial candidate set of predictors of age, sex, smoking status, diabetes status, pretreatment SBP, diastolic blood pressure (DBP), HDL, TC, 5-year CVD risk (heart attack plus stroke) as computed by the Framingham risk calculator, 21 and an interaction term for each variable with CVD risk. The risk-only Poisson model just considered 5-year CVD risk, determined using the Framingham risk calculator, as a predictor. 21 The final set of predictors and regression coefficients of the full Poisson model were determined using the AIC.
Since the optimal medication count comes from solving a finite-horizon discounted MDP, the optimal policy is time-dependent. However, since age is a state variable of the MDP, which increases at the same rate as time, the optimal policy can be viewed as approximately age-dependent. Therefore, the time-dependence of the optimal policy is approximately accounted for by including age as a predictor in the Poisson regression models.
Number of Medications to Prescribe: The Rounding Threshold
Within the models, we determined the number of medications to prescribe under the Poisson policies by examining a sequence of rounding thresholds (0 to 1 in increments of 0.1). The sequence of thresholds was first applied to each parameterized Poisson regression function in the threshold set of 20,000 patients. If the Poisson regression fitted value was less than or equal to the rounding threshold, we rounded down to the nearest integer. For each rounding threshold, we computed the expected QALYs per 1000 patients. The rounding threshold that maximized QALYs was set as the preferred threshold.
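The threshold search above amounts to a small grid search, sketched below. The function `expected_qalys` is a hypothetical stand-in for evaluating a rounding threshold on the threshold set (expected QALYs per 1000 patients); the rounding rule itself is not modeled here, and the toy objective is invented purely so the example runs.

```python
def pick_rounding_threshold(expected_qalys):
    """Return the threshold in {0.0, 0.1, ..., 1.0} that maximizes expected QALYs.

    Ties are broken in favor of the smallest threshold.
    """
    grid = [k / 10 for k in range(11)]
    return max(grid, key=expected_qalys)


# Toy objective (hypothetical): peaks at a threshold of 0.3.
def toy_expected_qalys(threshold):
    return 100.0 - (threshold - 0.3) ** 2


preferred = pick_rounding_threshold(toy_expected_qalys)
print(preferred)
```

Because the search is run on a separate threshold set, the chosen threshold is not tuned on the patients later used for testing.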
Feasibility Adjustments
Because many believe hypertension is especially dangerous at higher levels, we treated all patients whose BP was >150/90 mmHg. Also, since there is no evidence that BP treatment is beneficial at especially low levels, our models do not allow treatment if the BP is below 120/90 mmHg.
Model Evaluation
Given the preferred threshold found in the threshold set (n = 20,000), we evaluated the performance of the Poisson policy model in the testing set (n = 60,000). We computed the expected QALYs saved and expected number of CVD events prevented (when compared to no treatment) under the two Poisson policies, the optimal MDP policy, and the JNC7 guidelines. We also compared the number of medications prescribed by the Poisson and MDP policies to determine the rate at which the policies matched.
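The policy comparison can be sketched as a tally over decision periods of exact matches, under-prescriptions, and over-prescriptions, together with the mean squared error in medication counts. The medication counts below are made up for illustration and are not study data.

```python
def compare_policies(approx_counts, optimal_counts):
    """Compare an approximate policy with the optimal one, period by period."""
    pairs = list(zip(approx_counts, optimal_counts))
    n = len(pairs)
    return {
        "match": sum(a == o for a, o in pairs) / n,
        "fewer": sum(a < o for a, o in pairs) / n,   # approximation under-prescribes
        "more": sum(a > o for a, o in pairs) / n,    # approximation over-prescribes
        "mse": sum((a - o) ** 2 for a, o in pairs) / n,
    }


# Hypothetical medication counts for five decision periods.
poisson_counts = [0, 1, 2, 3, 3]
mdp_counts = [0, 1, 2, 3, 4]
summary = compare_policies(poisson_counts, mdp_counts)
print(summary)
```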
Sensitivity Analyses
To evaluate the robustness of the Poisson regression approach to approximating optimal MDP treatment policies, we also investigated how the Poisson policies performed when using a different CVD risk calculator, the ASCVD calculator. 31 This analysis allowed us to determine if the Poisson policy would be effective in other countries where the risk calculators are calibrated to a different population. We also evaluated the treatment policies under scenarios of CVD risk calculator miscalibration. Given the reliance on CVD risk as a predictor variable in both Poisson policies and the optimal MDP policy, it is important to evaluate the performance of these treatment policies when the risk computed by the calculator is ±25% off from the patient’s true CVD risk.
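One way to represent the miscalibration scenarios is sketched below. It assumes the calibration error is relative (multiplicative) and that the result is kept within the range of a probability; both are assumptions made for illustration, and the exact procedure is described in the supplemental appendix.

```python
def miscalibrated_risk(true_risk, calibration_error):
    """Risk reported by a calculator that is off by a relative calibration error.

    calibration_error = +0.25 overestimates and -0.25 underestimates the true risk.
    """
    reported = true_risk * (1.0 + calibration_error)
    return min(1.0, max(0.0, reported))  # keep the result a valid probability


true_risk = 0.20  # hypothetical true 5-year CVD risk
overestimated = miscalibrated_risk(true_risk, +0.25)
underestimated = miscalibrated_risk(true_risk, -0.25)
print(overestimated, underestimated)
```

Under this reading, a patient with a true 5-year risk of 20% would be presented to the treatment policies with a risk of about 25% or 15%.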
Results
Comparison of Sample Populations
Table 1 summarizes the patient characteristics of the training and testing sets. We found that randomization was successful (i.e., there were no clinically significant differences between the populations).
Table 1. Summary Statistics of Population
Note: IQR = interquartile range; SBP = systolic blood pressure; DBP = diastolic blood pressure; HDL = high-density lipoprotein; TC = total cholesterol.
Poisson Regression Analysis
We applied the Poisson policies to the testing set and compared the number of medications prescribed against the optimal MDP policy for each patient. Table 2 provides the percentage of decision periods where the Poisson policies exactly matched, prescribed fewer medications, and prescribed more medications than the optimal MDP policy. We found that the full Poisson model has 99% accuracy and a very low mean squared error (0.006), while the risk-only Poisson model (the model just considering the 5-year CVD risk as computed with the Framingham risk calculator) has 92% accuracy and a higher mean squared error (0.859). Further evidence of the accuracy of the Poisson policies is plotted in Figure 1, which shows the distribution of the number of medications for both Poisson policies and the optimal MDP policy. The frequency plotted is the number of times a patient was prescribed that number of medications for 1 year, based on the testing population of 60,000 patients followed for 10 years. The full Poisson policy and MDP policy do not have substantively different distributions. On the other hand, the risk-only Poisson policy differs significantly when the optimal number of medications to prescribe is 4 or 5.
Table 2. Comparison of MDP and Poisson Treatment Policies
Note: MDP = Markov decision process.

Figure 1. Number of medications prescribed for MDP and Poisson treatment policies.
Table 2 and Figure 1 examine only how often prescribing decisions are discordant; a much more important issue is how using the Poisson policies instead of the optimal MDP policy affects health outcomes for patients. Table 3 provides a comparison of the expected QALYs and expected number of CVD events when prescribing decisions are based on the two Poisson policies, the MDP policy, and JNC7. We found that using the full Poisson policy results in nearly identical health performance for patients compared to the optimal MDP policy. Furthermore, the reduction in accuracy to 92% for the risk-only Poisson policy only reduced CVD events prevented by 1.0 and QALYs saved by 5.4 per 1000 patients (10,000 patient-years), compared to the MDP policy. In contrast, the full Poisson policy prevents 17.9 more CVD events and saves 109.2 more QALYs per 1000 patients than the JNC7 treatment policy.
Table 3. Health Performance of Treatment Policies
Note: CVD = cardiovascular disease; QALY = quality-adjusted life year; MDP = Markov decision process; JNC7 = Seventh Joint National Committee.
CVD events prevented when compared to no treatment.
QALYs saved when compared to no treatment.
Table A7 in the supplemental appendix presents the regression coefficients and P values of the full and risk-only Poisson models. The regression coefficients of the final set of predictor variables included in the full Poisson model were obtained using the AIC method. Based on these final regression models, we found that the optimal rounding threshold is 0, that is, always round the fitted value from the Poisson regression down to the nearest integer.
Sensitivity Analyses
To determine if the Poisson policies perform well when CVD risk is calculated based on a different population, we evaluated the performance of the four treatment policies when the ASCVD risk calculator is used to estimate CVD risk instead of the Framingham equations (see Table 4). We found that the improvement over JNC7 remained substantial for both CVD events prevented (17 fewer CVD events per 1000 patients) and QALYs gained for the full Poisson model and MDP policies when using the ASCVD calculator. The risk-only Poisson model continued to perform only slightly worse than the optimal MDP policy.
Table 4. Health Performance of Treatment Policies Under ASCVD and Framingham Risk Calculator Miscalibration
Note: CVD = cardiovascular disease; QALY = quality-adjusted life year; MDP = Markov decision process; JNC7 = Seventh Joint National Committee.
We next examined the effects of risk calculator miscalibration, that is, the calculated CVD risk systematically over- or underestimates the patient’s true CVD risk. Table 4 reports the performance of the treatment policies under overestimation of CVD risk (+25% calibration error) and underestimation of CVD risk (−25% calibration error). For both over- and underestimation of true CVD risk, the Poisson policies yield higher expected CVD events prevented (22.2 and 13.5 fewer events per 1000 patients, respectively) and QALYs gained (136.2 and 81.4 more QALYs per 1000 patients, respectively) than JNC7. The differences between MDP and the full Poisson model remain negligible and the differences between MDP and the risk-only Poisson model remain small.
Discussion
Our analyses show how decision analysis can become much more practical for daily clinical use: Poisson regression can be used effectively to approximate a fully optimized MDP model for the prescription of BP medications. The full Poisson model, which includes interaction effects, resulted in negligible differences in the treatment decisions and health outcomes compared to the optimal MDP treatment policy. The risk-only Poisson model (including only 5-year CVD risk as a factor) reduced CVD complications almost as much as the full Poisson model. Furthermore, both Poisson models dominated JNC7 treatment guidelines in QALYs and CVD events. We also found that the performance of the MDP and Poisson models was robust to the CVD risk calculator used (Framingham or ASCVD) and to calculator miscalibration (±25%). While not directly studied here, our QALY-maximizing decisions likely lead to more cost-effective decisions than JNC7 given the high medical cost of CHD events and strokes and the low cost of BP medications.
More broadly, our study revealed that policy approximation can offer very comparable performance to MDP optimized treatment policies without the drawbacks of MDP models. MDP models generally require specialized software and lengthy computational times that would prohibit their use in many clinical settings. In this study, computational times were short (within seconds) but we required software capable of implementing recursive algorithms in order to solve the MDP. Approximations to these complex models limit these drawbacks by generating simple equations that can be solved nearly instantly using standard software or a calculator. With the advent of big data, decision support tools can develop approximations to the optimal treatment policy by training statistical models on very large data sets of patients. By doing so, the computational burden is removed from the end-user (i.e., the clinician or facility) and frontloaded to the decision support developer who has greater computational capabilities and access to the required software.
Furthermore, policy approximation improves the interpretability of the optimal treatment decision rules for clinicians. For Poisson regression, the coefficients of the predictors provide insight into how sensitive the optimal number of medications is to changes in patient risk factors. This insight may translate to improved trust of the decision support tool and increase its utilization and acceptance in clinical practice. This is particularly true for the risk-only Poisson model since there is only one regression coefficient to interpret, whereas the full Poisson model may have multicollinearity, which prevents direct interpretation of the sign and magnitude of the coefficients. For the risk-only Poisson model, the intercept coefficient is 1.167 and the coefficient for 5-year CVD risk is 1.006. The positive sign of the risk coefficient implies that increases in CVD risk lead to increases in the number of medications prescribed. This matches physician intuition and guidelines, which bolsters the trustworthiness of the Poisson regression. The intercept coefficient can be interpreted to mean the risk-only Poisson model would prescribe at least 3 medications to patients (before applying the minimum SBP threshold constraint). The risk coefficient implies that the patient’s pretreatment 5-year CVD risk would need to be at least 22% (44%) to recommend 4 (5) BP medications. The full list of the Poisson models’ coefficients is available in the supplemental appendix.
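The interpretation of the risk-only coefficients can be checked with a short calculation. The sketch below uses the reported intercept (1.167) and risk coefficient (1.006), and assumes that risk enters the model as a proportion (e.g., 0.22 for 22%) and that the fitted value is rounded down, as reported for the preferred threshold. The feasibility adjustments are not applied.

```python
import math

INTERCEPT = 1.167   # reported intercept of the risk-only Poisson model
RISK_COEF = 1.006   # reported coefficient for 5-year CVD risk


def meds_risk_only(risk):
    """Medications recommended at a given 5-year CVD risk (as a proportion)."""
    fitted = math.exp(INTERCEPT + RISK_COEF * risk)
    return math.floor(fitted)  # assumption: always round the fitted value down


def min_risk_for(n_meds):
    """Smallest risk at which the fitted value reaches n_meds."""
    return (math.log(n_meds) - INTERCEPT) / RISK_COEF


print(meds_risk_only(0.0))                  # medications at zero risk
print(math.ceil(100 * min_risk_for(4)))     # whole-percent risk needed for 4 medications
print(math.ceil(100 * min_risk_for(5)))     # whole-percent risk needed for 5 medications
```

At zero risk the fitted value is exp(1.167), or about 3.2, which rounds down to 3 medications. Solving exp(1.167 + 1.006 × risk) = 4 and = 5 gives risks of about 21.8% and 44.0%, consistent with the 22% and 44% quoted above.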
In general, our policy approximation methodology could be applied to any MDP formulation developed with a large structured dataset. For instance, the optimal prescription planning derived with MDP formulations for patients with type 2 diabetes 20 or for patients with heart disease 7 could be approximated using Poisson regression, as long as there is enough data to fit the model. Our methodology can also be applied to other objective functions, including maximizing cost-effectiveness and minimizing the number of CVD events, and application domains.
Our study’s limitations derive mainly from questions about the inputs in our models. Most notably, the precise details of who benefits from BP lowering are debatable. Our modeling choices, however, are based on the best available clinical trial data from an individual-level meta-analysis 32 and fit the SPRINT Trial well, which found treatment benefit to 120 mmHg when using 3 to 4 BP medications in very high CVD risk patients. 15 We chose to base our comparison on the JNC7, 18 because American practitioners often know it well, and JNC8 was not a consensus guideline and is unlikely to be relevant for much longer. Efforts are already underway to update these guidelines, and our results suggest that our benefit-based tailored treatment approach should at least be strongly considered in deliberations of how to update BP guidelines post-SPRINT. 14
Critics of using CVD risk in clinical decision making have pointed to risk prediction tools’ well-known susceptibility to calibration problems when developed in one population and applied to different populations. 33 Our results, however, demonstrate that a benefit-based treatment approach remains superior to traditional treatment target–based approaches, such as JNC7, even in the presence of substantial calibration problems. Of course, it is always preferable to use a risk prediction tool that is well calibrated to the patient population being treated, and in the era of modern electronic medical records, it is becoming much more feasible for integrated health systems to develop or recalibrate risk tools to be optimally fitted to their own population. 34 Furthermore, critics have pointed out that risk prediction tools with similar discrimination, as measured by the area under the receiver operating characteristic curve, often give substantively different risk estimates for individual patients. Our study demonstrates that whether the Framingham or the ASCVD tool was used,21,31 the Poisson models prevented many more CVD events and saved many more QALYs compared to the JNC7 approach.
Conclusion
We developed a Poisson regression model to approximate the optimal treatment decisions from an MDP model. We found that our policy approximation enables fast, easily interpretable, and comparable decision support without a need for specialized software. Our application to hypertension treatment planning for patients at risk for CVD indicates high fidelity to optimal treatment policies and high performance. Treatment policy approximation resulted in more QALYs and fewer CVD events than current clinical practice. Furthermore, the improvement in health outcomes over JNC7 was robust to which risk calculator was used, as well as to systematic calculator error. While this study specifically addressed hypertension treatment, we believe the methodology of using Poisson regression to approximate fully optimized MDP policies has the potential to reduce computational time, improve acceptability by clinicians, and maintain health-optimization performance for treatment decisions for chronic diseases more generally.
Footnotes
Presented in part at the 2015 Institute for Operations Research and the Management Sciences (INFORMS) Annual Meeting.
Financial support for this study was provided in part (for JBS and RAH) by grants from the Department of Veterans Affairs HSR&D (IIR 06-253 and CDA 13-021) and the Michigan Center for Diabetes Translational Research (NIDDK of The National Institutes of Health [P60 DK-20572]). The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.
