Abstract
Health savings accounts (HSAs) are tax-advantaged savings vehicles available to people enrolled in high-deductible health plans (HDHPs), health plans with higher deductibles and therefore greater consumer cost-sharing than other plans. HSAs allow annual pretax contributions and tax-free withdrawals to pay for qualified out-of-pocket (OOP) medical expenses. The unused HSA balance carries over year to year, and the funds may be invested and accrue interest over time. HSAs were introduced by the 2003 Medicare Modernization Act to complement HDHPs as a strategy for curbing rising health care costs. The pairing of HDHPs and HSAs was based on the hypothesis that assigning individuals the responsibility for a greater portion of their health care costs may incentivize them to be more prudent in their health care spending. Each year, HSA account holders must decide how much money to contribute to their HSA, up to an annual contribution limit set by the federal government.
Like most health plans, HDHPs are characterized by the following parameters: 1) the consumer must cover annual costs up to a certain deductible; 2) the consumer must cover a coinsurance rate (percentage) of costs above the deductible; and 3) total annual OOP payments are capped by an OOP maximum. HDHPs are generally distinguished from other plans by their higher deductibles and lower premiums. In particular, HSA-qualifying HDHPs must meet a federally mandated minimum deductible ($1,300 for individuals, $2,600 for families in 2017) and a federally mandated ceiling on the OOP maximum ($6,550 for individuals, $13,100 for families in 2017). These plans may be adopted on either an individual or a family basis (though each family member must carry their own individual HSA), and plan parameters vary widely. For example, among family plans in 2017, the proportions of plans with deductibles in the ranges {$2,000-$2,999, $3,000-$3,999, $4,000-$4,999, $5,000-$5,999, $6,000+} were {13%, 35%, 15%, 14%, 23%}. 1
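To make the three plan parameters concrete, the following minimal sketch computes a household's annual OOP payment from its total annual health care cost. The function name `out_of_pocket` and the example plan values are illustrative assumptions; the sketch ignores premiums and any services exempt from the deductible.

```python
def out_of_pocket(total_cost, deductible, coinsurance, oop_max):
    """Annual OOP payment implied by a deductible, coinsurance rate, and OOP maximum.

    Hypothetical helper for illustration only.
    """
    if total_cost < 0:
        raise ValueError("total_cost must be nonnegative")
    # The consumer pays everything up to the deductible.
    below_deductible = min(total_cost, deductible)
    # Above the deductible, the consumer pays the coinsurance share.
    above_deductible = max(total_cost - deductible, 0.0) * coinsurance
    # Total OOP payments are capped at the OOP maximum.
    return min(below_deductible + above_deductible, oop_max)


# Illustrative family plan (assumed values): $3,500 deductible, 20% coinsurance,
# $12,500 OOP maximum.
plan = {"deductible": 3500.0, "coinsurance": 0.20, "oop_max": 12500.0}
for cost in (1000.0, 10000.0, 100000.0):
    print(cost, out_of_pocket(cost, **plan))
```

Under these assumed values, a low-cost year falls entirely below the deductible, a moderate year adds the coinsurance share, and a catastrophic year is capped at the OOP maximum.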
While the merit of HDHPs as a health care cost containment strategy continues to be the subject of debate, 2 HDHP/HSA adoption has grown rapidly since their introduction, and they have commanded the attention of employers as a cost-curbing mechanism. HDHP enrollment has grown from about 1 million in 2005 to over 20 million in 2016, 3 and from 8% of the workforce in 2009 to 24% in 2015. 4 Additionally, a 2018 survey found that 70% of large employers offered at least one HDHP, up from 60% just one year prior. 5
In the face of this rapid adoption, a key conclusion of a recent comprehensive review of HDHPs is that there is a lack of tools to assist consumers in decision making with regard to such plans. 6 To fill this gap, this article addresses a key question for HSA adopters: How should a household budget for its HSA contributions from year to year? To our knowledge, this is the first article to examine this question. The article also addresses a related question faced by policy makers: Do current contribution limits provide all households with the flexibility to use HSAs efficiently, and if not, which alternative should be adopted by policy makers?
Regarding the first question, we model a household as relying upon its HSA to cover OOP expenses over a 30-year period, and therefore seeking to balance excessive pretax HSA contributions against posttax expenditures due to insufficient HSA balance. We hence seek the contribution amount that minimizes the household’s total expected discounted expenditures throughout the contribution period. We consider two types of policies and use simulations to evaluate their relative performance.
Regarding the second question, the federal government imposes an annual contribution limit on both individuals and households. i We assess whether the current limits enable households to adhere to the best performing contribution policy examined in the first question. If not, we examine how the government should relax this limit, and whether the resulting impact on tax receipts would exceed what is currently allowed.
Analyzing the questions above requires a model for the evolution of household medical expenditures from year to year. We develop a cost evolution model for this purpose whose performance is competitive with leading commercial models. While there is considerable literature on modeling ii health care costs, 11 a novel aspect of our approach is to model the transitions of a household’s health care cost percentile from one year to the next rather than model actual costs directly. We discuss the advantage of this approach in Methods.
Methods
Data Description
Our dataset is from the Medical Expenditure Panel Survey, 12 a set of large-scale surveys of families and individuals drawn from a nationally representative subset of households in the United States. It is based on an overlapping panel design in which data on medical expenditures (and other information) are collected for two consecutive calendar years from each household. Henceforth, panel
As the unit of analysis, we used Health Insurance Eligibility Units (HIEUs), “sub-family relationship units constructed to include adults plus those family members who would typically be eligible for coverage under the adults’ private health insurance family plans.” For panel
We divided the dataset into three samples: training (Panels 7 to 14, corresponding to year pairs {2002, 2003} through {2009, 2010}), validation (Panels 15 and 16, corresponding to year pairs {2010, 2011} and {2011, 2012}), and test (Panels 17 and 18, corresponding to year pairs {2012, 2013} and {2013, 2014}). The training set was used to estimate all model parameters described below, and exploratory analyses were performed on the validation set to arrive at the chosen model. The chosen model was then refit to the combined training and validation datasets and used to predict the second-year costs in the test set. All data preparation and model estimation were performed using SAS.
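The panel-to-sample assignment above can be expressed compactly. The sketch below is illustrative only (the original data preparation was done in SAS); the function names are hypothetical, and the year offset is inferred from the panel and year pairs listed in the text.

```python
def panel_years(panel):
    """Return the pair of calendar years covered by a panel.

    Offset inferred from the text: Panel 7 covers {2002, 2003}.
    """
    first_year = 1995 + panel
    return (first_year, first_year + 1)


def sample_for_panel(panel):
    """Assign a panel to the training, validation, or test sample."""
    if 7 <= panel <= 14:
        return "training"
    if panel in (15, 16):
        return "validation"
    if panel in (17, 18):
        return "test"
    raise ValueError(f"panel {panel} is outside the range used in the study")


for p in range(7, 19):
    print(p, panel_years(p), sample_for_panel(p))
```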
We develop two versions of the cost evolution model described further below: basic and expanded. The basic version uses a small set of covariates and is well suited for long-term planning as it does not require modeling the evolution of a large set of demographic and clinical variables. The expanded version includes a broader set of covariates and is well suited for generating more accurate short-term predictions that could have other applications as well. The covariates are the following:
Basic model: Household size, percentage of individuals within the HIEU of each insurance type (private, public, private + public, uninsured*). The values marked with an asterisk (*) denote the baseline values within each group.
Expanded model: This model additionally includes the following: % of individuals within the household of each sex (male, female*), age group (0-19, 20-39*, 40-59, 60+), race (White*, Black, American Indian/Alaskan Native, Asian/Native Hawaiian/Pacific Islander, Multiple races), perceived health status (excellent*, very good, good, fair, poor), perceived mental health status (excellent*, very good, good, fair, poor), region (West*, East, South, North), % of individuals not receiving help or supervision with instrumental activities of daily living (IADLs), % of individuals not receiving help or supervision with activities of daily living (ADLs), % of individuals without any functional limitations.
Overall, our 2002–2014 dataset includes 176,205 individuals belonging to 90,359 HIEUs. The final training, validation, and test sets contained 57,701, 15,818, and 16,840 HIEU pairs, respectively.
Problem Formulation
In year t, a HSA-eligible plan can be characterized by three parameters: A deductible
The household begins each year t in health state
If the current account balance is insufficient to cover these expenses (i.e.,
A household has the freedom to select a contribution amount
where
The health state
To select the investment return rate
Candidate Contribution Policies
We investigate two types of contribution policies in this article. The first is a dynamic policy that optimizes (3) over a long time horizon (
Dynamic Policy
The contribution in each period depends on the current account balance
Intuitively,
Static Policy
The household contributes the amount that brings its HSA balance up to the OOP maximum
The rationale is that in the absence of a personalized predictive cost model, a reasonable strategy would be to strive to cover all OOP expenses from within the HSA. This is intuitively more conservative than the dynamic policy. The annual maximum contribution value
Cost Evolution Model
Our model for producing year to year forecasts of household health care costs uses as predictors: 1) the household’s health care costs in the prior year and 2) demographic information and a set of self-reported health variables v described in Data Description. The use of prior year costs is based on evidence that their use alone can perform almost as well as models that employ a broader set of clinical/demographic variables.7,9,18,19 Furthermore, using more than one year of prior information has been shown to add little predictive power. 20
Rather than directly predict cost or some transformation of it, a novel aspect of our approach is to model the transitions of a household’s health care cost percentile from one year to the next. vi The rationale for modeling percentiles rather than actual costs is apparent from Figure 2: The presence of a linear association among cost percentiles in two consecutive years is apparent in the right panel. Indeed, the correlation among consecutive year percentile values is 0.70, much higher than the corresponding correlation among actual costs in consecutive years (0.39).
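The contrast between correlating raw costs and correlating cost percentiles can be illustrated with a small sketch. The data below are synthetic and heavy-tailed by construction, not the survey data, so the printed values will not match the figures quoted above; the helper names `to_percentiles` and `pearson` are hypothetical.

```python
import math
import random


def to_percentiles(values):
    """Rank-based percentiles in (0, 1]; ties are broken by order of appearance."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    percentiles = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        percentiles[index] = rank / len(values)
    return percentiles


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic costs (assumption): a persistent household effect plus yearly noise
# on the log scale, which produces a heavy right tail in raw costs.
rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(5000)]
year1 = [math.exp(7.0 + 1.5 * (z + rng.gauss(0.0, 0.7))) for z in latent]
year2 = [math.exp(7.0 + 1.5 * (z + rng.gauss(0.0, 0.7))) for z in latent]

print("correlation of raw costs:  ", pearson(year1, year2))
print("correlation of percentiles:", pearson(to_percentiles(year1), to_percentiles(year2)))
```

Because percentiles are bounded and insensitive to extreme values, their year-to-year correlation is not dominated by a handful of very expensive households.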

A schematic description of the sequence of events within a period

Scatter plots of year to year cost transitions. X-axis represents costs in year
Specifically, we propose a four-stage model for evolving costs: 1) a binary outcome model for the event that a household incurs zero cost in the current year (a nonnegligible percentage of households have zero cost in a given year); 2) conditional on having positive costs, a continuous parametric model for the household’s cost percentile; 3) a mapping from cost percentile to actual cost; and 4) inflation adjustment of actual costs from year to year.
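The four stages can be read as a simple pipeline. The sketch below shows only that structure; every distribution and parameter value in it is a stand-in chosen for brevity and is not the fitted model: the zero-cost probability is a constant rather than a regression, the percentile transition is a clipped Gaussian step rather than the parametric model used in the article, and the percentile-to-cost mapping uses a lognormal quantile function. The function name `next_year_cost` and all default values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def next_year_cost(prev_percentile, rng, p_zero=0.1, noise_sd=0.15,
                   log_mean=8.0, log_sd=1.5, inflation=0.03, years_ahead=1):
    """Simulate one year of household cost; returns (percentile, cost).

    All defaults are placeholder values (assumptions), not estimates.
    """
    # Stage 1: binary outcome for incurring zero cost this year.
    if rng.random() < p_zero:
        return None, 0.0
    # Stage 2: conditional on positive cost, draw the new cost percentile
    # (stand-in: Gaussian step around the previous percentile, clipped to (0, 1)).
    step = rng.gauss(0.0, noise_sd)
    percentile = min(max(prev_percentile + step, 0.001), 0.999)
    # Stage 3: map the percentile to a base-year cost through an inverse CDF
    # (stand-in: lognormal quantile function).
    base_cost = math.exp(log_mean + log_sd * NormalDist().inv_cdf(percentile))
    # Stage 4: adjust the base-year cost for inflation.
    return percentile, base_cost * (1.0 + inflation) ** years_ahead


rng = random.Random(1)
print(next_year_cost(0.5, rng))
```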
Some of the models can incorporate either the basic or the expanded covariates
Zero Cost Model
In any given year, a fraction of households incur zero cost. We employ logistic regression to estimate the probability
Percentile Transitions
Conditional on incurring positive costs in year
where
where
The rationale for the Gamma distribution can be seen from Figure 3. Each subplot shows the empirical probability density of

Plots of the empirical probability density for
The mixture we use has to accommodate both the positive and negative skews on display in Figure 3. Since the skew of a Gamma distribution is positive, its reflection has negative skew. We therefore use a mixture of a truncated Gamma and a reflected truncated Gamma distribution to model cost percentile transitions:
Setting
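The idea of mixing a truncated Gamma with its reflection can be sketched as follows. This is a generic illustration on the unit interval under stated assumptions: the reflection is taken about the midpoint of [0, 1], the mixture weight and Gamma parameters are fixed numbers, and truncation is done by rejection sampling. The article's actual parametrization is not reproduced here, and the function names are hypothetical.

```python
import random


def truncated_gamma(rng, shape, scale, upper=1.0):
    """Draw from Gamma(shape, scale) conditioned on (0, upper] by rejection."""
    while True:
        x = rng.gammavariate(shape, scale)
        if 0.0 < x <= upper:
            return x


def sample_mixture(rng, weight, shape, scale, upper=1.0):
    """Draw from a mixture of a truncated Gamma and its reflection.

    With probability `weight` the draw is positively skewed (mass near 0);
    otherwise it is reflected to `upper - x`, giving negative skew (mass near `upper`).
    """
    x = truncated_gamma(rng, shape, scale, upper)
    if rng.random() < weight:
        return x
    return upper - x


rng = random.Random(0)
draws = [sample_mixture(rng, weight=0.3, shape=2.0, scale=0.1) for _ in range(10)]
print(draws)
```

Varying the weight between 0 and 1 moves the mixture from fully negatively skewed to fully positively skewed, which is the flexibility the empirical densities call for.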
Distribution of Nonzero Expenditures
To map the conditional percentile

Plots of the empirical probability density for the logarithm of the nonzero household expenditures. The upper left plot is for the nonzero costs in the second year of panel 7, and the bottom right is for panel 14. The x-axis is log(Yt) and the y-axis is the probability density. Details of the panel data are given in Data Description under Methods. Fitted skew-normal distributions are overlayed.
Let us denote the cumulative distribution function of the log-skew-normal distribution for positive costs in year
Cost Parameter Evolution and Inflation Estimation
For planning over long horizons, we need to account for the evolution of costs from year to year, including inflation. We therefore require a model for the evolution of the parameters
To estimate the health care cost inflation rate
Evaluation of Model Performance
We test the performance of our cost evolution model against ones employed in industry. The comparative studies by the Society of Actuaries24,25,26 report performances for these models in terms of out-of-sample (prospective)
using the same metrics for comparison. We also calculate the predictive ratio, the ratio of the sum of predictions divided by the sum of actual expenditures. For
Recall that
Evaluation of Contribution Policies
To compare the performance of the static and dynamic policies, we simulate the total expected discounted cost under each of these policies over a 30-year contribution period.
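As a minimal sketch of the discounting step, the function below sums a stream of annual costs at a constant discount rate. The timing convention (the first year is not discounted) and the function name are assumptions made for illustration; the expected value would be approximated by averaging this quantity over many simulated cost paths.

```python
def discounted_total(annual_costs, discount_rate):
    """Present value of a sequence of annual costs.

    Assumes costs are incurred at the start of each year, so year 0 is undiscounted.
    """
    return sum(cost / (1.0 + discount_rate) ** year
               for year, cost in enumerate(annual_costs))


# Thirty years of a constant $5,000 annual cost at a 6% discount rate.
print(discounted_total([5000.0] * 30, 0.06))
```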
The results are calculated for an HSA-eligible family plan with a $3,500 deductible, $12,500 OOP maximum, and 20% coinsurance rate. viii We focus on family plans because individual and family plans have different federal requirements on their HDHP parameters (minimum deductible, ceiling on the OOP maximum). As such, we assume a family size of 2.8 (the average size of family units of 2 or more individuals). ix We further restrict attention to private (employer-sponsored) insurance holders, as this is the population most likely to use HDHPs, and therefore we assume all participants are covered by private health insurance.
Since the health care inflation rate
The policies that we considered are the following: 1) dynamic policy; 2) static policy with annual contribution limit of $2,000 (static $2,000); 3) static $4,000; and 4) static $6,450. x We use the static $2,000 policy as the comparison benchmark because it is close to the median annual contribution for HSAs with nonzero account balances, 27 and as such serves as a proxy for common HSA contribution behavior. Household costs are simulated using the basic model because modeling the evolution of household covariates in the expanded model is beyond the scope of this article. All computations were performed in R 3.3.2.
We also perform sensitivity analyses on the results as different parameters are varied. First, we focus on the parameters of the HDHP that vary the most in the marketplace. Specifically we examine the nine pair-wise combinations of deductible values ($2,500, $3,500, $4,500) with OOP maximum values ($7,500, $10,000, $12,500). xi Second, we separately assess sensitivity to the discount rate by examining three discount rate values (4%, 6%, 8%).
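The two sensitivity analyses described above amount to two separate scenario lists, which can be enumerated as in this sketch. The variable names are hypothetical; the default plan is the one used for the main results.

```python
from itertools import product

deductibles = (2500, 3500, 4500)
oop_maxima = (7500, 10000, 12500)
discount_rates = (0.04, 0.06, 0.08)
default_plan = (3500, 12500)  # (deductible, OOP maximum) used for the main results

# First analysis: the nine pairwise plan combinations.
plan_scenarios = list(product(deductibles, oop_maxima))

# Second analysis: the discount rate varied separately, holding the plan at its default.
discount_scenarios = [(default_plan, rate) for rate in discount_rates]

for deductible, oop_max in plan_scenarios:
    print(f"deductible=${deductible:,}, OOP maximum=${oop_max:,}")
for plan, rate in discount_scenarios:
    print(f"plan={plan}, discount rate={rate:.0%}")
```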
Assessment of Tax Impact and Federal Contribution Limit
To determine if adopting the dynamic policy would lead to total contribution amounts exceeding what the federal contribution limit allows for over 30 years, we examine the distribution of inflation-adjusted contributions over this timeframe. To take a conservative approach in this analysis, we focus on households on the high end of the contribution spectrum. Specifically, we assume an initial cost percentile of
We also analyze the proportion of annual contributions that exceed the federal annual limit. A high proportion would indicate that the current federal limit needs to be raised to enable households to take full advantage of the dynamic policy. We perform this analysis using our default parameters of deductible ($3,500) and OOP maximum ($12,500). For this analysis, using $3,500 rather than $4,500 represents a more conservative approach, since it leads to a lower proportion of limit-exceeding contributions.
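The exceedance analysis reduces to two summary statistics over simulated annual contributions: the share above the limit and the mean contribution among those above it. The sketch below uses a hypothetical function name and made-up contribution amounts.

```python
def limit_exceedance(contributions, annual_limit):
    """Share of annual contributions above the limit, and their mean amount.

    Returns (share, mean_over); mean_over is None if no contribution exceeds the limit.
    """
    over = [c for c in contributions if c > annual_limit]
    share = len(over) / len(contributions)
    mean_over = sum(over) / len(over) if over else None
    return share, mean_over


# Made-up recommended contributions, compared with the 2013 limit of $6,450.
print(limit_exceedance([1000.0, 7000.0, 9000.0, 2000.0], 6450.0))
```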
Results
Performance of Cost Evolution Model
Table 1 compares our basic and expanded models to the commercial models detailed in the comparative studies.24–26 On the basis of
Table 1. Out-of-Sample Model Performance Comparison
MAPE, mean absolute prediction error.
Table notes: Not reported in this study. Results based on out-of-the-box models without further recalibration. MAPE values in this study were not normalized to a relative scale and were therefore not comparable.
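Two of the comparison metrics are defined explicitly in the text: the predictive ratio (sum of predictions divided by sum of actual expenditures) and MAPE expressed as a fraction of mean out-of-sample cost. A minimal sketch of both, with hypothetical function names and made-up numbers:

```python
def predictive_ratio(predicted, actual):
    """Sum of predicted costs divided by sum of actual costs."""
    return sum(predicted) / sum(actual)


def normalized_mape(predicted, actual):
    """Mean absolute prediction error as a fraction of mean actual cost."""
    n = len(actual)
    mean_abs_error = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
    return mean_abs_error / (sum(actual) / n)


predicted = [100.0, 300.0]
actual = [200.0, 200.0]
print(predictive_ratio(predicted, actual), normalized_mape(predicted, actual))
```

A predictive ratio near 1 indicates that predictions are unbiased in aggregate, while the normalized MAPE captures household-level error on a scale that is comparable across datasets.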
Performance of Contribution Policies
Figure 5 presents boxplots representing the costs of the static $4,000, static $6,450, and the dynamic policies as a percentage of the cost of the static $2,000 policy. Each boxplot summarizes the distribution of outcomes across 10,000 simulations of the 30-year contribution period. The columns represent initial cost percentile (
Both the dynamic and static $6,450 policies consistently outperform the static $2,000 one. For example, at a marginal tax rate of 40% and
The dynamic policy has just a moderate edge over the static $6,450 one, around 2 to 3 percentage points lower based on median performance (Figure 5).
The cost advantage of the dynamic and static $6,450 policies over the static $2,000 ones rises with marginal tax rate. For example, at
The advantage the dynamic policy has over the static $2,000 one is not sensitive to the parameters that define the HDHP. Across the nine combinations of deductible and OOP maximum tested, the cost advantage was 8% to 11% based on median performance, about the same as those for the default plan parameter values (Figure 6).
The advantage the dynamic policy has over the static $2,000 policy decreases moderately as the discount rate rises. When the discount rate doubles from 4% to 8%, the cost advantage of the dynamic policy falls from 13% to 9% (Figure 7).
The initial cost percentile
The impact of the marginal tax rate on total costs is highest for the static $2,000 policy, falls as contribution limit rises, and is negligible for the dynamic policy. For the static $2,000 policy, with

Boxplots of the expected discounted costs for the static policies with contribution limits of $4,000 and $6,450, and the dynamic policy. Costs are expressed as % of the cost of the static $2,000 policy. Columns: The left, middle, and right columns display results for the 25th, 50th, and 75th initial cost percentiles

Sensitivity analysis to changes in health plan parameters. Other parameters held at: Initial cost percentile

Sensitivity analysis to changes in discount rate. Other parameters held at: Initial cost percentile

Mean discounted total costs of the static $2,000 policy and the dynamic policy, as tax rate and initial cost percentile
Tax Impact and Implications of Federal Contribution Limit
We compute the distribution of inflation-adjusted contributions over 30 years. The 25th, 50th, and 75th percentiles are $50,100, $71,400, and $96,300, respectively, in 2013 dollars. These are all significantly lower than the 2013 federal limit of
We compute the proportion of annual contributions that exceed the federal annual limit, for combinations of initial cost percentiles (0.25, 0.50, 0.75) with tax rates (15%, 25%, 40%). We find that 9% to 11% of households in a given year have recommended contributions that exceed the federal limit. For these households, the mean amount ranges from $10,100 to $10,300 in 2013 dollars.
Discussion
Taking advantage of the strong correlation between a household’s cost percentiles in consecutive years, the 1-year predictive accuracy of our cost evolution model is on par with leading industrial models. A key difference is that our model uses variables whose evolution is easy to track, making it possible to project a household’s cost over a long horizon. By contrast, the objective of the industrial models is to generate short-term forecasts using clinical variables that evolve stochastically over time. We use the results produced by our model to answer the two questions posed in the introduction:
How Should a Household Budget for Its HSA Contributions?
Our recommended dynamic policy and the static $6,450 one both incur substantially lower costs than our proxy for common contribution behavior (the static policy with a $2,000 contribution limit). The cost advantage is explained as follows. In the first few years following HSA adoption, there is a risk that the account will have insufficient funds to cover OOP expenses. By recommending that the household make a substantial contribution in year 1, the dynamic policy optimizes against this risk. Similarly, the static $6,450 limit policy makes large contributions in the first several years to quickly bring the balance up to the OOP max, thereby reducing this risk substantially. By contrast, the static $2,000 policy grows the balance more slowly, and is at greater risk of account insufficiency early on. The same phenomenon applies in years immediately following large expenditures, when the account is depleted. The dynamic and static $6,450 policies can quickly replenish the account, while the $2,000 policy requires more years to do so.
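The mechanism described above can be seen in a small deterministic sketch of the static top-up rule under two annual limits. The sequence of events (contribute at the start of the year, then pay OOP expenses from the account, then earn a return on what remains), the zero default return, and all function names are simplifying assumptions for illustration, not the article's full formulation.

```python
def static_contribution(balance, oop_max, annual_limit):
    """Contribute enough to bring the balance up to the OOP maximum, up to the annual limit."""
    return max(0.0, min(annual_limit, oop_max - balance))


def simulate_static(oop_expenses, oop_max, annual_limit, return_rate=0.0, start_balance=0.0):
    """Return a list of (contribution, shortfall, end_balance) tuples, one per year."""
    balance = start_balance
    history = []
    for expense in oop_expenses:
        contribution = static_contribution(balance, oop_max, annual_limit)
        balance += contribution
        paid_from_hsa = min(balance, expense)
        shortfall = expense - paid_from_hsa  # must be paid with posttax dollars
        balance = (balance - paid_from_hsa) * (1.0 + return_rate)
        history.append((contribution, shortfall, balance))
    return history


# A $3,000 OOP expense in the first year after HSA adoption, then two quiet years.
expenses = [3000.0, 0.0, 0.0]
for limit in (2000.0, 6450.0):
    print(limit, simulate_static(expenses, oop_max=12500.0, annual_limit=limit))
```

In this example the $2,000 limit leaves a posttax shortfall in the first year, while the $6,450 limit covers the expense and reaches the OOP maximum by the third year.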
Our results suggest that while the dynamic policy is the most cost efficient among the ones examined, the static $6,450 policy is a worthy alternative. In the absence of personalized analytic guidance, the latter policy is a sensible strategy: Households outside the top tax brackets can capture most of the cost savings of the dynamic policy while still operating within the federal contribution limit. Our findings also indicate that insufficient annual contributions can result in substantially higher total costs.
The advantage the dynamic policy has over the static policies holds true across a range of parameter values, and is greatest for households with lower discount rates—those who place greater weight on future outcomes.
Should Policy Makers Revisit the Current Federal Contribution Limits for HSAs?
The last portion of our results suggests that 9% to 11% of households in a given year will need to make an HSA contribution that materially exceeds the current federal contribution limit. To allow these households to make fully efficient use of HSAs, we propose an alternative to the current approach of capping annual contributions: Permit households to contribute to their HSA without limit up to a certain account balance (the limit-free balance), and then impose an annual limit on contributions thereafter. The limit-free balance could be set equal to each plan’s OOP maximum. This two-tiered approach allows households to quickly save up to the threshold recommended by the dynamic policy. Importantly, our results suggest that the resulting impact on overall tax receipts will be well below what is currently allowed by legislation.
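One possible reading of the two-tiered proposal is sketched below: contributions that raise the balance up to the limit-free balance are unrestricted, and any contribution beyond that point is subject to the annual limit. This interpretation, the function name, and the example values are assumptions; other readings (for instance, applying the annual limit only in years after the limit-free balance is reached) are equally consistent with the text.

```python
def two_tier_cap(balance, limit_free_balance, annual_limit):
    """Maximum contribution allowed this year under one reading of the proposal."""
    # Tier 1: the gap up to the limit-free balance may be filled without restriction.
    gap = max(limit_free_balance - balance, 0.0)
    # Tier 2: contributions beyond the limit-free balance face the annual limit.
    return gap + annual_limit


# Limit-free balance set to a $12,500 OOP maximum, with a $6,450 annual limit.
for balance in (0.0, 8000.0, 12500.0, 20000.0):
    print(balance, two_tier_cap(balance, 12500.0, 6450.0))
```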
To our knowledge, this is the first article to develop rigorous guidelines for HSA contributions, and we view it as a promising first step. To improve the accuracy of the model, future research should examine some of the possible extensions described below.
Health care costs during retirement. We might also factor into our objective health care costs during retirement, during which no contributions are made. For this objective, a preferred policy is likely to be a hybrid of the dynamic policy (to cover current costs) and a fixed contribution component (to cover estimated retirement costs), with the goal of attaining an account balance at the end of the contribution period that is sufficient to cover costs during retirement. We can leverage the modeling framework developed here to analyze this type of policy.
HSA withdrawals for nonmedical expenses. A reviewer correctly pointed out that a household can withdraw money from its HSA for nonmedical expenses. Such a withdrawal would trigger a penalty on top of the tax due on the withdrawal amount, so our current model prohibits negative contribution amounts. This can be relaxed by developing an additional model for predicting whether a withdrawal may be needed, as a function of the household’s wealth and spending needs. Fitting such a model requires gathering additional household financial data, which we leave for future research. If the current dynamic policy is operationalized to provide contribution recommendations, we advise that users contribute the minimum of what is recommended and the amount they anticipate would not be needed for other purposes.
Impact of HSAs and high-deductible plans on expenditure distribution. In this article, we did not attempt to model this because the Medical Expenditure Panel Survey dataset only has limited information about the insurance plan parameters of individuals, and only up to 2001 (before the implementation of HSAs). In the future our model could be recalibrated to data from households with multi-year enrollment in HDHPs. The results of the RAND Health Insurance Experiment suggest that increased cost sharing reduces the likelihood of visiting a physician, but has a smaller effect on the costliness per episode of care. 28 We hypothesize the effect might appear as an increase in the probability of zero costs, and/or as a shifting of the overall yearly cost distribution curve to the left.
Switching to a low-deductible health plan. Some households have access to both HDHPs and low-deductible plans, and may choose to switch between them. Under this scenario, the optimal policy may be of the form xii : If the household’s current cost percentile is below a certain level, use an HDHP and contribute the amount (4) to an HSA; otherwise, select the low-deductible plan. Further exploration of the choice between high- and low-deductible plans is left for future research, possibly along the direction discussed above.
Time-dependent covariates in cost prediction. Our cost evolution model uses time-fixed covariates to produce long-term forecasts, whereas leading industrial models use the most recently available values of time-dependent variables to forecast a short time ahead. To incorporate these variables into our model for long-term planning, we would need a way to simulate their evolution over time. This is itself a complex problem that merits separate research. We note, however, that the short-term predictive performance of our model is already on par with the industrial ones. Hence, feeding a household’s most recent covariate values into our model should provide contribution recommendations that approximate those of the more sophisticated model.
We anticipate that our policy (or a version of it built atop more elaborate versions of the same models) could in the future be turned into a web application that consumers can use to determine contributions on an annual basis. While the details of such an implementation are outside the scope of this article, we imagine the application would operate as follows: The user would be requested to submit parameter values such as household size, prior year health care spending, marginal tax rate, and current health savings account balance. The application would then display the recommended contribution amount.
To conclude, we have developed a new analytical framework for modeling an HSA balance over time. We used this model to formulate a contribution policy that enables a household to optimize its expected discounted costs. With the growth trend in consumer-driven health plans, we expect that evidence-based modeling will play an increasing role in consumer health care financial planning.
Supplemental Material
Supplemental material, DS_10.1177_2381468318809373 for Health Savings Accounts: Consumer Contribution Strategies and Policy Implications by David J. Lowsky, Donald K. K. Lee and Stefanos A. Zenios in MDM Policy & Practice
Footnotes
Acknowledgements
We thank our reviewers for their excellent suggestions that significantly improved the article.
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Author Note
DKKL is now affiliated with Goizueta Business School, Emory University, Atlanta, Georgia.
i. In 2013, the household limit was $6,450, and for 2017 it is $6,750. Individuals aged 55 and older and not enrolled in Medicare are allowed to make additional annual contributions of up to $1,000.
ii.
iii. “In-scope” is defined as being a member of the US civilian noninstitutionalized population. We further include individuals who were born or died at some point during the survey period.
iv. We also performed an analysis of the sensitivity of results to discount rate (see Results).
v.
vi. This is a continuous version of a similar approach for modeling transitions among discrete cost states: Low, Medium, High, and Very High. 21
vii. MAPE is expressed as a fraction of the mean out-of-sample costs.
viii. The relative performance of the policies was insensitive to the deductible amount or OOP maximum, per the results of our sensitivity analysis reported in Results.
ix. Family size is modeled as a continuous covariate in our cost evolution models.
x. The 2013 federal annual contribution limit was $6,450.
xi. We selected $2,500 as the lowest deductible and $12,500 as the highest OOP maximum. These correspond to the lowest and highest federal allowable limits, respectively, for all HSA-eligible plans in 2013.
xii. This follows from applying policy iteration once to the Bellman equation in the supplementary material used to derive (4).
