Abstract
Objective
Debates about the effectiveness of workplace wellness programs (WWPs) call for a review of the evidence for return on investment (ROI) of WWPs. We examined the heterogeneity in methods used to estimate the ROI of WWPs to show how this heterogeneity may affect conclusions and inferences about ROI.
Methods
We conducted a scoping review using systematic review methods and adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. We reviewed PubMed, EconLit, Proquest Central, and Scopus databases for published articles. We included articles that (1) were published before December 20, 2019, when our last search was conducted, and (2) met our inclusion criteria that were based on target population, target intervention, evaluation method, and ROI as the main outcome.
Results
We identified 47 peer-reviewed articles from the selected databases that met our inclusion criteria, and we explored the effect of study characteristics on ROI estimates. Thirty-one articles had ROI measures. Studies that included costs of presenteeism had lower ROI estimates than studies with other combinations of health care and absenteeism costs. Studies of programs with disease management components produced higher ROI than studies of programs with only wellness components. We found a negative relationship between ROI and program length and a positive relationship between ROI and conflict of interest. Evaluations in small companies (≤500 employees) were associated with higher ROI estimates than evaluations in large companies (>500 employees). Studies with lower reporting quality scores, including studies that were missing information on statistical inference, had higher ROI estimates. Higher methodologic quality (internal validity) was associated with higher ROI estimates.
Conclusion
This review provides recommendations that can improve the methodologic quality of studies to validate the ROI and public health effects of WWPs.
Workplace wellness programs (WWPs) are employer-sponsored initiatives to promote healthy behaviors among employees. Public and private sectors have used workplace interventions to improve employee health and productivity for decades, 1,2 with a focus on worker productivity. 3 Studies suggest that WWPs improve employee health by reducing modifiable risk factors, such as physical inactivity, tobacco use, unhealthy eating habits, obesity, high blood pressure, high blood glucose, and high cholesterol. 4 -8 These improvements in employee health are thought to increase health-related productivity by reducing absenteeism and presenteeism. 4 -6,9 -12 Since their inception, WWPs have expanded to include initiatives such as health promotion, prevention, disease management, and occupational health and safety. 1,13 -16
Today, WWPs are often implemented to improve employee health 4 and address rising health care costs, particularly in the United States and other Western societies. 17 As a result, many economic evaluations of WWPs have focused on return on investment (ROI). 14,18 Improvements in employee well-being and performance could decrease the organizational costs associated with health care use, high turnover rates, and health-related productivity losses. 5 -8,10,18 -20 In the past decade, however, new criticism of the WWP ROI literature argues that the expected cost savings may not materialize, citing a lack of reliable evidence on WWP effectiveness in delivering cost savings or positive ROI. 21,22
In a series of reviews from 1991 through 2011, Pelletier 23 -27 concluded that WWPs would improve health and reduce health care costs if properly implemented. In 2013, Kaspin et al 28 identified program characteristics that were associated with improved economic outcomes. Baicker et al 18 also concluded that WWPs would reduce health care costs, but subsequent editorials criticized this review for relying on inadequate data. 22,29,30 Lerner et al 31 conducted a review using more stringent inclusion criteria than previous reviews and found that only 10 studies were rigorous enough to be evidentiary. Despite finding that 8 of 10 studies showed a positive economic effect, Lerner et al concluded that evidence was insufficient to draw a conclusion on the economic effect of WWPs. In 2014, Baxter et al 32 found that methodologic quality of a study and study design were important determinants of WWP evaluation results. McCoy et al 33 also noted how business size and type of wellness program could affect the decision to adopt the program and the program’s effectiveness.
To our knowledge, no review has analyzed the heterogeneity of WWP evaluations and its effect on ROI findings. The objective of our review was to describe the effects of sources of study heterogeneity that are not commonly noted in the literature. These sources include inconsistent formulation of ROI; variation in the outcomes evaluated, program targets, evaluation length, publication year, and conflict of interest; and the lack of statistical inference information for ROI estimates. In addition, we examined the underrepresentation of small businesses in the WWP ROI literature and the methodologic challenges that affect ROI findings.
Methods
This review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. 34 Because the lead author (N.U.) performed most aspects of the review, it is best classified as a scoping review 35 and, therefore, was not registered at PROSPERO. We identified peer-reviewed articles in PubMed, EconLit, Proquest Central, and Scopus. Our search included all articles that met our inclusion criteria and were published before the date of our final search, December 20, 2019.
The target populations for our search were workplaces, employees, worksites, or workers. The target interventions were wellness, health, health promotion, health prevention, or well-being. The target evaluation method was economic evaluation, including cost benefit, cost effectiveness, cost analysis, economic evaluation, economic analysis, or economic assessments. The outcome was ROI. A detailed list of the search terms by database is available from the authors upon request. We placed no restrictions on publication dates. We excluded publications that were not primary studies (eg, reviews, simulations, or meta-analyses) or not in English. Because “wellness” is often an umbrella term that includes components focused on lifestyle or behavior-related risk reduction and chronic disease management, we included workplace wellness, health promotion, and disease management programs.
We first reviewed articles’ titles and abstracts to determine relevance and fit for this review. Next, we conducted a full-text review of articles deemed relevant. Then, we scanned the reference lists of all identified publications, including those from systematic reviews, meta-analyses, and other reviews, to identify other relevant citations. We excluded articles that evaluated government-sponsored WWPs to maintain a focus on private employer–relevant information. We included only peer-reviewed articles to analyze the validity of recent critiques of WWP ROI studies. 21,22,29
The lead author (N.U.) read and extracted data from all articles, with targeted assistance from coauthors (G.W., J.B., D.B.). For all articles, we extracted the ROI estimate as reported, regardless of how it was defined. Reported ROI measures included (1) true ROI, expressed as either a ratio or percentage and measured as the ratio of net benefit (the difference between benefits and program costs) to program cost, which has a threshold for positive ROI of 0 36 ; (2) the benefit-to-cost ratio, which has a threshold for positive ROI of 1; or (3) net benefit with positive ROI as savings exceeding program costs. We recalculated ROI using information from each article to consistently define ROI as the net benefit-to-cost ratio. If the study did not report ROI as its finding but reported program costs and benefits, then we calculated ROI using net benefit-to-cost ratio. To increase consistency across studies, the recalculated ROI is the primary outcome of interest in our review.
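The conversion to a single ROI definition can be sketched as a few small helpers. This is a minimal illustration, assuming each article supplies either its benefits and program costs, a benefit-to-cost ratio, a net benefit with program costs, or a "true" ROI as a percentage; the function names are hypothetical and are not the authors' extraction code.

```python
# Hypothetical helpers (not the authors' code): map each reported ROI format
# onto the net benefit-to-cost ratio, where ROI > 0 means a positive return.

def roi_from_costs(benefits: float, program_costs: float) -> float:
    """Net benefit-to-cost ratio: (benefits - program costs) / program costs."""
    if program_costs <= 0:
        raise ValueError("program costs must be positive")
    return (benefits - program_costs) / program_costs

def roi_from_bcr(benefit_cost_ratio: float) -> float:
    """A benefit-to-cost ratio B/C (break-even at 1) equals a net ratio of B/C - 1."""
    return benefit_cost_ratio - 1.0

def roi_from_net_benefit(net_benefit: float, program_costs: float) -> float:
    """A reported net benefit (benefits minus costs) divided by program costs."""
    if program_costs <= 0:
        raise ValueError("program costs must be positive")
    return net_benefit / program_costs

def roi_from_percent(roi_percent: float) -> float:
    """A 'true' ROI reported as a percentage, e.g. 38%, becomes 0.38."""
    return roi_percent / 100.0

# One hypothetical program reported in four different ways.
as_costs = roi_from_costs(138_000, 100_000)
as_bcr = roi_from_bcr(1.38)
as_net = roi_from_net_benefit(38_000, 100_000)
as_pct = roi_from_percent(38)
```

All four values agree (up to floating-point rounding), which is the point of recalculating: a benefit-to-cost ratio of 1.38 and a true ROI of 0.38 describe the same program but use different break-even thresholds (1 vs 0).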
We did not discount monetized values or adjust them for inflation to a common valuation across studies. Discounting would require extracting annual flows of costs and savings, which was not possible for all articles in this analysis. We did not adjust for inflation across studies for 2 reasons. First, most of the included studies already adjusted for inflation when necessary. Second, the recalculated ROI is a ratio, so an inflation adjustment applied to costs and benefits valued in the same price year would scale the numerator and denominator equally and cancel out.
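The argument about inflation and discounting can be checked with a small numeric sketch. The cash flows and rates below are hypothetical, chosen only to show that a common price-index factor leaves the ratio unchanged, whereas discounting does not when costs are paid up front and savings arrive later (which is why annual flows would be needed to discount).

```python
# Hypothetical flows: program costs paid in year 0, savings in years 1-3.
costs = {0: 100_000.0}
savings = {1: 40_000.0, 2: 45_000.0, 3: 50_000.0}

def present_value(flows, scale=1.0, rate=0.0):
    """Sum of flows after a common price-index factor 'scale' and an
    annual discount 'rate' applied by year t."""
    return sum(scale * value / (1 + rate) ** t for t, value in flows.items())

def roi(benefit_flows, cost_flows, scale=1.0, rate=0.0):
    """Net benefit-to-cost ratio of the (optionally adjusted) flows."""
    b = present_value(benefit_flows, scale, rate)
    c = present_value(cost_flows, scale, rate)
    return (b - c) / c

base = roi(savings, costs)                    # undiscounted, unadjusted
inflated = roi(savings, costs, scale=1.07)    # same factor on both sides
discounted = roi(savings, costs, rate=0.03)   # later savings shrink
```

Here `base` and `inflated` are equal, while `discounted` is smaller because only the savings fall in later years.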
We extracted information from the included articles using a methodologic rigor rubric that we generated based on guidance from 5 checklists. 37 -41 Our rubric contained checklist domains of article characteristics, reporting, internal validity, external validity, and power (Table 1; detailed rubric available upon request). We used the domains of reporting, internal validity, external validity, and power to score the quality of the articles. Each domain included items that were scored 0 or 1 based on the presence (1) or absence (0) of information in the included articles. Reporting had 11 items, 3 of which were averaged to compose the score for study sample in this domain, resulting in a total score that ranged from 0 to 8. Internal validity had 13 items, 2 of which were averaged to compose the score for the appropriate assessment of the outcome measures item and 3 of which were averaged to compose the score for the appropriate cost measures and values item, resulting in a total score that ranged from 0 to 8. External validity and power each had 1 item with a raw score of 0 or 1 based on the presence (1) or absence (0) of each in the articles. The lead author (N.U.) scored all articles based on the rubric domains. The total scores for each domain were summed for a quality index score that ranged from 0 to 18. The second author (G.W.) independently scored 3 articles to calibrate the scoring of the lead author. The lead author discussed any scoring uncertainties with the coauthors to achieve consensus in scoring.
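The scoring scheme, in which some related items are averaged into a single item before the domain total is taken, can be sketched as follows. The item values for the example article are hypothetical, and the sketch does not reproduce the rubric's actual item lists, which are available only from the authors.

```python
from statistics import mean

def domain_score(items, averaged_groups=()):
    """Sum standalone 0/1 items; each group of related 0/1 items is first
    averaged, so the whole group counts as one item worth at most 1 point."""
    standalone = sum(items)
    grouped = sum(mean(group) for group in averaged_groups)
    return standalone + grouped

# Hypothetical scoring of one article (illustrative item values only).
reporting = domain_score([1, 1, 0, 1, 1, 1, 1], averaged_groups=[[1, 0, 1]])
internal_validity = domain_score([1, 0, 1, 1, 0, 1],
                                 averaged_groups=[[1, 1], [0, 1, 1]])
external_validity = 0  # single 0/1 item
power = 1              # single 0/1 item

# The quality index is the plain sum of the 4 domain scores.
quality_index = reporting + internal_validity + external_validity + power
```

Averaging keeps a multi-part item (such as study sample) from outweighing single-part items in the domain total.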
Rubric used to assess rigor and quality of articles that were included in a scoping review, evaluated workplace wellness programs, included ROI measures, and were published before December 20, 2019
To determine if ROI results differed based on the targeted outcome of the WWP, we classified evaluated programs into disease management, wellness, or a combination of the 2. A disease management program targeted diagnosable diseases (eg, asthma, diabetes). A wellness program targeted health risks or behaviors (eg, smoking, exercise, nutrition). Because “wellness” is not precisely defined in the literature or in practice, we attempted to control for the nature of the wellness program being evaluated.
Furthermore, to analyze whether ROI results varied based on how benefits (program outcomes) were defined, we categorized studies based on the cost components included in the ROI analysis, such as costs of health care, absenteeism, or presenteeism. Health care included pharmaceutical claims and medical claims of inpatient, outpatient, and emergency department visits. Absenteeism included lost workdays, sickness absence days, disability days, or time away from work. Presenteeism included measures for productivity loss at work.
We reviewed articles for potential factors that might affect the ROI findings. We categorized the study publication year into 4 groups that attempted to balance sample size and date ranges: before 2000, 2000-2010, 2011-2014, and after 2014. Because short- and long-term effects may differ, we used an indicator variable for studies with follow-up of ≥3 years. We chose the 3-year study duration to roughly balance sample size across categories. We identified conflicts of interest using information on authors’ employment and study funding, setting the indicator equal to 1 if at least 1 author was employed by the funding or program host institution. We classified company size as small (≤500 employees) or large (>500 employees). Size was the only company characteristic that we used in the analysis because of a lack of other information across studies.
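The coding of study characteristics described in the preceding paragraphs can be sketched as a function that turns one extracted study record into indicator variables. This is a minimal sketch: the `Study` record and variable names are hypothetical, and splitting cost components into absenteeism only, health care plus absenteeism, and any presenteeism (with health care only as the reference) and splitting program type into disease management only and combined programs (with wellness only as the reference) are our assumptions about how the categories map onto regressors.

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Hypothetical record of characteristics extracted from one article."""
    cost_components: frozenset  # subset of {"health care", "absenteeism", "presenteeism"}
    has_disease_mgmt: bool
    has_wellness: bool
    follow_up_years: float
    pub_year: int
    author_employed_by_funder_or_host: bool
    n_employees: int

def code_covariates(s: Study) -> dict:
    """Indicators relative to the reference categories: only health care costs,
    only wellness, follow-up <3 years, published before 2000,
    no conflict of interest, and a large company."""
    comps = s.cost_components
    return {
        "absenteeism_only": int(comps == {"absenteeism"}),
        "health_care_and_absenteeism": int(comps == {"health care", "absenteeism"}),
        "any_presenteeism": int("presenteeism" in comps),
        "disease_mgmt_only": int(s.has_disease_mgmt and not s.has_wellness),
        "wellness_and_disease_mgmt": int(s.has_disease_mgmt and s.has_wellness),
        "follow_up_3y_plus": int(s.follow_up_years >= 3),
        "pub_2000_2010": int(2000 <= s.pub_year <= 2010),
        "pub_2011_2014": int(2011 <= s.pub_year <= 2014),
        "pub_after_2014": int(s.pub_year > 2014),
        # 1 if at least 1 author was employed by the funder or program host
        "conflict_of_interest": int(s.author_employed_by_funder_or_host),
        "small_company": int(s.n_employees <= 500),
    }

# A hypothetical study: health care and presenteeism costs, disease management
# only, 4-year follow-up, published in 2016, internal author, 300 employees.
example = Study(frozenset({"health care", "presenteeism"}), True, False, 4, 2016, True, 300)
covariates = code_covariates(example)
```

A study that matches every reference category receives all zeros, so the regression constant describes that profile.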
We used ordinary least squares regression to examine differences in ROI across study characteristics; these regressions are descriptive and do not support causal inference. Models 1 and 2 included all studies, including those with ROI values as low as –11.61 42 and –6.66, 43 which are extreme outliers 44 relative to the interquartile range of ROI estimates. Models 3 and 4 excluded these outliers. Models 1 and 3 included quality index scores as a predictor of ROI, whereas Models 2 and 4 included the specific domains that make up the quality index scores to provide more detail on the effects of domains of quality on ROI. For all 4 ordinary least squares regressions, we considered P < .05 to be significant. Because few source articles included the standard error or any statistical inference information for the ROI estimate, we did not adjust the regression for sampling variation within each study; as such, it should not be considered a meta-regression.
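Flagging extreme outliers relative to the interquartile range can be sketched with Tukey's outer fences. This is an assumption on our part: the text cites a source for the outlier definition but does not state the multiplier, so the sketch uses the common 3 × IQR convention, and the ROI values are illustrative, not the reviewed studies' estimates.

```python
from statistics import quantiles

def extreme_outliers(values, k=3.0):
    """Return values outside Tukey's outer fences [Q1 - k*IQR, Q3 + k*IQR]
    (k=3 is the conventional cutoff for 'extreme' outliers)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Illustrative ROI values only.
rois = [-11.5, -6.5, -0.4, -0.2, 0.1, 0.3, 0.4, 0.5, 0.7, 0.9, 1.2, 1.8, 2.1]
flagged = extreme_outliers(rois)
```

With these values the quartiles are about -0.3 and 1.05, so only the two large negative values fall outside the fences; different quantile methods can shift the fences slightly for small samples.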
Results
We identified and selected for further review 466 unduplicated articles (Figure). Of these, 78 articles met the inclusion criteria after title and abstract screening, 33 of which were included after full-text review. We also included 11 articles from the publications’ reference lists, and we identified 3 articles in an updated search conducted on December 20, 2019, resulting in 47 unique publications included in our review.

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart showing the process for article inclusion in a scoping review of articles that evaluated workplace wellness programs, included measures of return on investment, and were published before December 20, 2019.
Of the 47 included articles, 30 provided an ROI estimate as a main outcome, 2 of which did not provide any information on the ROI formula and did not have sufficient program cost information for us to recalculate the authors’ ROI findings (Table 2). Thus, we excluded these 2 articles from subsequent analysis. We found 3 articles that had not reported ROI as their outcome but did include program benefits and costs in sufficient detail to allow ROI to be calculated. Therefore, we added these 3 articles to our subsequent analysis. The final sample for the ROI regression analyses included 31 articles with recalculated ROI outcomes. Of these 31 articles, 24 had a positive recalculated ROI and 7 had a negative recalculated ROI.
Summary characteristics in a scoping review of articles that evaluated workplace wellness programs, included measures of return on investment, and were published before December 20, 2019 (n = 47)
Abbreviations: NA, not available; ROI, return on investment.
The mean recalculated ROI was 0.38 (Table 3), implying that, on average, companies recouped $1.38 in benefits, a net gain of $0.38, for every $1 invested in a WWP. Of 31 publications that we included in our ROI regression analyses, 7 included only health care costs in the ROI, 7 included only absenteeism costs, 4 included both health care and absenteeism costs, 5 included health care and presenteeism costs, 4 included absenteeism and presenteeism costs, and 4 included all 3 costs. Twenty-two of the studies that we included in our ROI regression analyses evaluated only wellness programs, 4 evaluated only disease management programs, and 5 evaluated wellness and disease management programs. Eighteen studies had a follow-up length of ≥3 years. Five studies were published before 2000, 7 were published during 2000-2010, 9 were published during 2011-2014, and 10 were published during 2015-2019. Ten studies had a potential conflict of interest. Only 3 studies were conducted in small companies; the mean recalculated ROI was 0.67 for these studies.
Descriptive statistics for recalculated ROI, article characteristics, rubric scores, quality indices, and study design among articles that had ROI measures for workplace wellness program evaluations and were included in a scoping review of articles published before December 20, 2019 (N = 31)
Abbreviations: ROI, return on investment; SD, standard deviation.
The mean reporting score was 7.0, and the mean internal validity score was 4.9. The mean external validity score was 0.2, implying that about 23% of included articles met the external validity criterion. Similarly, about 10% of included articles discussed statistical power. The mean overall quality index was 12.2 points. Of 31 publications, 6 were observational studies without a comparison group (base group), 8 were observational case studies with a comparison group, 1 was an observational cohort study with a control group, 4 were quasi-experimental studies, and 12 were randomized studies.
The ordinary least squares constant represents the average ROI from an evaluation that included only health care costs in the assessment of organizational benefits, exclusively examined a wellness program, had a follow-up length of <3 years, was published before 2000, had no apparent conflicts of interest, was conducted in a large firm, and had either the average quality score (Models 1 and 3) or the average reporting and internal validity scores (Models 2 and 4) (Table 4). Studies with both costs of absenteeism and health care, studies with costs of any presenteeism, studies with a follow-up length of ≥3 years, or studies published after 2000 (except during 2015-2019) were found to be associated with lower ROI than their base or referent categories. Studies that evaluated a program with a disease management component produced higher ROI than evaluations with only a wellness component. Studies with higher reporting scores reported lower ROI than studies with lower reporting scores. Studies with higher internal validity scores reported higher ROI than studies with lower internal validity scores. However, results were not significant across all models.
Mean effects of article characteristics on recalculated ROI for articles that had ROI measures for workplace wellness program evaluations and were included in a scoping review of articles that evaluated workplace wellness programs, included ROI measures, and were published before December 20, 2019 a (N = 31) b
Abbreviations: ROI, return on investment; SE, standard error.
aThe dependent variable for these ordinary least squares regression models is the recalculated ROI (net benefit-to-cost ratio).
bModels 1 and 3 included average quality index scores (the sum of scores for reporting, internal validity, external validity, and power domains). Models 2 and 4 examined the effects of the 2 domains that contribute most to the score for quality index, controlling for reporting and internal validity domains separately and leaving out the summed quality index score because of collinearity.
cUsing the t test, with P < .05 considered to be significant.
dThe base category is only health care cost.
eThis component includes combinations of costs that include presenteeism.
fThe base category is “only wellness program.”
gEvaluation duration is 1 if the study period is ≥3 years.
hThe base category is publication year before 2000.
iSmall is 1 if the company has ≤500 employees.
jInternal validity, reporting, and quality index scores were demeaned (ie, sample mean subtracted from each observation).
kTwo outliers were excluded in the sample for Models 3 and 4.
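Demeaning the quality scores (footnote j) is what lets the constant be read as the average ROI for a study in every reference category with an average quality score. The sketch below uses simulated data (not the review's data) to show the general property: demeaning a regressor leaves every slope unchanged and shifts the intercept by that regressor's slope times its mean.

```python
import numpy as np

# Simulated study-level data; the coefficients here are arbitrary.
rng = np.random.default_rng(0)
n = 31
disease_mgmt = rng.integers(0, 2, size=n).astype(float)  # 0/1 indicator
quality = rng.uniform(6, 18, size=n)                     # quality index score
roi = 0.5 + 0.3 * disease_mgmt + 0.05 * quality + rng.normal(0, 0.5, size=n)

def ols(X, y):
    """OLS coefficients [intercept, slopes...] via least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

raw = ols(np.column_stack([disease_mgmt, quality]), roi)
demeaned = ols(np.column_stack([disease_mgmt, quality - quality.mean()]), roi)
```

In `raw`, the intercept is the predicted ROI at a quality score of 0, which no study has; in `demeaned`, it is the predicted ROI for a wellness-only study at the sample-average quality score.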
Discussion
This review addresses some points not previously considered and confirms some findings of previous reviews. It identifies factors underlying heterogeneity across studies and expands on previous findings about the association between study heterogeneity and the magnitude of ROI estimates. Although heterogeneity can never be completely eliminated, this study highlights some key sources of heterogeneity that should be addressed in future studies. Perhaps the most problematic finding across studies was that ROI is inconsistently defined in the literature. We acknowledge that the long history of misusing the term ROI in evaluations and reviews of the literature will make it difficult to standardize its use moving forward. Nonetheless, defining ROI by its original financial definition, 85 the ratio of net benefit to cost, is essential if ROI estimates are to inform financial decision makers.
Only 5 randomized studies reported confidence intervals or statistical inference information for the ROI estimate, making formal meta-analyses impossible. Providing confidence intervals for ROI is not common because ROI is a ratio, whose sampling distribution is not straightforward to derive. An additional method, such as bootstrapping, is therefore needed to estimate confidence intervals or standard errors. Reporting such intervals could be the easiest way to improve WWP evaluations. Unfortunately, the lack of statistical inference information in most of the literature prevents formal estimation of an average ROI or testing of heterogeneity across studies. Therefore, we cannot provide a single, combined estimate of ROI in the literature. The average reported ROI we present should be considered a qualitative summary of the literature, not a quantitative finding. An important corollary of this finding is that most previous reviews claiming to be meta-analyses are, in fact, not formal meta-analyses but, rather, are qualitative syntheses such as the one we present here.
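A bootstrap interval for a single study's ROI can be sketched as follows. This assumes the evaluator has participant-level benefits and allocated program costs; the data are simulated, and the percentile interval shown is only one of several bootstrap interval methods (bias-corrected variants are also common).

```python
import random

def roi(pairs):
    """Net benefit-to-cost ratio for a list of (benefit, cost) pairs."""
    total_benefit = sum(b for b, _ in pairs)
    total_cost = sum(c for _, c in pairs)
    return (total_benefit - total_cost) / total_cost

def bootstrap_roi_ci(pairs, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample participants with replacement,
    recompute ROI each time, and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    stats = sorted(roi(rng.choices(pairs, k=len(pairs))) for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated participants: right-skewed savings, flat $200 allocated cost each.
gen = random.Random(42)
participants = [(gen.lognormvariate(5.0, 1.0), 200.0) for _ in range(300)]
point = roi(participants)
lower, upper = bootstrap_roi_ci(participants)
```

Resampling participants rather than reusing a fixed cost and benefit total propagates the skew of health care savings into the interval, which a single point estimate hides.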
Although results from our recalculated ROI analysis were not significant, we believe they suggest important considerations for future ROI research. For example, WWPs with a specific outcome target could save more money than WWPs with only general wellness or health behavior targets. The health effect of wellness programs is mediated through behavior change, which might be harder to measure in the short run and have less immediate and direct effects on organizational costs than disease management programs. In contrast, disease management programs can directly affect health conditions that drive health care and productivity costs. If the primary objective for implementing WWPs is to control costs, WWPs should directly target the drivers of those costs. Disease management programs may offer a more direct effect on costs than wellness programs. If, however, the primary objective is to improve employee health, then WWPs should target health behaviors, recognizing that cost savings may only accrue in the long term.
Another consideration for future ROI research comes from our mixed results on conflicts of interest: evaluations with conflicts of interest, which arose from internal evaluation, were associated with higher ROI than evaluations without conflicts of interest. Although it is possible, and maybe even plausible, that internal evaluators have better access to data, thereby allowing them to better estimate ROI, independent evaluation is essential to increasing confidence in the evidence base. Eliminating conflicts of interest may be one of the most difficult obstacles in the field because of the need to rely on the cooperation of the WWP host companies.
Finally, recent critiques of the WWP ROI literature suggest that studies with higher internal validity yield lower ROI estimates. Yet we found that studies with higher internal validity scores (ie, with stronger evidence for causal inference) had higher ROI estimates. In general, evaluation studies, regardless of study design, do not report the distribution of benefits, including outliers, which could contribute to this positive association between estimation methods and ROI findings.
Limitations
This review had several limitations. One limitation was the small sample size and lack of formal meta-analysis underlying our pooled estimates of ROI. The standard errors did not account for the underlying sampling variation of the ROI estimates drawn from the literature and so did not support formal meta-analytic hypothesis testing. Moreover, the mean ROI did not account for the scale of programs. In theory, ROI handles this issue by being a ratio, but only if programs exhibit constant returns to scale.
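The scale problem can be illustrated with 2 hypothetical programs. An unweighted mean of study ROIs treats a small program and a very large one equally, whereas pooling net benefits and costs (equivalently, a cost-weighted mean of the individual ROIs) lets the larger program dominate; the two summaries can even differ in sign.

```python
# Two hypothetical programs of very different scale.
programs = [
    {"cost": 10_000.0, "benefit": 20_000.0},      # small program, ROI = 1.0
    {"cost": 1_000_000.0, "benefit": 900_000.0},  # large program, ROI = -0.1
]

rois = [(p["benefit"] - p["cost"]) / p["cost"] for p in programs]
unweighted_mean = sum(rois) / len(rois)  # each program counts equally

total_cost = sum(p["cost"] for p in programs)
total_benefit = sum(p["benefit"] for p in programs)
pooled = (total_benefit - total_cost) / total_cost  # cost-weighted mean ROI
```

Here the unweighted mean is 0.45, while the pooled ROI is about -0.09.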
Other limitations included measurement errors in data collection and self-selection into program participation. These inherent limitations cannot be eliminated. Randomized controlled trials are difficult if not impossible in some firms for legal and logistical reasons. In addition, some health-related data are not available to independent evaluators for legal reasons, making it necessary to involve an internal collaborator. Finally, this review included only peer-reviewed articles, which may introduce publication bias.
Public Health Implications
Our review focused on ROI findings because of the ongoing debate about the findings of the economic evaluation literature. However, the outcomes that matter from employers’ perspectives vary and depend on companies’ characteristics. For example, a small nonprofit company in one industry might adopt a WWP for corporate citizenship purposes, whereas a large for-profit company in the same industry might adopt a WWP to reduce turnover. Much of the economic evaluation literature has neglected this point.
Our scoping review identifies areas in which the methodologic quality of economic evaluations of WWPs can be improved. Lack of statistical inference information on ROI is an important reporting issue because a meta-analysis to derive common effects of WWPs cannot be conducted without it. The economic evaluation literature needs better-reported peer-reviewed studies and more attention to WWPs in companies with various characteristics, especially small companies with various reasons for adopting WWPs. The advancements suggested in our scoping review will help us understand organizations’ motivations for adopting and implementing WWPs and align private- and public-sector motivations so that WWPs can receive policy support. The goal is for future research to validate whether WWPs can substantially affect public health.
Footnotes
Acknowledgments
The authors thank Albert N. Link, PhD, of the Department of Economics at the University of North Carolina Greensboro, and Michael Pittard, MFA, of the Department of English at the University of North Carolina Greensboro for their valuable contribution to the preparation of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
