When Simple Becomes Complicated: Why Excel Should Lose Its Place at the Top Table

Abstract

Traditionally, the majority of health economic modelling has been performed in spreadsheet calculators such as Microsoft Excel, as these are perceived to be more transparent and easier to use. However, as modelling requirements become more realistic, and therefore more complex, spreadsheets become increasingly cumbersome and difficult to manage. We argue that specialist statistical packages such as R should be used once models become sufficiently complex. We acknowledge the difficulties associated with script-based statistical software, but argue that user-written packages designed for health-technology assessment simplify the analysis compared with spreadsheet calculators. Additionally, we argue that web applications based on R will make the statistical capabilities of specialist software available to all. All that is needed is a dialogue between modellers and academics to make this happen.

Keywords

Cost-effectiveness analysis; Decision-analytic models; R; Statistical packages

Are spreadsheet calculators really simple and transparent?

Historically, health-technology assessments (HTAs) have often been based on modelling performed in specialised commercial packages (such as TreeAge) or, even more frequently, in spreadsheet calculators (almost invariably Microsoft Excel). These tools have become the de facto standard because they are considered “transparent, easy to use and easy to share with clients and stakeholders”, and HTA bodies therefore tend to request Excel models as part of a reimbursement dossier. An additional factor contributing to the success of these tools and this modelling strategy may be that economic modellers are often not statisticians by training. Consequently, they may be less familiar with the full functionality of general-purpose statistical packages such as SAS, Stata or R.

For simple models, which can easily be arranged in a small number of spreadsheets, it is indeed useful to let the user modify a small number of parameters simply by changing the value of a cell or selecting an option from a drop-down menu. Moreover, when the amount of information and the underlying statistical modelling required for the cost-effectiveness analysis are neither large nor complex, “clients and stakeholders” (e.g., the sponsors and the reviewers acting on behalf of the regulators) can navigate relatively easily through the spreadsheets.

One evident drawback of this spreadsheet-based approach is that cost-effectiveness models are increasingly based on pre-existing templates (for a specific country or drug, for example), which are then “adapted” to the situation at hand (e.g., a different geographical context, or an intervention with a similar mechanism of action). As a result, modelling choices made for one setting may be carried over into another. More importantly, in the recent past, health economic models have become grounded in more advanced statistical foundations (1–5).

These modelling techniques allow for the development of more sophisticated mathematical representations that more closely reflect the underlying clinical and economic processes associated with implementing a health-care intervention (6). Additionally, HTA bodies such as NICE in the UK are requesting more in-depth analyses, e.g., Probabilistic Sensitivity Analysis (PSA) (7). This creates a tension between two needs. The first is to “keep things simple” for the sake of the final users of the models (e.g., the team in the pharmaceutical company presenting a reimbursement dossier). The second is to develop a model that fully accounts for the uncertainty associated with the usually limited information (e.g., when extrapolating the results of a short-term clinical trial to a lifetime horizon).

The consequence of this is that models developed in Excel are increasingly complex and often cumbersome to handle from a technical point of view. Key difficulties associated with complex Excel models are:

• Models are structured over a large number of different spreadsheets, typically using Visual Basic for Applications (VBA) macros. This is likely to limit the ability to visualise the data, which is one of the key drivers of the “transparency” argument.

• While programming errors can affect any computer language, the structure of spreadsheet-based models makes debugging especially challenging. Cross-links may be difficult to follow when there are many spreadsheets with active cells in each.

• Parameter values are modified using drop-down menus that are programmed using macros. This implies that the spreadsheet itself is merely a graphical interface to the underlying macro code.

• Some analyses, such as the estimation of parametric survival models, are beyond the capabilities of Excel, so a combination of programmes is used for the modelling. This increases the chance of human error entering the process and limits the traceability of the analysis.

• More advanced PSA methods such as Value of Information (8) are simply not available in Excel. These methods are becoming a more important component of HTA applications, especially when managed entry agreements are sought. For example, recent work has focused on efficient ways of estimating the Expected Value of Partial Perfect Information (EVPPI), using Gaussian Process regression to speed up the computation (3, 4). By contrast, these methods have been implemented in suitable R packages (9, 10).
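Regression makes EVPPI cheap to compute from an existing PSA sample, and the idea can be illustrated in a few lines. The Python sketch below uses the standard library only. It regresses the incremental net benefit on a single parameter and compares the expected value of deciding with and without knowledge of that parameter. It is a simplification, not the method of (3, 4) or the code in SAVI or BCEA: it replaces Gaussian Process regression with an ordinary quadratic polynomial fit and handles only two interventions. The simulated PSA sample (`phi`, `psi`, `inb`) is hypothetical.

```python
import random
import statistics


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def evppi_quadratic(phi, inb):
    """Regression-based EVPPI for one parameter and two interventions.

    phi[s] is the PSA draw of the parameter of interest and inb[s] the incremental
    net benefit (new option minus reference) in the same PSA draw s.
    """
    # Least-squares fit of inb on (1, phi, phi^2) via the normal equations X'X b = X'y.
    cols = [[1.0] * len(phi), list(phi), [p * p for p in phi]]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, inb)) for ci in cols]
    b0, b1, b2 = solve_linear(xtx, xty)
    # Fitted values approximate E[INB | phi].
    fitted = [b0 + b1 * p + b2 * p * p for p in phi]
    # On the incremental scale the reference option has net benefit 0.
    value_with_info = statistics.fmean(max(0.0, g) for g in fitted)
    value_now = max(0.0, statistics.fmean(fitted))
    return value_with_info - value_now


# Hypothetical PSA sample: INB depends strongly on phi and weakly on psi.
rng = random.Random(2016)
phi = [rng.gauss(0.0, 1.0) for _ in range(5000)]
psi = [rng.gauss(0.0, 1.0) for _ in range(5000)]
inb = [1000.0 * p + 500.0 * q for p, q in zip(phi, psi)]

evppi_phi = evppi_quadratic(phi, inb)
evppi_psi = evppi_quadratic(psi, inb)
print(round(evppi_phi), round(evppi_psi))
```

By construction, E[INB | phi] = 1000·phi with phi standard normal. The EVPPI for `phi` should therefore be close to 1000/√(2π) ≈ 399, up to Monte Carlo error, and the EVPPI for `psi` roughly half of that.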

We are not alone (2, 11) in arguing that, in light of these concerns, many of the perceived advantages of simpler tools over “proper” statistical software require a serious rethink. As the complexity of spreadsheet-based models increases, the perceived complexity of implementation using tools such as R should, in our view, no longer be a barrier. Moreover, specialist software has a large number of advantages:

Scripting facility: the whole analysis can (and should) be performed by writing scripts instructing the software about the steps necessary to:

i) estimate the relevant model parameters using the available data;

ii) construct the relevant economic summaries (i.e., the population average costs and benefits);

iii) determine the optimal decision, based on current evidence, by computing the maximum expected utility associated with the interventions being assessed;

iv) perform thorough PSA to assess the impact of uncertainty on the decision-making process.

While scripting has traditionally been seen as a drawback or an unnecessary complication, it actually improves replicability and provides the much-longed-for transparency, because every step of the analysis is written down and can be re-run and checked.
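The four scripted steps can be illustrated with a deliberately small example. The Python sketch below uses the standard library only; in the workflow we advocate, this would be an R script. The trial counts, costs, QALY gain and willingness to pay are all hypothetical. The sketch carries out the four steps in order:

- It draws response probabilities from Beta posteriors under a uniform prior.
- It turns those probabilities into average costs and effects.
- It picks the intervention with the highest expected net benefit.
- It reports how often the treatment is preferred across simulations.

```python
import random
import statistics

# Hypothetical trial data: (responders, patients) in each arm.
DATA = {"control": (30, 100), "treatment": (45, 100)}
DRUG_COST = {"control": 1000.0, "treatment": 2500.0}  # hypothetical drug costs
COST_NON_RESPONSE = 2000.0  # hypothetical cost of managing a non-responder
QALY_RESPONSE = 0.5         # hypothetical QALY gain for a responder
WTP = 20000.0               # willingness to pay per QALY


def run_analysis(n_sims=10000, seed=1):
    rng = random.Random(seed)
    nb = {arm: [] for arm in DATA}
    for _ in range(n_sims):
        for arm, (r, n) in DATA.items():
            # i) parameter estimation: draw p from its Beta(1 + r, 1 + n - r) posterior
            p = rng.betavariate(1 + r, 1 + n - r)
            # ii) economic summaries: population average effect and cost
            effect = QALY_RESPONSE * p
            cost = DRUG_COST[arm] + COST_NON_RESPONSE * (1 - p)
            nb[arm].append(WTP * effect - cost)
    # iii) optimal decision: maximise expected net benefit (expected utility)
    expected_nb = {arm: statistics.fmean(v) for arm, v in nb.items()}
    optimal = max(expected_nb, key=expected_nb.get)
    # iv) PSA: proportion of simulations in which the treatment has higher net benefit
    prob_ce = statistics.fmean(
        1.0 if t > c else 0.0 for t, c in zip(nb["treatment"], nb["control"]))
    return expected_nb, optimal, prob_ce


expected_nb, optimal, prob_ce = run_analysis()
print(optimal, {k: round(v) for k, v in expected_nb.items()}, round(prob_ce, 2))
```

Every input and step lives in the script. Re-running it with new data or a different seed therefore reproduces the whole analysis.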

Graphical facility: statistical packages such as R have very good graphical engines. These provide, at virtually no extra cost, high-quality output that can be included in research papers or in reimbursement dossiers submitted to the regulators.

Statistical facility: models are increasingly complex and involve subtle issues that require careful statistical modelling. Specialist statistical software is designed so that all of these modelling tasks can be performed within the same programme.

Computational facility: related to the previous point, some of the most advanced analyses (for example involving “microsimulations” or the analysis of the value of information) require a computational engine that, again, is beyond the capability of Excel.

In addition to these advantages, several recent developments have made specialist software more accessible. The first is the development of specialist packages designed for HTA and health economic modelling. The second is the proliferation of web apps that allow users to run this specialist software in the background, without needing to learn the scripting required to run a full analysis. This, we believe, is the first step in a revolution in HTA in which modellers with no knowledge of statistical software can use the huge computational capabilities available in these packages. Note the distinction here between a possible lack of knowledge of the statistical software and a lack of knowledge of the statistical modelling. Of course, our recommendation is that models should be developed and assessed by qualified researchers and practitioners. This applies equally to industry (e.g., consultancies or pharmaceutical companies preparing the models) and to regulators (e.g., the reviewers working for the HTA body). But we believe that this feature, if carefully controlled and validated, can help improve the productivity of the whole process.

Additionally, these web apps can be set up to produce written reports and downloadable graphics, so that the analyses and explanations can be standardised and tailored to specific audiences. This is therefore a call not only to economic modellers to embrace statistical software, but also to modellers and developers to begin creating and using these tools. Such tools can demystify statistical software and unlock its capabilities for a wider audience. The role of HTA bodies is also crucial in facilitating the success of this process.

How can packages help?

There are several statistical programmes that provide greater modelling capacity than Excel and other spreadsheet calculators. These include SAS, Stata and R, the last of which we specifically advocate as our software of choice; for this reason, the rest of the paper focuses on packages developed for R. The key attractions of R are that it is open source, so there are no subscription fees, and that there is a wealth of online support. Additionally, R's capabilities are augmented by user-written packages targeting specific statistical challenges, in this case HTA and health economic modelling.

In the current paradigm, it is not unusual to see a cost-effectiveness model for a cancer drug (cancer drugs account for over 40% of NICE submissions) developed as follows:

- First, a survival analysis is performed on trial data (perhaps made available to the modellers by the sponsor), typically using statistical software (e.g., SAS).
- The results of the statistical model are then imported into an Excel spreadsheet as the relevant estimates together with some measure of variability (typically a 95% confidence interval).
- Then the relevant survival curves are estimated after approximating the correlation structure between the model parameters (e.g., by implementing a Cholesky decomposition in Excel).
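For comparison with the Excel workflow just described, the Cholesky step itself takes only a few lines in a scripting language. The Python sketch below uses the standard library only. It factorises a covariance matrix C as L Lᵀ and generates correlated draws as mean + L z, where z is a vector of independent standard normals. The estimates for the log-shape and log-scale of a Weibull model, and their covariance, are hypothetical placeholders rather than output from any real analysis.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L L^T == cov (cov symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_correlated(mean, cov, n_draws, seed=0):
    """Multivariate normal draws: x = mean + L z with z independent standard normals."""
    rng = random.Random(seed)
    L = cholesky(cov)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        draws.append([mu + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i, mu in enumerate(mean)])
    return draws


# Hypothetical estimates for (log shape, log scale) of a Weibull survival model.
mean = [0.2, 2.5]
cov = [[0.010, -0.006],
       [-0.006, 0.020]]
draws = sample_correlated(mean, cov, n_draws=20000)
# Back-transform to the natural scale for use in the survival curves.
shapes = [math.exp(d[0]) for d in draws]
scales = [math.exp(d[1]) for d in draws]
```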

Finally, these are used to construct the relevant transition probabilities (typically in a separate sheet) to populate the Markov model used to describe the progression of patients among a set of health states (e.g., pre-progression, post-progression and death).
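A common way of turning a fitted survival curve into time-dependent transition probabilities is to use the conditional probability of an event within a cycle, tp(t) = 1 − S(t)/S(t − 1). The Python sketch below applies this to a hypothetical Weibull progression-free survival (PFS) curve. It then runs a three-state Markov cohort (pre-progression, post-progression, death). Two simplifying assumptions are made only for illustration: a fixed split of PFS events into deaths and progressions (`frac_death`), and a constant post-progression death probability (`p_post_death`).

```python
import math


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def cycle_event_probs(n_cycles, shape, scale):
    """Probability of an event in cycle t given event-free at its start: 1 - S(t) / S(t - 1)."""
    return [1.0 - weibull_survival(t, shape, scale) / weibull_survival(t - 1, shape, scale)
            for t in range(1, n_cycles + 1)]


def markov_trace(n_cycles, shape, scale, frac_death, p_post_death):
    """Cohort proportions in (pre-progression, post-progression, dead) at each cycle.

    frac_death: hypothetical share of PFS events that are deaths rather than progressions.
    p_post_death: hypothetical constant per-cycle probability of death after progression.
    """
    trace = [(1.0, 0.0, 0.0)]  # the whole cohort starts in pre-progression
    for tp in cycle_event_probs(n_cycles, shape, scale):
        pre, post, dead = trace[-1]
        leaving_pre = pre * tp
        dying_post = post * p_post_death
        trace.append((pre - leaving_pre,
                      post + leaving_pre * (1.0 - frac_death) - dying_post,
                      dead + leaving_pre * frac_death + dying_post))
    return trace


# Hypothetical inputs: monthly cycles over 5 years, Weibull PFS with shape 1.2, scale 12 months.
trace = markov_trace(60, shape=1.2, scale=12.0, frac_death=0.2, p_post_death=0.08)
for month in (0, 12, 24, 60):
    print(month, [round(x, 3) for x in trace[month]])
```

The pre-progression column telescopes to S(t), so the trace reproduces the fitted curve up to rounding. Checks of this kind are easy to automate in a script.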

This process is rather cumbersome and, because of its multi-stage nature, risks missing important correlations among the many model parameters. The output of the main statistical analysis is only fed to the economic model through some point estimates (copied and pasted from the output of the statistical software), potentially leading to sub-optimal analyses. By contrast, full modelling in R would allow these steps to be performed in one go. The output of the statistical model can then be fed directly to the economic model, and the full uncertainty can ideally be characterised accordingly. For example, using packages such as flexsurv (12), it is possible to analyse survival data using complex models (e.g., splines) jointly with the other model parameters. The survival curves and their underlying uncertainty could then be computed directly and fed to the economic model, which would in turn allow a proper PSA. While we acknowledge that PSA is not universally regarded as crucial across HTA agencies, we maintain that it should be, and thus regard this as a fundamental step in improving the process.
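The consequence of dropping correlations can be illustrated with a small simulation. The Python sketch below propagates uncertainty in the (log) shape and scale of a hypothetical Weibull survival model through to restricted mean survival over 60 months. It does this twice: once with a correlation of −0.8 between the two parameters, and once treating them as independent. Both runs use the same random numbers. All numerical inputs are invented; in the R workflow described above, the draws would instead come from the fitted survival model (e.g., fitted with flexsurv).

```python
import math
import random
import statistics


def restricted_mean_survival(shape, scale, horizon=60):
    """Area under S(t) = exp(-(t / scale) ** shape) up to `horizon` (trapezoid rule, unit steps)."""
    s = [math.exp(-((t / scale) ** shape)) for t in range(horizon + 1)]
    return sum((s[t] + s[t + 1]) / 2 for t in range(horizon))


def psa_rmst(rho, n_draws=10000, seed=42):
    """Propagate (log shape, log scale) uncertainty to restricted mean survival.

    rho is the correlation between the two log-parameters; rho = 0 mimics ignoring it.
    """
    rng = random.Random(seed)  # same seed, so both scenarios share random numbers
    mu_k, sd_k = 0.2, 0.10     # hypothetical log-shape estimate and standard error
    mu_l, sd_l = 2.5, 0.14     # hypothetical log-scale estimate and standard error
    out = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        log_k = mu_k + sd_k * z1
        # Two-dimensional Cholesky step: gives log_l correlation rho with log_k.
        log_l = mu_l + sd_l * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        out.append(restricted_mean_survival(math.exp(log_k), math.exp(log_l)))
    return out


with_corr = psa_rmst(rho=-0.8)
no_corr = psa_rmst(rho=0.0)
print(round(statistics.stdev(with_corr), 2), round(statistics.stdev(no_corr), 2))
```

With these inputs, the shape and scale push restricted mean survival in opposite directions. Ignoring their negative correlation therefore understates the spread of the output. With other models the bias can go the other way, which is why the full joint distribution should be carried through.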

After the modelling in R is complete, the output must be presented and interpreted. Again, in R this can be achieved using a specialist package, BCEA (10), which can post-process the results of any health economic model. This package produces standardised output for the analysis of the results (13), which can simply be included in any HTA. This standardisation is the key to making the analysis in R transparent: systematising the functions used for standard analyses allows modellers to reproduce the analysis, be it for a research paper or for a dossier submitted to a regulatory agency such as NICE.

Even more importantly, BCEA includes functions for more advanced PSA measures, such as the Value of Information (VoI), and for multiple treatment comparisons. These measures can therefore be calculated using one simple command, whereas Excel lacks the statistical methods to compute them at all. This is despite the fact that they can form a key part of any analysis of the cost-effectiveness of different treatments. That tools such as VoI are not mandatory can arguably be ascribed to the inherent complexity and computational burden of their development. But if methods exist to overcome this issue, we believe that HTA bodies should accept them more widely, or indeed request them.
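The quantity behind the EVPI, including the case of more than two interventions, is EVPI = E[max_j NB_j] − max_j E[NB_j]. Here NB_j = k·e_j − c_j is the monetary net benefit of intervention j at willingness to pay k, and the expectations are taken over the PSA draws. The Python sketch below estimates it directly from a matrix of simulations. It illustrates the computation only and does not reproduce BCEA's functions. The tiny PSA sample is hypothetical.

```python
import statistics


def net_benefit(effects, costs, wtp):
    """Monetary net benefit k*e - c for each PSA draw (rows) and intervention (columns)."""
    return [[wtp * e - c for e, c in zip(e_row, c_row)]
            for e_row, c_row in zip(effects, costs)]


def evpi(nb):
    """EVPI = E[max_j NB_j] - max_j E[NB_j], estimated from PSA draws."""
    n_options = len(nb[0])
    # Value if uncertainty were resolved: pick the best option separately in each draw.
    perfect = statistics.fmean(max(row) for row in nb)
    # Value under current information: one option (best on average) for every draw.
    current = max(statistics.fmean(row[j] for row in nb) for j in range(n_options))
    return perfect - current


# Hypothetical PSA output: 4 draws x 3 interventions (QALYs and costs).
effects = [[1.0, 1.2, 1.1], [1.0, 0.9, 1.1], [1.0, 1.3, 1.05], [1.0, 1.0, 1.15]]
costs = [[0.0, 3000.0, 1500.0]] * 4
nb = net_benefit(effects, costs, wtp=20000.0)
print(round(evpi(nb), 2))
```

For these four draws the EVPI is 1000 monetary units per patient. The best option differs across draws, whereas under current information the third intervention would be chosen throughout.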

We believe that using these packages should go some way towards persuading people that R is not as difficult as it seems. For example, a single command such as plot in BCEA allows the user to depict a cost-effectiveness plane, the expected incremental benefit, a cost-effectiveness acceptability curve (CEAC) and the expected value of perfect information (13). This graphical capability from a single command goes well beyond what can be expected of other software. It also counters the belief that R and other script-based software are inherently “complex”.
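Two of these summaries reduce to simple averages over the PSA output for two interventions. The expected incremental benefit is EIB(k) = k·E[Δe] − E[Δc]. The CEAC is the proportion of simulations in which k·Δe − Δc > 0. The Python sketch below computes both for a few values of k; each (Δe, Δc) pair is one point on the cost-effectiveness plane. It is not BCEA's implementation. The four draws are hypothetical and far fewer than any real PSA would use.

```python
import statistics


def eib(delta_e, delta_c, wtp):
    """Expected incremental benefit: k * E[delta_e] - E[delta_c]."""
    return wtp * statistics.fmean(delta_e) - statistics.fmean(delta_c)


def ceac(delta_e, delta_c, wtp):
    """Probability that the new option is cost-effective: P(k * delta_e - delta_c > 0)."""
    return statistics.fmean(
        1.0 if wtp * de - dc > 0 else 0.0 for de, dc in zip(delta_e, delta_c))


# Hypothetical PSA draws of incremental effects (QALYs) and incremental costs;
# each pair is one point on the cost-effectiveness plane.
delta_e = [0.1, 0.2, -0.05, 0.15]
delta_c = [1000.0, 3000.0, 500.0, 4000.0]
for k in (0.0, 20000.0, 100000.0):
    print(k, round(eib(delta_e, delta_c, k), 2), ceac(delta_e, delta_c, k))
```

Here the CEAC rises from 0 at k = 0 to 0.5 at k = 20,000 and 0.75 at k = 100,000. The EIB turns positive once k exceeds the ICER, E[Δc]/E[Δe] = 21,250.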

Web Apps can make life even simpler

To further dispel the perceived “complexity” of R, we now present what we believe to be the future of applied statistical modelling, particularly for cost-effectiveness analysis: web applications created using R Shiny. Experienced R users may find these apps of limited use, as they simply perform R functions with less flexibility than scripting. Nonetheless, they bridge the gap between the limitations of spreadsheet calculators and the full flexibility of script-based software by providing a graphical interface for the analysis.

Specifically, we feel that web interfaces are very important for disseminating the message and convincing practitioners of the superiority of R or other specialised software over Excel.

In general, web applications offer the user an intermediate step between “standard” Excel-based modelling and the “ideal” situation (at least to our mind) in which the whole analysis is performed directly in R. They also provide a graphical interface that helps “translate” the model into simpler, possibly graphical, terms. This may well address the main complaint of clients (e.g., pharmaceutical companies commissioning cost-effectiveness analyses of their products) and stakeholders (e.g., reviewers and committee members in regulatory agencies): they want to be able to use menu bars and sliders to modify the models in an easy and intuitive way.

On the client side, any modern web browser supporting JavaScript can display the web applications. Additionally, when the applications are accessed through the Internet, all the calculations are performed by a server, so even the more demanding operations do not rely on the user's own device. For sensitive data that cannot be shared via the web, a local version of the application can be run on the individual machine, again without the need to run R or any scripts directly. The web application BCEAweb, available at https://egon.stats.ucl.ac.uk/project/BCEAweb/, makes all the functionalities included in BCEA available without writing a single line of code. It was inspired by the Sheffield Accelerated Value of Information (SAVI) web app (9), available at http://savi.shef.ac.uk/SAVI/. The aim of SAVI is to calculate the expected value of partial perfect information, based on the work developed in (3). In addition to their computational capacity, both applications produce a report in either Word or PDF format that introduces and explains all the statistical output.

We believe that the scope for these web applications in HTA goes well beyond these two cases, and indeed other web apps are available, e.g., bmetaweb (14), accessible at https://egon.stats.ucl.ac.uk/project/bmetaweb/. The power of this approach lies in a graphical interface that plays a role similar to that of VBA in Excel while drawing on the computational capabilities of R. Throughout this editorial we have consistently argued for the power of R over Excel, but we acknowledge that learning a script-based language for modelling is difficult. We believe that web apps can be the bridge between these two worlds.

Therefore, we issue a two-sided challenge. To modellers: embrace these programmes and give them a go. R and other statistical software offer incredible capabilities and, while the learning curve is steep, we believe the time savings and flexibility are worth the effort! To academics: let's meet people in the middle. We need to work on producing packages that are simple and easy to use but also flexible, which will make results transparent and easy to reproduce. Additionally, we need to create web applications that allow these R packages to reach a wider audience. Complex modelling can be easy if we move away from Excel and embrace the world of R, packages, web applications and communication! As a final note, we conclude that communication must be the key. What tools (web apps and packages) are needed to fit more complex models and analyse their output? Once these are built, the sky is the limit in terms of complex modelling with simple implementation.

Footnotes

Financial support: Dr Gianluca Baio is partially funded by a research grant sponsored by Mapi. Ms Anna Heath is funded by the EPSRC. Both authors are the developers of the R package BCEA.

Conflict of interest: None of the authors has any financial interest related to this study to disclose.

References

1. Welton, Sutton, Cooper, Abrams, Ades. Evidence synthesis for decision making in healthcare. John Wiley & Sons, Ltd, Chichester, UK, 2012.

2. Baio. Bayesian methods in health economics. Chapman Hall/CRC Press, Boca Raton, FL, 2012.

3. Strong, Oakley, Brennan. Estimating multiparameter partial expected value of perfect information from a probabilistic sensitivity analysis sample: a nonparametric regression approach. Med Decis Making 2014;34(3):311–326.

4. Heath, Manolopoulou, Baio. Estimating the expected value of partial perfect information in health economic evaluations using integrated nested Laplace approximation. Stat Med 2016;35(23):4264–4280. http://onlinelibrary.wiley.com/doi/10.1002/sim.6983/full

5. Briggs, Sculpher, Claxton. Decision modelling for health economic evaluation. OUP Oxford, 2006.

6. Briggs, Weinstein, Fenwick, Karnon, Sculpher, Paltiel; ISPOR-SMDM Modeling Good Research Practices Task Force. Model parameter estimation and uncertainty analysis: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force Working Group-6. Med Decis Making 2012;32(5):722–732.

7. Claxton, Sculpher, McCabe, et al. Probabilistic sensitivity analysis for NICE technology assessment: not an optional extra. Health Econ 2005;14(4):339–347.

8. Howard. Information value theory. IEEE Transactions on Systems Science and Cybernetics 1966;SSC-2(1):22–26.

9. Strong, Breeze, Thomas, Brennan. SAVI - Sheffield accelerated value of information, release version 1.013 (2014-12-11), 2014.

10. Baio, Berardi, Heath. BCEA: Bayesian cost effectiveness analysis. R package version 2.2-3, 2016. Available from: http://CRAN.R-project.org/package=BCEA. Accessed Oct 18, 2016.

11. Williams, Lewsey, Briggs, Mackay. Estimation of survival probabilities for use in cost-effectiveness analysis: a comparison of a multi-state modelling survival analysis approach with partitioned survival and Markov decision-analytic modelling. Med Decis Making 2016.

12. Jackson. flexsurv: a platform for parametric survival modeling in R. J Stat Softw 2016;70(8):1–33.

13. Baio, Berardi, Heath. Bayesian cost effectiveness analysis with the R package BCEA. Springer.

14. Ding, Baio. BMETA: Bayesian meta-analysis and meta-regression. R package version 0.1.2, 2016. Available from: http://CRAN.R-project.org/package=bmeta. Accessed Oct 17, 2016.