Abstract
Detailed evaluation of the clinical utility and health economic impact of new diagnostic tests prior to their implementation in clinical practice is important to limit overuse of tests, ensure benefits to patients, and support efficient use of health care resources.1 Different frameworks have been developed for the phased evaluation of diagnostic tests.2–6 All these frameworks recognize that after evaluating the safety, efficacy, and accuracy of a diagnostic test, the impact of this test on health outcomes and costs should be determined. Evaluating tests in randomized controlled trials (RCTs), however, is often not feasible for ethical, financial, or other reasons, particularly in early stages of test development.7–10 Indeed, RCTs evaluating the impact of diagnostic tests on patient outcomes are rare.11 As an alternative, methods to develop decision-analytic models for the health economic evaluation of diagnostic tests, synthesizing all available evidence from different sources, have long been available.6,12–16 It is widely recognized that such models are a useful and valid alternative for evaluating the impact of new health technologies in general17,18 and diagnostic methods in particular.12,14
However, the comprehensive evaluation of the impact of new tests is typically much more complex than, for example, evaluation of the impact of new drugs. Among other reasons, this is due to the indirect impact of tests on health outcomes through improved patient management (also referred to as “clinical utility”19), the use of combinations and sequences of tests in clinical practice (depending on previous test results), and the often complex interpretation of test outcomes. In practice, model-based impact evaluations of tests therefore actually involve the evaluation of diagnostic testing strategies (i.e., test-treatment combinations).
Owing to the complexity of these diagnostic testing strategies, many model-based impact evaluations of tests make use of simplified models that do not incorporate all aspects of clinical practice. Simplified models are used because 1) evidence regarding all aspects involved in health economic test evaluations might be lacking, 2) inclusion of all aspects likely increases model complexity, or 3) researchers may not be aware of all aspects of test evaluation. For example, it is often not reported how the incremental effect of a new test, when used in combination with other tests, is determined and how the correlation between the outcomes of these different tests (applied alone or in sequence) is handled.20–23 Similarly, the selection of patients in whom the test is performed, the consequences of incidental findings (also referred to as chance findings), and the occurrence of test failures or indeterminate test results are often not reported.24–26 Although simplifications of the decision-analytic models used for such evaluations may sometimes be necessary and can be adequately justified, implicit simplification due to unawareness of all relevant evaluation aspects, or simplification without proper justification, may lead to nontransparent and incorrect evaluation results.
General frameworks and guidelines regarding which aspects to include in decision-analytic modeling and how to report modeling outcomes are available27–29 but not specific enough to cover the complexities of diagnostic test evaluation. Furthermore, previous research into (aspects of) diagnostic test evaluation mostly focused on specific diseases or on specific types or combinations of diagnostic tests.23,30–35 A generic and comprehensive overview of all potentially relevant aspects in health economic evaluation of diagnostic tests and biomarkers that may be used to guide such evaluations is currently lacking.
The purpose of this article is, therefore, to provide such an overview in the form of a generic checklist, intended to be applicable to all types of diagnostic tests and not specific to a single disease, condition, or subgroup of individuals. This checklist thereby aims to allow researchers to explicitly consider all aspects potentially relevant to the health economic evaluation of a specific test, from a societal perspective. The checklist is therefore referred to as the “AGREEDT” checklist, an acronym of “AliGnment in the Reporting of Economic Evaluations of Diagnostic Tests and biomarkers.” Use of the checklist need not complicate such evaluations, as some of the aspects described may not be relevant to a particular evaluation; rather, it requires that choices to exclude certain aspects be adequately justified.
Methods
This study consisted of 3 main steps: 1) the development of an initial checklist based on a scoping review, 2) review and critical appraisal of the initial checklist by 4 experts (CEP, MCW, MH, and TM) not involved in the scoping review, and 3) development of a final checklist based on the review by experts. Finally, each item from the checklist is illustrated using an example from previous research.
Scoping Review
In the past decades, hundreds of model-based health economic evaluations of diagnostic tests have been published, across a wide range of medical contexts. Even a narrow literature search in PubMed in January 2017 resulted in a total of 1844 articles using the following combination of search terms in title and abstract: (health economic OR cost-effectiveness) AND diagn* AND (model OR Markov OR tree OR modeling OR modelling). Besides the large number of studies published in this field, the systematic identification of health economic evaluations is known to be challenging.36 This is partly caused by the multitude of MeSH terms in PubMed related to diagnostic strategies (over 48 MeSH terms exist that include the word diagnostic or diagnosis). Because of these challenges, and because different evaluations are very likely to include and exclude the same aspects, a scoping review was performed instead of a systematic literature review, followed by critical appraisal by 4 independent experts. A key strength of a scoping review is that it can provide a rigorous and transparent method for mapping areas of research,37 particularly when an area is complex or has not been reviewed comprehensively before.38
This scoping review was performed in PubMed in January 2017, searching for the following combination of search terms in the title of the article: (health economic OR cost-effectiveness) AND diagn* AND (model OR Markov OR tree OR modeling OR modelling) NOT diagnosed. The term NOT diagnosed was added to avoid retrieving many articles on patients already diagnosed with a certain condition rather than on the diagnostic process itself. The search was limited to articles published in English or Dutch. Studies were excluded, based on title and abstract, if they did not concern original research or did not evaluate the cost-effectiveness of the use of 1 or more tests (regardless of the effectiveness measure, for example, additional cost per additional correct diagnosis or per additional quality-adjusted life-year). In addition, as guidelines for performing health economic evaluations continue to be updated,39–41 it was expected that the more recent studies would provide the most comprehensive overview of all potentially relevant items for the checklist. To check this assumption, the PubMed search was repeated without limiting it to studies published in the past 5 years, resulting in 128 additional articles. Two of these articles published >5 years ago were then randomly selected.42,43 A thorough review of both articles did not yield any additional relevant items for inclusion in the checklist. Therefore, the search was limited to articles published in the past 5 years. One author (MMAK) screened studies for exclusion and consulted a second author (HK) when necessary.
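The title-only query described above can be expressed in standard PubMed boolean syntax with `[Title]` field tags. The sketch below (an illustration, not part of the original study) assembles that query string, for example for use with NCBI's E-utilities ESearch endpoint:

```python
# Minimal sketch: assembling the scoping-review query described above
# in standard PubMed boolean syntax, restricted to the article title.
def build_pubmed_title_query():
    topic = '("health economic"[Title] OR "cost-effectiveness"[Title])'
    diagnostic = "diagn*[Title]"
    model_terms = ["model", "Markov", "tree", "modeling", "modelling"]
    models = "(" + " OR ".join(f"{t}[Title]" for t in model_terms) + ")"
    # NOT diagnosed excludes articles on patients who are already diagnosed.
    return f"{topic} AND {diagnostic} AND {models} NOT diagnosed[Title]"
```

The exact field-tag placement is an assumption; PubMed also accepts `[ti]` as shorthand for title searches.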
Design of the Reporting Checklist
All articles resulting from the scoping review were searched for items related to model-based health economic evaluation of diagnostic tests that were either included explicitly in the evaluation or only mentioned but not included (mostly in the introduction or discussion sections). Generic items, not specific to diagnostic test evaluation, were not included in the new checklist, as these are already covered in existing checklists; examples of such generic items include the choice of time horizon and perspective of the evaluation.27–29 However, some overlap remains, as the checklist does include items applicable to diagnostic test evaluation that are covered only partially or at a high level in existing guidelines.
A thorough screening of all articles was performed by MMAK, resulting in an initial list of aspects considered potentially relevant. As the checklist was intended to provide a comprehensive overview of all potentially relevant aspects, each of these aspects was added to the checklist unless it was considered to be already covered in currently available guidelines (based upon agreement between MMAK and HK). The definition of each aspect was likewise based on agreement between MMAK and HK.
Critical Appraisal and Validation of the Reporting Checklist
As diagnostic tests and imaging are used for a large variety of (suspected) medical conditions, an expert panel with a broad field of experience was required for critical appraisal of the checklist. Therefore, the expert panel was composed in such a way that at least 1 expert was experienced in each of the different areas of interest (i.e., biomarkers or imaging) and in each of the different purposes of diagnostic testing (i.e., diagnosis, screening, monitoring, and prognosis). In addition, to maximize the likelihood that the final checklist is generalizable to different countries and settings, the experts chosen lived on 3 different continents. Four experts (CEP, MCW, MH, and TM) were invited via email to participate, and none declined.
The initial checklist was critically appraised and validated independently by all 4 experts, who received the checklist via email. They were asked to provide individual, qualitative judgments on whether all items in this list were clear and unambiguous, to indicate any missing or redundant items in this list, and to provide suggestions for further improvement.
Finalization of the Reporting Checklist
Based on the experts’ suggestions, several changes were made to the reporting checklist. These changes involved the rewording of items, the removal of redundant items, and the addition of missing items. As this checklist is intended to provide an exhaustive list of all aspects relevant to the health economic evaluation of diagnostic tests and biomarkers, all suggestions for the addition of missing items were adopted. All changes to the checklist were made upon agreement between MMAK and HK (for a full description, see online Appendix 1). The revised checklist was again critically appraised by all authors and agreed upon. Finally, the articles included in the scoping review were reread by MMAK to assess whether the final checklist items were included or mentioned.
Funding
This study was not funded.
Results
Results of the Scoping Review
The literature search resulted in 77 articles that were screened for inclusion in the scoping review, of which 14 were excluded. Of these, 4 articles did not specifically evaluate the cost-effectiveness of a (combination of) diagnostic test(s), 2 were letters to the editor, and 7 focused on methodological aspects of the evaluation of diagnostic strategies (e.g., in the context of a single disease or of specific types or combinations of diagnostic tests, as mentioned earlier). In addition, 1 article was excluded because the full text could not be obtained or purchased by the university library, from online databases, from the website of the publisher, or by contacting the authors. This resulted in a total of 63 studies included in the scoping review. An overview of this selection process is provided in Figure 1.
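As a quick arithmetic check (illustrative only, not from the article itself), the selection counts reported above are internally consistent:

```python
# Sketch: tallying the article selection process reported above.
screened = 77
exclusions = {
    "no cost-effectiveness evaluation of a test": 4,
    "letter to the editor": 2,
    "methodological focus": 7,
    "full text unobtainable": 1,
}
excluded = sum(exclusions.values())   # total exclusions
included = screened - excluded        # studies entering the scoping review
```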

Result of scoping review and checklist design process. This figure first gives an overview of the selection process of articles in the scoping review, as well as the number of checklist items this resulted in, and subsequently shows the results of the expert appraisal on the items included in the final checklist.
A critical evaluation of the 63 articles resulted in an initial list of 29 items. These items were divided into 6 main topics: 1) time to presentation of the individual to the health professional (i.e., the clinical starting point), 2) use of diagnostic tests, 3) test performance and characteristics, 4) patient management decisions, 5) impact on health outcomes and costs, and 6) wider societal impact, which may accrue to patients, their families, and/or health care professionals. This societal impact, for example, may concern the impact on caregivers (in terms of time spent on hospital visits and caregiving and the accompanying impact on productivity), on the health system or health professional (e.g., in terms of reduced patient visits), or on society (e.g., measures that aim to prevent widespread antibiotic resistance). Quantifying these aspects may provide a broader view on the potential impact of diagnostic testing.
Critical Appraisal and Validation of the Reporting Checklist
Following the critical appraisal by the experts, the list was updated, with 1 item removed and 15 items added; 1 additional item was added based on the suggestion of a reviewer of the manuscript during the submission process. This resulted in a final reporting checklist of 44 items, as shown in Table 1. Of the 16 added items, 8 involved a further specification of the test’s diagnostic performance, listed under item 3.2 in the checklist. The item that was removed concerned the generalizability of the results, which was considered not specific to diagnostic test evaluations. The full reporting checklist, including an overview of which items were included or considered in each of the studies from the scoping review, as well as an example for each item, is provided in online Appendix 2. An overview of this process, including the scoping review and the critical appraisal by the experts, is shown in Figure 1. The final list of items in this reporting checklist, in chronological order from the start of the diagnostic trajectory onward, is illustrated in Figure 2.
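The item counts reported above can be reconciled as follows (an illustrative check, not part of the study's methods):

```python
# Sketch: reconciling the checklist item counts reported above.
initial_items = 29        # items identified in the scoping review
removed = 1               # generalizability item, dropped as nonspecific
added_by_experts = 15     # items added after expert appraisal
added_by_reviewer = 1     # item added during manuscript review
final_items = initial_items - removed + added_by_experts + added_by_reviewer
```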
Reporting Checklist to Indicate Which Items Were Included in the Health Economic Evaluation of Diagnostic Tests and Biomarkers
If an item is included in the quantitative analysis, indicate the corresponding model parameter(s) and evidence source(s).
If an item is excluded from the quantitative analysis, please explain why the exclusion was necessary.
Existing guidelines indicate that subgroup analyses are relevant when different strategies are likely to be (sub)optimal in different subgroups. Subgroup-specific analyses can then be performed to address multiple decision problems. Here we consider scenarios in which different tests may be used in different subgroups, depending on patient characteristics or previous test outcomes.

Overview of steps in the diagnostic trajectory. This figure gives a conceptual outline of the steps involved in the diagnostic trajectory, in chronological order from top to bottom. The numbers shown at the several steps correspond to the item numbers presented in Table 1. The dashed lines represent steps whose duration may vary substantially, for example, the time between symptom onset and presentation to a clinician (which may vary from minutes in case of severe symptoms to years for mild and gradually developing conditions). The arrows indicate situations in which either the diagnostic test (result) was not usable or was indeterminate (items 3.3–3.5) or situations in which the treatment proves to be ineffective (items 5.5–5.7). As this may be caused by an incorrect diagnosis, the patient may undergo a subsequent round of diagnostic testing and (possibly) treatment. Alternatively, the diagnosis may be correct but the treatment incorrect, in which case an alternative treatment may be initiated. *Although the (wider) societal impact of diagnostic testing often involves long-term effects, these effects may sometimes also become apparent in the short term. Dx, diagnostic test; Tx, treatment.
Results indicate that health economic evaluations of diagnostic tests or biomarkers differ considerably in the items that have been explicitly included (or considered for inclusion) in the corresponding decision-analytic model (Table 1). Some of the items from the checklist were included (or considered) in only a few studies from the scoping review. For example, the impact of incidental findings on performing additional tests, the consistency of test results over time, and the impact of test outcomes on relatives were each addressed in only 3 of the 63 included studies. These items may not have been included in other studies because they were considered not relevant to the specific context, because (scientific) evidence was lacking, or because the authors were unaware of their relevance.
Discussion
Strengths
A strength of this study is that it combines evidence from multiple sources, including a review of literature, as well as a validation by experts. As the items included in the checklist are defined in general terms and not limited to specific diseases, tests, care providers, or patient management strategies, this reporting checklist can potentially be useful in performing and appraising health economic evaluations worldwide and across a broad spectrum of (novel) diagnostic technologies. In addition, as this checklist specifically focuses on health economic evaluations of diagnostic tests or biomarkers, an area for which no reporting checklists are yet available, it may be a useful extension to existing reporting checklists, such as the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) checklist.27 Finally, use of this checklist can also support development of health economic models through increased awareness of all potentially relevant evaluation aspects.
In addition, use of this checklist does not necessarily require more resources to be allocated to the evaluation or increase the complexity of the resulting decision-analytic model. In general, deliberation on the relevance of all aspects is key, and aspects may be excluded from the evaluation whenever this can be adequately justified. For example, when evaluating the cost-effectiveness of a new point-of-care troponin test used by the general practitioner compared to an existing, older point-of-care troponin test (in the context where the new test would replace the old test), aspects such as “time to start of the diagnostic trajectory” and “purpose of the test” will not differ between both strategies. In addition, “complication risks” associated with taking the blood sample (in both point-of-care tests) are likely extremely small, which could justify excluding these aspects from the analysis.
Limitations
Performing a systematic literature review was not considered feasible given the large number of published economic evaluations of diagnostic tests. Therefore, a scoping review was performed by 1 reviewer instead. As the judgment of whether an aspect was incorporated in a health economic evaluation was sometimes difficult, these judgments might have differed slightly had they been made by a different reviewer. In addition, as the decision to limit the search strategy to the past 5 years was based on reviewing 2 studies published >5 years ago, this small sample (i.e., 1.6% of the studies published >5 years ago) cannot rule out the possibility that items were missed by excluding all older studies. The scoping review may also have been subject to publication bias, as it may have omitted potentially relevant aspects from unpublished studies, as well as from methods manuals (including those focusing on economic evaluations of other interventions or technologies in health care). Despite these limitations, the critical review of the checklist by 4 independent experts from different countries makes it unlikely that important items have been missed.
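The sampling fraction quoted above follows directly from the search figures reported in the Methods (again, an illustrative check rather than part of the study):

```python
# Sketch: verifying the sampling fraction of older studies mentioned above.
older_articles = 128   # additional articles retrieved without the 5-year limit
sampled = 2            # older articles randomly selected for full review
fraction_pct = round(sampled / older_articles * 100, 1)  # percentage sampled
```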
In addition, the expert appraisal resulted in the addition of 16 items to the checklist. Although this may seem to be a large extension to the items already identified in the scoping review, 8 of these added items actually involved a further specification (i.e., a subitem) of the test’s diagnostic performance. It was found useful to further specify “test performance” (i.e., item 3.2, which initially integrated several performance measures) into 8 subitems to further increase the transparency and comparability of health economic test evaluations.
Implications for Practice
This study was intended to design a reporting checklist, not to formulate a quality judgment of the studies included in the scoping review based on which checklist items they did or did not incorporate. Furthermore, some items may have been included only implicitly in the health economic evaluations identified in the scoping review and could thus not be identified by the reviewer. As scientific articles are often restricted in length, there may be insufficient space to mention the inclusion (or justified exclusion) of each item from this checklist. In such situations, authors are recommended to describe their use of the checklist in an appendix. More specifically, authors are recommended to describe which items from the checklist they included in their evaluation and what evidence was used to inform them, and to explicitly state the reason(s) for excluding any checklist items from their evaluation. Although it may seem time-consuming to consider all 44 items of this checklist, most of these items are actually subitems, which do not need to be considered if the overarching (higher-level) item is (justifiably) excluded from the evaluation.
In addition, it should be noted that not all items in this checklist can be considered of equal importance. For example, diagnostic performance will typically have a larger impact on health outcomes and costs compared to considering the occurrence of test failures or the consistency of test results over time. However, this checklist is designed to provide an exhaustive overview of all potentially relevant items, regardless of importance. Therefore, use of this checklist will likely increase the chance that all relevant aspects will be included in health economic evaluations of diagnostic tests and biomarkers. Ultimately, it is up to the researchers to make a justifiable decision on which items to incorporate and which to exclude.
Finally, experiences regarding the use of this reporting checklist in practice may be valuable to further enhance its completeness and usability. Furthermore, given the rapid methodological developments in the field of health economic evaluation of diagnostic tests, regular updating of this checklist may be warranted.
Conclusion
Given the complexity and dependencies related to the use of diagnostic tests or biomarkers, researchers may not always be fully aware of all the different aspects potentially influencing the result of a model-based health economic evaluation. The use of the reporting checklist developed in this study may remedy this by increasing awareness of all potentially relevant aspects involved in such model-based health economic evaluations of diagnostic tests and biomarkers and thereby also increase the transparency, comparability, and—indirectly—the validity of the results of such evaluations.
Footnotes
No funding has been received for the conduct of this study and/or the preparation of this manuscript.
Prof. Merlin reports that she was previously commissioned by the Australian Government to develop version 5.0 of the “Guidelines for Preparing a Submission to the Pharmaceutical Benefits Advisory Committee.” Some of the content concerning “Product Type 4—Codependent Technologies” influenced the guidance suggested in the current article. Prof. Weinstein reports that he was a consultant to OptumInsight on unrelated topics.
All other authors declare that there is no conflict of interest.
Authors’ Note
This research was conducted at the department of Health Technology and Services Research, University of Twente, Enschede, the Netherlands.
