Highlights
This systematic review found a moderate number of studies assessing formats for presenting sets of probabilities conveying information such as risks and benefits.
Evidence is moderate that although presenting sets of probabilities in tables versus sentences may not affect behavioral intentions, people may prefer tables.
Contrary to previous studies about probability feelings, moderate evidence suggested that narratives may not affect effectiveness feelings.
Evidence was insufficient to draw conclusions regarding contrast, identification, and trust outcomes, and no studies assessed recall, categorization, computation, or discrimination outcomes.
Many health-related decisions require considering not just 1 or 2 probabilities but sets of probabilities. For example, a recently approved Alzheimer disease drug carries a chance of benefit (modest slowing of cognitive decline) as well as a chance of serious adverse effects (brain edema or bleeding, death). 1 To make an informed decision, patients need information about both the benefits and the harms so that they can make a holistic judgment about whether to start this medication. Patient decisions become even more complicated when a patient needs to examine harms and benefits for multiple treatment options instead of just one. Patients require clear and easy-to-understand information about their options and the benefits and harms of each to make informed decisions. 2 Information about the options often includes probabilities of benefit and the probabilities of harm.
A growing body of research evidence shows that the format used to present these sets of probabilities can substantially affect perceptions and decisions. As one example, loss framing side effects as a 24% chance of occurring can make a treatment option seem less attractive than gain framing the same likelihood as a 76% chance of not experiencing side effects. 1 Therefore, to develop evidence-based guidance on the effects of format for this type of quantitative information as well as other quantitative information relevant to health, the Making Numbers Meaningful project conducted a broad systematic literature review on how to communicate health-related numbers across types of data and types of data presentation formats. 3 We assumed a basic communication model in which a reader performs cognitive tasks to make sense of a stimulus, experiencing cognitive, affective, perceptual, or behavioral responses captured by researchers with outcome measures. Our focus was on the effects of stimulus formats on these outcomes.
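To make the framing arithmetic concrete, the minimal sketch below (ours, not drawn from any reviewed study) renders the same underlying side-effect probability in both frames; the function name and output strings are illustrative only.

```python
def gain_loss_frames(p_adverse: float) -> tuple[str, str]:
    """Render one side-effect probability in a loss frame and its complementary gain frame."""
    loss = f"{p_adverse:.0%} chance of experiencing side effects"
    gain = f"{1 - p_adverse:.0%} chance of not experiencing side effects"
    return loss, gain

loss, gain = gain_loss_frames(0.24)
print(loss)  # 24% chance of experiencing side effects
print(gain)  # 76% chance of not experiencing side effects
```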
To make sense of the evidence given the wide variety of types of stimuli and outcome measures used in studies of numerical communication, we segmented the research in several ways, using concepts from our previously published numbers communication taxonomy. 4 First, we divided the research by whether it studied the communication of probabilities (such as chance of disease) or quantities (such as lab results).
Another categorization was by the task required of the reader or research participant. A synthesis task (such as the tasks covered in the current article) is one in which the reader must use a stimulus to synthesize information about multiple features that have the same valence or different valences. Synthesis tasks can involve assessing features of a single option or assessing features of multiple options. Examples include reviewing a list of probabilities of several side effects associated with a medication, examining a table of probabilities of risks and benefits for 2 medications, and using a decision aid that presents chances of benefits and harms across several options. Other tasks, covered in other Making Numbers Meaningful articles, include point tasks (examining a stimulus for information about individual probabilities), difference tasks (using the stimulus to evaluate the difference between probabilities, such as the effect of a therapy on the chance of recovery), and trend tasks (examining the stimulus to assess a pattern of probability over time).
Within this article, we further classify the research by the outcome measures used in the studies, which helps demonstrate whether a format has different effects on different outcomes. In this article, we cover the outcomes that we found in the literature on synthesis tasks: 1) identifying and restating probability information presented in the stimulus (termed identification) or recalling it (termed recall); 2) identifying the largest or smallest of a set of probabilities (termed contrast); 3) perception of the size of an effect measured on a scale of sizes (effectiveness perception) or an affective scale about worry, concern, or another feeling (effectiveness feeling); 4) intended, selected, or planned behavior (behavioral intentions) or actual health-related actions (behavior); 5) perceived credibility of the information as presented (trust); and 6) perceived helpfulness, attractiveness, or usefulness of the data presentation format (preference). Although the first 2 of these outcomes (identification and recall) would appear to be different, we grouped them as described below because of a frequent lack of clarity in the research about which was being measured. Similarly, we grouped behavioral intention with behavior because these 2 constructs are theoretically related and very few research studies in this domain measured actual behavior.
Throughout, we cover evidence on all data presentation formats: numbers, graphics, and verbal probabilities.
As a result of this approach, the systematic review produced a series of results papers (Table A).
Current Article’s Scope within the Making Numbers Meaningful Systematic Review
This standardized numbering system has been used for results subheadings in this article and across all Making Numbers Meaningful results articles to ensure that readers can find comparable information in all articles. Gray cells represent combinations that are not possible according to the definitions presented in Ancker et al. 4
Our objective in the current results article is to present evidence about the effects of format for probability synthesis tasks. As shown in Table A and listed above, 9 outcomes were possible, but in fact, the review found only 6: identification or recall, contrast, effectiveness perceptions and feelings, behavioral intention or behavior, trust, and preference. We include evidence on all data presentation formats, including numbers, graphics, and verbal probabilities.
Methods
The Making Numbers Meaningful systematic literature review (PROSPERO registration CRD42018086270) sought to estimate the effect of format on perceptual, affective, cognitive, or behavioral outcomes by identifying research from multiple disciplines that compared 2 or more ways of presenting health-related numbers to lay or nonmedically trained audiences. The current article is limited to health probabilities; other articles focus on other types of information. In brief, we searched MEDLINE, Embase, CINAHL, the Cochrane Library, PsycINFO, ERIC, and ACM Digital Library and checked the tables of contents of Medical Decision Making, Patient Education and Counseling, Risk Analysis, and Journal of Health Communication. Our methods article describes the literature search, screening, data extraction, assessment of study risk of bias (S-ROB), assessment of finding credibility, and organization into evidence tables. 3 Instruments from the review, including the search strategy, the S-ROB instrument, and the data extraction instrument, are available in the methodology folder of the Making Numbers Meaningful Project at OSF (https://osf.io/rvxf2/).
We identified 316 articles for the entire project, of which 42 involved synthesis tasks with stimuli showing probability data and are covered in the current article. From each study, we extracted information about task, stimulus (data and data presentation format), and outcome. Substudies (i.e., independent research studies published in the same article) were extracted as separate records. Each study/substudy was assessed for S-ROB using a rubric adapted for this project and available at OSF (https://osf.io/rvxf2/). 3
Each study could include 1 or more tasks, format comparisons, and outcomes. In the data extraction, we called each unique combination of task, format comparison, and outcome a finding. For example, imagine a study in which readers were randomized to see probability information about medication risks and benefits in tables of either percentages or frequencies per 1,000 (which we term rates per 10ⁿ) and then asked to complete a questionnaire instrument about the information. This study might produce a synthesis task finding comparing tables of percentages versus tables of rates per 10ⁿ in their effects on the participant’s intended choice of medication (behavioral intention) and a second synthesis task finding about the participant’s preferences between formats. (If the participant was also asked to identify a specific risk or benefit probability in the table rather than evaluate the table holistically, this would result in a point task finding; point task findings do not appear in this article but would appear in another Making Numbers Meaningful article.)
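As a rough illustration of this unit of extraction, the sketch below models a finding as a record combining task, format comparison, and outcome; the field names are hypothetical and do not reproduce the project’s actual extraction instrument (available at OSF).

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One extracted finding: a unique task x format comparison x outcome combination."""
    task: str               # e.g., "synthesis" (vs. point, difference, trend)
    format_comparison: str  # e.g., "table of percentages vs. table of rates per 10^n"
    outcome: str            # e.g., "behavioral intention" or "preference"
    credibility: float      # holistic expert rating on a 1-10 scale

# The hypothetical study described above would yield (at least) 2 findings:
findings = [
    Finding("synthesis", "percentages vs. rates per 10^n (tables)", "behavioral intention", 6.0),
    Finding("synthesis", "percentages vs. rates per 10^n (tables)", "preference", 6.0),
]
```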
Each finding was rated for credibility by 2 expert reviewers (primarily J.S.A. and B.J.Z.-F., with N.C.B. substituting in cases of conflict of interest). Reviewers assigned credibility scores holistically, on a scale from 1 to 10, by considering sample size, statistical methods, face validity and comparability of the stimuli being compared, validity of the outcome measures and covariates, and S-ROB for the study from which the finding was extracted; no quantitative rubric was used. Credibility often varied among findings from the same study. For example, the primary outcome might result in a high-credibility finding, but secondary outcomes not subjected to hypothesis testing or subset analyses with small sample sizes might produce low-credibility findings.
Findings were grouped by task and outcome and synthesized into guidance statements. We applied a standard rubric to grade the strength of evidence (strong, moderate, or weak) according to finding risk of bias, finding credibility, and consistency of findings. Consistency was considered high if all findings were significant in the same direction or if a large majority were significant in one direction with a few lacking significance. Consistency was considered moderate if findings showed a small majority of significant effects in one direction with the remainder lacking significance, while consistency was considered low if the findings showed significant effects in different directions.
Findings with high credibility (7 or higher on a scale of 1 to 10) and moderate credibility (4.5–6.5) are discussed below. Findings with lower credibility (4 or lower) are mentioned below and counted in Table B but do not contribute to the evidence summaries or the statements in the evidence tables.
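For readers who want the credibility bands and consistency rules above in operational form, here is a minimal sketch; the threshold for a "large majority" and the handling of cases the rubric does not mention are our assumptions, and the actual grading was done holistically by expert reviewers rather than by code.

```python
def credibility_band(score: float) -> str:
    """Map a 1-10 holistic credibility score to the bands used in this article."""
    if score >= 7:
        return "high"
    if score >= 4.5:
        return "moderate"  # 4.5-6.5
    return "low"           # 4 or lower: counted in Table B but not summarized

def consistency(n_pos: int, n_neg: int, n_null: int) -> str:
    """Classify a set of findings: n_pos/n_neg count significant effects in
    opposite directions; n_null counts findings lacking significance."""
    total = n_pos + n_neg + n_null
    if total == 0:
        raise ValueError("no findings to classify")
    if n_pos and n_neg:
        return "low"  # significant effects in different directions
    n_sig = n_pos + n_neg
    if n_sig == total or n_sig >= 0.75 * total:  # "large majority" threshold assumed
        return "high"
    if n_sig > n_null:  # small majority significant in one direction
        return "moderate"
    return "low"  # minority significant: not specified by the rubric; assumed low
```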
Section Headings for Each Subset of Outcome Evidence Included in This Article and the Number of Included Findings a
—, n = 0.
There were no relevant findings for the following comparisons: larger or smaller denominators (row H) or longer or shorter time periods (row J). There were no relevant findings for the following outcomes: categorization (section 3), computation (section 4), or discrimination (section 10). (Probability Perceptions, section 5, is not possible for synthesis tasks.) The standardized numbering system in Table B has been used for the subheadings of all Making Numbers Meaningful results articles. The numbers ensure that, for example, studies of the effects of number formats on trust are always placed in a subhead labeled subsection 8A (whether or not that article contains sections 1 through 7). Our goal is to ensure that readers can use this subhead system to locate similar sections across articles.
Results
The review identified 42 articles that evaluated the effect of format on synthesis tasks. Table B shows the counts of findings for each outcome (column) and format comparison (row). Each section in the Results presents findings for 1 outcome; the sections begin with a description of the outcome, summarize the relevant high- and moderate-credibility findings in text form, and present a table of the evidence generated from the findings. As mentioned in the methods, low-credibility findings are mentioned in the text summaries but do not contribute to the evidence.
As shown in Table B, most studies on synthesis tasks focused on the outcomes of behavioral intentions (such as intention to take a medication or to choose between therapies) or preferences for format (such as whether participants found a particular format more attractive or informative than other formats).
With regard to outcomes, no studies assessed categorization, computation, or discrimination outcomes. However, these particular outcomes are not impossible. Participants could have been asked to sum the probabilities of a list of side effects (compute) or to determine whether a therapy with a list of side effects met some prespecified definition of a high-risk therapy (categorize). Participants could have been asked to make judgments about risk-benefit tradeoffs as the size of the risk or the size of the benefit was systematically varied to determine the smallest increment in probability that produced different decisions (discrimination). Regarding format comparisons, no studies examined denominator manipulations or time period variations.
The full spreadsheet of synthesis task findings cited in the current article is available at the Making Numbers Meaningful Project at OSF (https://osf.io/rvxf2/) in the Probability Findings folder.
Effects of Different Formats on Ability to Identify or Recall Information for Synthesis Tasks (Identification Outcome): Section 1
When researchers asked participants to view a stimulus and identify or restate the numbers in it, we considered that outcome to be identification. If the stimulus was removed before the participant answered the question, we considered the outcome to be recall.
Comparisons between numerical formats on the ability to identify or recall numbers (subsection 1A)
A finding by Brick et al. 5 involving a single identification question was reported only as part of a 12-item comprehension measure; it was not summarized.
Effects of Different Formats on Ability to Identify Largest or Smallest Number for Synthesis Tasks (Contrast Outcome): Section 2
A number of studies assessed comprehension in part by asking respondents to compare probabilities and identify the larger (or smaller) one. For example, one study asked participants whether the chance of benefits was larger than the chance of harms. We termed this outcome contrast.
Comparisons between numerical formats on the ability to identify largest or smallest number (subsection 2A)
PERCENTAGES VS. RATES PER 10ⁿ: A moderate-credibility finding from Waters et al. 6 showed that ability to determine which option had the lowest overall cancer risk (combining multiple risks and benefits) was higher when information was presented as percentages (such as 3%) than as rates per 10ⁿ (such as 3 in 100).
Evidence-Based Guidance for Effects of Numerical Formats on Ability to Identify Largest or Smallest Number in Synthesis Tasks
ORDER EFFECTS: A moderate-credibility finding by Fraenkel et al. 7 found higher ability to recognize that the chance of benefits was larger than the chance of risks when benefits were presented before risks versus the reverse order.
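Before turning to graphical formats, the numerical formats compared in this article (percentages, rates per 10ⁿ, and the 1-in-X frequencies discussed later) can be illustrated by rendering one probability in each; the sketch and helper names below are ours, not from the reviewed studies.

```python
def as_percentage(p: float) -> str:
    return f"{p * 100:g}%"                    # 0.03 -> "3%"

def as_rate_per_10n(p: float, denominator: int = 100) -> str:
    """Common-denominator rate with a power-of-10 denominator ("rate per 10^n")."""
    return f"{p * denominator:g} in {denominator:,}"  # 0.03 -> "3 in 100"

def as_one_in_x(p: float) -> str:
    """1-in-X frequency, which varies the denominator instead of the numerator."""
    return f"1 in {round(1 / p):,}"           # 0.001 -> "1 in 1,000"

print(as_percentage(0.03), "|", as_rate_per_10n(0.03), "|", as_one_in_x(0.001))
# 3% | 3 in 100 | 1 in 1,000
```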
Comparisons between graphical formats’ effect on ability to identify largest or smallest number (subsection 2B)
A moderate-credibility finding by Waters et al. 8 showed that in a description of a drug that reduced the chance of one cancer and increased the chance of another, ability to recognize that the drug reduced total probability of disease was similar with part-to-whole icon arrays (graphical displays showing a matrix of icons to represent both the numerator and denominator of a percentage) and with bar charts.
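As a rough text rendering of the part-to-whole icon array defined above (the actual stimuli were graphical), the sketch below draws a 10 x 10 grid in which filled icons represent the numerator and the full grid represents the denominator:

```python
def icon_array(numerator: int, denominator: int = 100, per_row: int = 10) -> str:
    """Text stand-in for a part-to-whole icon array: X = affected, . = unaffected."""
    icons = "X" * numerator + "." * (denominator - numerator)
    rows = [icons[i:i + per_row] for i in range(0, denominator, per_row)]
    return "\n".join(rows)

print(icon_array(3))  # 3 in 100: three X icons at the top of a 10x10 grid
```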
Evidence-Based Guidance for Effects of Graphical Formats on Ability to Identify Largest or Smallest Number in Synthesis Tasks
Comparisons between numerical and graphical formats’ effect on the ability to identify the largest or smallest number (subsection 2C)
In a moderate-credibility finding, Waters et al. 6 showed that ability to determine which option had the lowest overall cancer risk (combining multiple risks and benefits) was higher when information was presented in a vertical bar graph than in numerical formats (either rates per 10ⁿ or percentages). A moderate-credibility finding from another study by Waters et al. 8 showed that in a description of a drug that reduced the chance of one cancer and increased the chance of another, ability to recognize that the drug reduced the total probability of disease was better with icon arrays than with percentages alone.
Evidence-Based Guidance for Effects of Numerical versus Graphical Formats on Ability to Select Largest or Smallest Number in Synthesis Tasks
Effects of Different Formats on Effectiveness Perceptions and Effectiveness Feelings: Section 6
When participants were asked their perception of how large or small an effect was on a quantitative scale indicating size (for example, a scale anchored at “very small” and “very large”), we considered that an effectiveness perception. If the perception was measured on a scale with affective words such as worry or concern, we considered it effectiveness feelings.
Comparisons between number formats on effectiveness perceptions and effectiveness feelings (subsection 6A)
ABSOLUTE DIFFERENCES IN RATES PER 10ⁿ WITH OR WITHOUT PERCENTAGES: Blalock et al. 9 assessed whether format affected perceptions of whether overall benefits of an intervention outweighed its risks (high-credibility finding). Benefit was formatted as either a probability difference (how many fewer people would experience the event out of 100,000) or a pair of percentages, and chances of side effects were presented as a list of rates per 100,000. No format differences were significant.
Evidence-Based Guidance for Effects of Numerical Formats on Effectiveness Perceptions and Effectiveness Feelings for Synthesis Tasks
ORDER EFFECTS: A moderate-credibility finding by Bergus et al. 10 was that perceived favorability of a low-risk–low-benefit option was higher when risks were presented before benefits (v. the reverse order), although the lack of effect in other conditions in this study somewhat reduces confidence in this finding. 10
Comparisons between numerical and verbal probabilities on effectiveness perceptions and effectiveness feelings (subsection 6D)
A moderate-credibility finding 11 found no differences in perceived relative magnitude of benefits and harms of different screening tests when presented verbally versus with common-denominator rates per 10ⁿ.
Evidence-Based Guidance on Effects of Numerical versus Verbal Formats on Effectiveness Perceptions and Effectiveness Feelings in Synthesis Tasks
Effect of adding elements for context on effectiveness perceptions and effectiveness feelings (subsection 6E)
PERSONAL NARRATIVES: Both a high-credibility finding 12 and a moderate-credibility finding 11 found no difference in perceptions of benefits (risk reduction) and/or harms when statistical information was or was not accompanied by narratives.
Evidence-Based Guidance on Effects of Contextual Information on Effectiveness Perceptions and Effectiveness Feelings in Synthesis Tasks
Effects of Framing on Effectiveness Perceptions and Effectiveness Feelings (Subsection 6F)
A moderate-credibility finding 11 found no differences in perceived relative magnitude of benefits and harms of different screening tests when presented in a gain versus loss frame.
Evidence-Based Guidance on Effects of Gain-Loss Framing on Effectiveness Perceptions and Effectiveness Feelings in Synthesis Tasks
Effects of Different Formats on Behavior or Behavioral Intention: Section 7
Activities such as getting a mammogram were considered health behaviors; expressing an intention to get a mammogram was considered a behavioral intention. Because behaviors were rarely measured in these studies, and because behavioral intention is both empirically and theoretically linked to behavior, we grouped studies of behavioral intention with studies of behavior throughout the Making Numbers Meaningful project. In this particular summary of findings about synthesis tasks, all findings pertained to behavioral intention.
Comparisons between number formats in their effect on behavior or behavioral intention (subsection 7A)
NUMBERS IN TEXT VERSUS TABLES: Two high-credibility findings from 2 different studies by Tait et al.13,14 and a moderate-credibility finding by Brick et al. 5 all found no impact on behavioral intentions when risk and benefit probabilities (percentages or rates per 10ⁿ) were provided in sentence text or in table formats. However, 2 moderate-credibility findings (Schwartz et al. 15 substudies 1 and 2) found a higher intention to choose the lowest-risk drug when risk and benefit information was presented in a drug facts box table format using both percentages and rates per 10ⁿ versus with percentages only embedded in small-print text and tables. The large differences between versions, however, make it possible that these effects were due to formatting differences beyond the text versus table comparison.
Evidence-Based Guidance on Effects of Numerical Formats on Behavior or Behavioral Intentions in Synthesis Tasks
RELATIVE DIFFERENCES VERSUS ABSOLUTE DIFFERENCES OR ABSOLUTE RATES: Three findings contrasted relative versus absolute differences but used different formats for both the relative difference and the absolute rates. A moderate-credibility finding 16 found higher intention to take a risk-reducing medication when its harms were presented in rates per 10ⁿ plus icon array format instead of relative risk reduction as a percentage (e.g., reduce risk by 40%). However, the format used to present benefits did not appear to affect behavioral intentions, and the confounding of number format with the use of graphics makes interpretation of this finding difficult. Another moderate-credibility finding by Hembroff et al. 17 found higher intention to recommend a medication when risks were presented as pairs of 1-in-X versus relative risk reduction, but again, the format of benefit information did not appear to affect behavioral intentions. A third moderate-credibility finding by Miller and Holdaway 18 found that when women were given the relative risk of 2 options (e.g., cesarean birth has a mortality rate 2.5 times higher than vaginal birth), they were more likely to choose the riskier option than when given the absolute rate per 10ⁿ for both options (e.g., a 0.01 per 100 mortality chance for cesarean birth vs. 0.004 per 100 mortality chance for vaginal birth).
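The arithmetic behind these relative-versus-absolute contrasts can be made explicit using the Miller and Holdaway figures above; the variable names in this sketch are ours.

```python
# Same underlying mortality rates, expressed two ways (figures per Miller and Holdaway).
risk_cesarean = 0.01 / 100   # 0.01 per 100
risk_vaginal = 0.004 / 100   # 0.004 per 100

relative_risk = risk_cesarean / risk_vaginal   # 2.5 times higher
absolute_diff = risk_cesarean - risk_vaginal   # 0.006 per 100

print(f"Relative: cesarean mortality is {relative_risk:.1f}x the vaginal rate")
print(f"Absolute: {risk_cesarean * 100:g} vs. {risk_vaginal * 100:g} per 100 "
      f"(difference of {absolute_diff * 100:g} per 100)")
```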
ADDING ABSOLUTE DIFFERENCES TO RATES: Two moderate-credibility findings9,19 found no effects on behavioral intentions comparing absolute rates alone versus absolute probability differences (either alone or with absolute rates).
PERCENTAGES VERSUS LIFE EXPECTANCY: In a high-credibility finding, McNeil et al. 20 studied time-tradeoff decisions and found that intent to choose the option with higher long-term survival (and lower short-term survival) was higher with a life expectancy format instead of percentage surviving.
RATIOS VERSUS RATES PER 10ⁿ: In a moderate-credibility finding about communicating the chance of overdiagnosis from screening, Waller et al. 21 found lower intentions to screen when the overdiagnosis risk was presented as a ratio of overdiagnoses to lives saved (e.g., 3:1) versus as counts of overdiagnoses or as common-denominator rates.
Two findings were considered lower credibility and not summarized (poor reporting of details in Wegwarth et al. 22 and inconsistencies across scenarios in Bergus et al. 10 ).
Comparisons between graphic formats in their effect on behavior or behavioral intention (subsection 7B)
MULTIOUTCOME VERSUS SINGLE-OUTCOME ICON ARRAYS: Two high-credibility findings (from Zikmund-Fisher et al. 23 2010 substudies 1 and 2) found higher behavioral intentions to take cancer treatment medications when part-to-whole icon arrays of survival benefits highlighted total and incremental survival outcomes than when icon arrays highlighted multiple survival and mortality outcomes. However, 2 moderate-credibility findings (from McDowell et al. 24 substudies 1 and 2) found no difference in prostate cancer screening behavioral intentions when risks and benefit tables used integrated multioutcome icon arrays versus separate single-outcome arrays.
Evidence-Based Guidance on Effects of Graphical Formats on Behavior or Behavioral Intentions in Synthesis Tasks
GROUPED VERSUS RANDOM ICON ARRAYS: A high-credibility finding 25 showed higher behavioral intention to choose a lower-risk treatment with static grouped icon arrays than with random arrays (static or animated).
GRAPHIC TYPE: In a high-credibility finding, Waters et al. 8 showed higher intention to take a cancer risk–reducing drug when benefits were presented using multioutcome part-to-whole icon arrays than as stacked bar charts. However, a moderate-credibility finding 26 found no differences in medication behavioral intentions between multiple bar chart formats, multiple pie chart formats, and part-to-whole icon arrays.
Two low-credibility findings were not summarized due to small sample sizes and competing stimuli 27 or lack of reporting details (Weinstein et al. 28 substudy 2).
Comparisons between numerical and graphical formats in their effect on behavior or behavioral intention (subsection 7C)
RATES PER 10ⁿ VERSUS ICON ARRAYS: Eight findings contrasted icon array graphics and rates per 10ⁿ, with mixed results. Two high-credibility findings8,13 found increased intentions to take a risk-reducing medication when probabilities of benefits and harms were presented in icon array format. However, no differences between icon arrays and rates were found in 2 additional high-credibility findings14,26 and 4 moderate-credibility findings (Fraenkel et al., 29 Cozmuta et al., 30 McDowell et al. 24 substudies 1 and 2).
Evidence-Based Guidance on Effects of Numerical versus Graphical Formats on Behavior or Behavioral Intentions in Synthesis Tasks
RATES PER 10ⁿ VERSUS OTHER GRAPHICS: The high-credibility Hawley et al. 26 finding also showed no differences in behavioral intentions between rates per 10ⁿ and several bar chart– and pie chart–based graphical formats. Waters et al. 8 also found no differences between stacked bar charts and numbers.
1-IN-X VERSUS ICON ARRAY: In a simple high-credibility finding, 31 intention to take an influenza vaccine was higher when the chance of adverse effects was presented as a set of dots (simple icon array) than in 1-in-X frequencies (e.g., “1 in 1,000”).
PERCENTAGES VERSUS ICON ARRAYS: A complicated moderate-credibility finding 32 found higher behavioral intentions both when benefits were presented as pairs of percentages with risks in a verbal list without probability information and when risks and benefits were presented in multioutcome icon arrays (without numbers), as compared with a flow chart graphic that included natural frequency numbers. The major differences between these formats, however, decrease confidence in both the equivalence of percentages and icon arrays and the comparisons with the flow chart format. In addition, a moderate-credibility discrete choice experiment 33 found no differences in choices when probabilities were presented as percentages alone or as percentages plus part-to-whole icon arrays.
RELATIVE RISK PERCENTAGES VERSUS ICON ARRAYS: A moderate-credibility finding 16 found higher intention to take a risk-reducing medication when its harms were presented in icon array format instead of relative risk reduction as a percentage. However, the format used to present benefits did not appear to affect behavioral intentions.
Comparisons between numerical and verbal probabilities in their effect on behavior or behavioral intention (subsection 7D)
One moderate-credibility finding 11 found no difference in screening behavioral intentions when risks and benefits were presented using rates per 10ⁿ or verbal probability terms. Another moderate-credibility study suggested that, in communicating the chance of false-positive/false-negative screening test results, the intention to get screened did not differ when the chance was described in 1-in-X rates or in verbal terms only. 34 Although both findings were moderate credibility, the stimuli used in the studies were quite different, suggesting only moderate consistency.
Evidence-Based Guidance on Effects of Numerical versus Verbal Probability Formats on Behavior or Behavioral Intentions in Synthesis Tasks
Effect of adding elements for context on behavior or behavioral intention (subsection 7E)
AVERAGE RISK: In communication about a risk-reducing drug that had a set of benefits (reduced chance of breast cancer) as well as side effects, a high-credibility finding showed higher intent to take the therapy when people were told that their personal risk was higher than average than when they were told it was lower than average. 35
Evidence-Based Guidance on Effects of Contextual Information on Behavior or Behavioral Intentions in Synthesis Tasks
ANECDOTES/NARRATIVES: Two moderate-credibility findings11,12 found no difference in behavioral intentions when statistical information was or was not accompanied by narratives. Similarly, a moderate-credibility finding (Steinhardt and Shapiro 36 substudy 3) found no effect on time-tradeoff choices of embedding statistical information into a narrative.
ADDING QUALITATIVE DESCRIPTIVE LABELS: In a moderate-credibility finding, Sullivan et al. 19 found that intentions to take a medication were not affected by whether a table of probabilities was accompanied by labels describing which option had the higher risk for each outcome.
ADDITIONAL EXPLANATION OF TRADEOFFS: A moderate-credibility finding 30 found higher intentions to take a drug when risk information was accompanied by balance beam graphics conceptually illustrating the risk-benefit tradeoff, but confounding with other elements in this study reduces confidence in this finding.
SOCIAL NORM MANIPULATION: A moderate-credibility finding by Schwartz et al. 37 found a similar intention to screen for cancer when screening was or was not described as the default social norm.
Effects of gain-loss framing on behavior or behavioral intentions (subsection 7F)
GAIN FRAMING VERSUS LOSS FRAMING AT 1 TIME POINT: Three findings directly examined whether framing risks and benefits at a single point in time in gain frame (chance of surviving) or loss frame (chance of dying) affected behavioral intentions. Two moderate-credibility findings in 2 separate articles by Cormier O’Connor 38 and O’Connor 39 found that in communications about the chances of benefit and harm from 2 treatments, choosing the treatment with better survival (but greater toxicity) was more likely when mortality information was survival framed than when it was mortality framed. For communications of the potential benefits and harms of cancer screening, a moderate-credibility finding by Sheridan et al. 11 showed no effects of framing the chances of harm.
Evidence-Based Guidance on Effects of Gain-Loss Framing on Behavior or Behavioral Intentions in Synthesis Tasks
GAIN FRAMING VERSUS LOSS FRAMING OF SHORT- AND LONG-TERM OUTCOMES: Two findings examined framing effects in time tradeoffs (short-term v. long-term outcomes). A high-credibility finding 20 found higher intention to choose the option with lower short-term survival but higher long-term survival with gain framing (percentage who live). A moderate-credibility finding (Steinhardt and Shapiro 36 substudy 3) with a similar design did not find a significant effect of framing treatment effects, but a nonsignificant trend was consistent with higher behavioral intention to avoid short-term risks when framed as percentage who die versus percentage who live.
GAIN FRAMING VERSUS COMBINATION FRAMING: Two moderate-credibility findings found higher risk-reducing behavioral intentions with gain-framed icon arrays (highlighting survival chances) than with combination-framed icon arrays (highlighting both chances of survival and chances of mortality from multiple causes); however, confounding of the framing manipulation with the multioutcome manipulation makes it less clear that the effect was solely due to framing (Zikmund-Fisher et al. 23 substudies 1 and 2). Two moderate-credibility findings in 2 separate articles by Cormier O’Connor 38 and O’Connor 39 found no effect on behavioral intentions whether survival/mortality information was gain framed or combination framed.
MIXING FRAMES: In addition, a high-credibility finding (Peng et al. 40 substudy 4) found that when gain/loss frames were combined in the same statement such that one adverse event was gain framed and the other loss framed, respondents tended to choose the option that showed better outcomes for the negatively framed event. This is consistent with an avoidance of negative-framed adverse events and greater acceptance of positive-framed adverse events.
A low-credibility finding by Llewellyn-Thomas et al. 41 is not summarized due to a very small sample.
Effect of stating or illustrating numerical uncertainty on behavior or behavioral intentions (subsection 7G)
A moderate-credibility finding 42 found no effect on intent to take a medication when benefits and harms were presented as point estimate percentages or as a range of percentages.
Evidence-Based Guidance on Effects of Stating Numerical Uncertainty on Behavior or Behavioral Intentions in Synthesis Tasks
Effect of animation or interactivity on behavior or behavioral intentions (subsection 7I)
A high-credibility finding 25 found stronger intention to choose the options with lower side effect rates when people viewed static grouped icon arrays than when they saw a variety of animated displays. However, a moderate-credibility finding by Fraenkel et al. 29 found no effect on intention to screen by whether information was in a static icon array or a slideshow animation.
Evidence-Based Guidance on Effects of Animation or Interactivity on Behavior or Behavioral Intentions in Synthesis Tasks
An additional finding (Weinstein et al. 28 substudy 2) examined effects of interactivity on behavior in tradeoff situations but could not be synthesized due to lack of reporting details.
Effects of Different Formats on Trust in the Information: Section 8
Comparisons between numerical formats on trust in sets of numbers (subsection 8A)
A low-credibility finding 5 is not summarized solely because inconsistencies between the reported means and submeans, possibly due to typographic errors, made it difficult to confirm the presence or size of the effect.
Preferences for Formats (Preference Outcome): Section 9
Whenever participants were asked which format they preferred, we recorded a preference outcome. Related concepts, such as perceived usefulness of a format, were also included as preference.
Preferences for different number formats (subsection 9A)
NUMBERS IN TEXT VERSUS TABLES: Three high-credibility findings examined presenting benefit and harm probabilities in table formats versus in sentence text. Two5,13 found some degree of preference for table formats over numbers in sentence text, while a finding from a different Tait et al. 14 study found no difference.
Evidence-Based Guidance on Preferences for Numerical Formats in Synthesis Tasks
PERCENTAGES AND/OR RATES PER 10ⁿ: A high-credibility finding 43 found no variations in preference among tables that included percentages only, rates per 10ⁿ only (with either fixed or variable denominators), or percentages and rates per 10ⁿ.
Preferences for different graphic formats (subsection 9B)
SINGLE-OUTCOME VERSUS MULTIPLE-OUTCOME ICON ARRAYS: Two moderate-credibility findings (Zikmund-Fisher et al. 23 substudies 1 and 2) found a preference for simpler icon arrays showing survival/mortality over more complex ones portraying several potential outcomes (e.g., cancer-specific survival/mortality as well as all-cause survival/mortality). Both types of arrays also showed the incremental benefit of treatment.
Evidence-Based Guidance on Preferences for Graphical Formats in Synthesis Tasks
GROUPED VERSUS SCATTERED ICON ARRAYS: A high-credibility finding 25 found preference for grouped icon arrays (static or animated to group) versus random icon arrays (static or animated).
ICONICITY: In a moderate-credibility finding, Gaissmaier et al. 44 found no preference difference between icon array displays that varied in degree of iconicity (i.e., their abstractness v. concreteness).
COMPARING GRAPHICAL FORMATS: In a moderate-credibility finding that compared different graphic formats, Tait et al. 45 found both icon arrays and bar charts preferred to pie charts or number tables.
Preferences for numerical versus graphical formats (subsection 9C)
Evidence about preference for numbers versus graphics in showing risk tradeoffs is limited both by the few studies specifically looking at risk tradeoffs and by the fact that the studies in this category often evaluated different sorts of graphics and numbers. The mixed findings probably reflect important differences in the types of graphics used, as well as in what kind of meaning people needed to take away from the communication.
Evidence-Based Guidance on Preferences for Numerical versus Graphical Formats in Synthesis Tasks
In a high-credibility finding, Veldwijk et al. 46 found that participants in a discrete choice experiment preferred presentations of choice options using numbers (percentages or 1 in X) over numbers plus icon arrays. Also, a high-credibility finding by Tait et al. 13 found a preference for numbers in tables over icon arrays.
However, a very similar high-credibility finding from the same team 14 found the reverse: a preference for icon arrays over tables or numbers in sentence text. In addition, a moderate-credibility finding in another Tait et al. 45 study found that icon arrays or bar charts were preferred over pie charts or numbers (percentages plus rates per 10ⁿ). A final moderate-credibility finding by Gaissmaier et al. 44 found no preference differences between rates per 10ⁿ and icon arrays that varied in their degree of iconicity (i.e., abstractness v. concreteness).
Preferences for gain-loss framing (subsection 9F)
Two large-sample, moderate-credibility findings (Zikmund-Fisher et al. 23 substudies 1 and 2) found that women preferred simpler gain-framed (survival only) icon arrays over more complex combination-framed (survival + mortality) icon arrays in showing the effect of treatment. However, the fact that the combination-framed displays included more types of information raises questions about whether the framing caused this effect.
Evidence-Based Guidance on Preferences for Framing in Synthesis Tasks
Preferences for animation or interactivity (subsection 9I)
A high-credibility finding from Zikmund-Fisher et al. 25 found preference for static grouped arrays or animated arrays that group icons over animated displays that shuffle icons (either automatically or on demand).
Evidence-Based Guidance on Preferences for Animation or Interactivity in Synthesis Tasks
Summary of Evidence
This synthesis task review found no strong evidence but uncovered multiple pieces of moderate evidence as well as many weak evidence findings.
Moderate evidence for risk-benefit synthesis communications includes the following:
Presenting sets of risk and benefit probabilities in tables versus sentences does not seem to affect behavioral intentions (section 7A). However, people seem to prefer numbers in table format as opposed to in a sentence (section 9A).
Effectiveness feelings do not seem to be affected by whether or not narratives (e.g., personal experiences) are provided (section 6E).
Weak evidence regarding behavioral intentions for risk-benefit synthesis communications suggests the following:
Presenting benefits as a single-outcome icon array instead of a multioutcome icon array may increase behavioral intentions (section 7B).
Presenting adverse events as an icon array instead of 1-in-X numbers may increase behavioral intentions (section 7C).
In time-tradeoff communications, presenting options in life expectancy terms instead of percentage surviving may increase the choice of the option with higher long-term survival, while gain framing instead of loss framing may lead people to choose options with lower short-term survival but higher long-term survival (sections 7A and 7F).
Framing benefits of interventions using a gain frame rather than a loss frame or combination (gain plus loss) framing may increase intentions to choose the intervention, while using mixed frames to describe harms of interventions may lead people to choose the option that has the lowest chance of the negatively framed (loss-framed) outcome (section 7F).
Presenting treatment options in grouped rather than random icon arrays may increase intentions to select lower-risk treatment options (section 7B).
Providing average risk information showing that someone is at above-average risk may increase intentions to engage in risk-reducing behaviors (section 7E).
Behavioral intentions may not be affected by 1) adding absolute probability differences to risk/benefit rates (section 7A), 2) presenting tradeoffs in bar charts versus pie charts (section 7B), or 3) including narratives or anecdotes (section 7E).
Weak evidence on other outcomes related to synthesis tasks:
In risk-benefit synthesis communications, people may tend to prefer gain-framed versus combination-framed icon arrays (sections 9B and 9F).
People may prefer grouped (static or animated) versus random (static or animated) icon arrays (sections 9B and 9I).
There may not be preference differences between risk-benefit synthesis communications using percentages or common-denominator rates per 10ⁿ (section 9A).
Adding pairs of percentages (i.e., chance of outcome with and without treatment) to an absolute difference as a rate per 10ⁿ as part of risk-benefit synthesis communications may not affect effectiveness feelings (section 6A).
Discussion
This article summarizes the available research involving probability synthesis tasks, in which readers evaluate and make decisions about sets of probabilities, such as the risks and benefits of a medical intervention or the set of side effects for a medication. Synthesis tasks are inherently more complex than point tasks (which focus on single probabilities) and difference tasks (which involve pairs of probabilities or differences between probabilities), because readers must consider probability across multiple features, which may also differ in severity (for harms) or magnitude (for benefits). The available synthesis task studies focused heavily on behavioral intention outcomes (61 findings), with a smaller number evaluating format preferences (17 findings); both outcomes involve holistic evaluation of the set of probability information. Only a few findings assessed effectiveness perceptions and/or feelings (6 findings), here measuring overall perceptions across multiple risks/benefits rather than a single pairwise comparison. Findings regarding contrast, identification, and trust outcomes were insufficient to support conclusions, and no studies evaluated recall, categorization, computation, or discrimination outcomes.
A plurality of findings compared 2 or more numerical formats (24 findings), with relatively balanced numbers of findings comparing numbers and graphics (21 findings), graphics alone (15 findings), effects of gain-loss framing (14 findings), and impact of providing context (7 findings). There were fewer findings related to comparisons of numerical and verbal probabilities, representations of uncertainty, and the effect of animation or interactivity.
The available evidence for synthesis tasks provides little in the way of clear guidance for practice. People appear to prefer tabular presentations over data presented in sentences when performing synthesis tasks, and tabular presentation does not appear to affect behavioral intentions, so there may be little reason not to prioritize tabular approaches. Conversely, while narratives did not affect effectiveness feelings and may not affect behavioral intentions in the synthesis task evidence we reviewed here, our review of evidence related to difference tasks 47 did find that narratives can strongly influence behavioral intentions for those sorts of tasks. As a result, communicators will need to consider carefully what tasks their patients or readers need to do in order to decide how to balance these differing conclusions. Many of the weak evidence findings listed here derive from either small numbers of studies or single high-credibility studies, and future research may either strengthen or contradict these conclusions.
Like other articles in the Making Numbers Meaningful review, this one is limited by the possibility of missing studies, by the use of a small group of experts to determine risk of bias and credibility, and by a very granular data extraction process that results in specific yet narrow evidence. Although our approach focuses on ensuring that studies are compared only with very similar studies, it also separates research findings across different tasks and outcomes, and these tasks and outcomes may all be relevant in evaluating the pros and cons of different number formats. We were also unable to evaluate the effect of differences in presentation formats based on variables such as numeracy or culture, owing to the small numbers of studies for each of these variables and to inconsistencies in how they were measured.
In conclusion, most evidence regarding synthesis tasks pertained to behavioral intention and preference outcomes, which makes sense given that these outcomes are holistic evaluations of the full set of information provided, not just 1 or 2 numbers. It is notable, however, that we did not derive any strong evidence (only moderate or weak), and the moderate evidence suggested a lack of effect of tabular presentations on behavioral intentions and of narratives on effectiveness feelings. Such evidence of noneffects is important, as many decision aids seek to help patients determine the best treatment option for themselves, not to persuade toward a particular course (e.g., induce intention or feelings of effectiveness). We also found moderate and weak evidence regarding format preferences, but as in other articles in this review, we urge caution in making presentation format decisions based solely on preference, as it may be confounded with other measures that allow people to objectively evaluate information (e.g., contrast, categorization).
It is important to note that data presentations that can be used for synthesis tasks (such as tables of risk and benefit probabilities) can also be used for point and difference tasks (such as recall of single probabilities, contrasts of 2 probabilities, or effectiveness perceptions based on a pair of probabilities). Designers of such communications, therefore, need to consider not only the evidence presented here but also the evidence regarding the effects of different communication formats on point and difference tasks.47–50 Such integrative analyses are a natural extension of the present research, and we strongly encourage the development of such multilevel guidance for communications practitioners.
Acknowledgements
We thank the Numeracy Expert Panel for contributions to conceptualizing the MNM project (Cynthia Baur, Sara Czaja, Angela Fagerlin, Carolyn Petersen, Rima Rudd, Michael Wolf, and Steven Woloshin). We are grateful to Marianne Sharko, MD, MS, Andrew Z. Liu, MPH, and Lisa Grossman Liu, MD, PhD, for contributions to article screening and risk of bias assessment. We also thank Jordan Brutus for assisting with data management.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided entirely by a grant from the National Library of Medicine (R01 LM012964, Ancker PI). The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the reports.
Availability of Research Resources
All research resources are available at the Making Numbers Meaningful Project at OSF (https://osf.io/rvxf2/). This project includes a Methodology Files folder (containing the search strategy, the data extraction instrument, and the study risk of bias [S-ROB] rubric), the list of each included article mapped to the Making Numbers Meaningful review article that covers it, and a Probability Findings folder displaying the extracted findings for each of the Making Numbers Meaningful review articles.
