
Abstract

Simultaneous confidence intervals that are compatible with a given closed test procedure are often non-informative. More precisely, for a one-sided null hypothesis, the bound of the simultaneous confidence interval can stick to the border of the null hypothesis, irrespective of how far the point estimate deviates from the null hypothesis. This has been illustrated for the Bonferroni-Holm and fall-back procedures, for which alternative simultaneous confidence intervals have been suggested, that are free of this deficiency. These informative simultaneous confidence intervals are not fully compatible with the initial multiple test, but are close to it and hence provide similar power advantages. They provide a multiple hypothesis test with strong family wise error rate control that can be used in replacement of the initial multiple test. The current paper extends previous work for informative simultaneous confidence intervals to graphical test procedures. The information gained from the newly suggested simultaneous confidence intervals is shown to be always increasing with increasing evidence against a null hypothesis. The new simultaneous confidence intervals provide a compromise between information gain and the goal to reject as many hypotheses as possible. The simultaneous confidence intervals are defined via a family of dual graphs and the projection method. A simple iterative algorithm for the computation of the intervals is provided. A simulation study illustrates the results for a complex graphical test procedure.

Keywords

Simultaneous confidence intervals, graphical testing procedure, informative confidence intervals

1. Introduction

1.1. Informative simultaneous confidence intervals (iSCIs)

Numerous multiple comparison procedures are available for a large spectrum of settings, where more than one confirmatory assertion is desired so that the familywise error rate is controlled in the strong sense (see e.g. Hochberg and Tamhane¹ or Dickhaus² for an overview). Because of the increasing importance in practice, a (draft of the) EMA/CHMP Guideline on multiplicity issues in clinical trials³ has been published. Besides error control, it emphasizes the importance of obtaining clinically interpretable results by providing confidence intervals that “allow for consistent decision making with the primary hypothesis testing strategy.” The implementation of such a decision strategy is impeded by the fact that simultaneous confidence intervals (SCIs), which are both compatible with a given multiple test, and informative, are not easy to find. Compatibility means that a null hypothesis is rejected if and only if it is excluded from the confidence interval.

We will call a confidence interval informative if the information provided by the interval always increases (and only stays constant in the case that all gatekeepers for a hypothesis were not rejected) with increasing evidence against the corresponding null hypothesis. As a consequence, when a null hypothesis is rejected, the confidence interval will have a non-zero distance to the null hypothesis, except for the singular and usually negligible event that the corresponding (unadjusted) p-value is equal to its final local level. Hence, informative confidence intervals almost always provide additional information beyond the mere hypothesis test.

To be more formal, let us consider $m$ null hypotheses $H_{j} : θ_{j} \leq 0$ for $j = 1, \dots, m$ . We are interested in rejecting as many hypotheses as possible and at the same time obtain iSCIs. By the possibilities of shifting, inverting and intersecting one-sided intervals, we restrict ourselves to left-sided hypotheses, without loss of generality. The SCIs are then $m$ -dimensional rectangles that are bounded from the left, that is, $SCI = (L_{1}, \infty) \times \dots \times (L_{m}, \infty)$ . They are compatible with the rejection decisions if for $i = 1, \dots, m$ , the null hypothesis $H_{i}$ is rejected if and only if $L_{i} \geq 0$ . We now give a formal definition for an iSCI.

Definition 1
We call a SCI given by lower bounds $L := (L_{1}, \dots, L_{m})$ informative if
(a) $L_{i} > - \infty$ whenever $H_{i}$ has no gatekeeper or for at least one gatekeeper $H_{j}$ for $H_{i}$ we have $L_{j} > 0$ ;
(b) $L_{i} (X^{'}) > L_{i} (X)$ , if the following holds for two data sets $X^{'}$ and $X$ :
(i) $X^{'}$ provides more evidence against $H_{i}$ than $X$ , and
(ii) for all $j \neq i$ the evidence against $H_{j}$ is stronger in $X^{'}$ or the same in both data sets, and
(iii) $L_{i} (X) > - \infty$ .

Remark 2
(i)
The restriction in (a) of Definition 1 corresponds to the case where rejection of a hypothesis $H_{i}$ is only of interest when another or several other hypotheses (the “gatekeepers”) are rejected. If the gatekeeper(s) cannot be rejected, then $H_{i}$ and its parameter $θ_{i}$ are not considered at all and $L_{i} = - \infty$ is unavoidable.
(ii)
Note that in the case where no gatekeeper for $H_{i}$ has been rejected for either of the data sets $X$ and $X^{'}$ , iSCIs are permitted to provide $L_{i} (X) = L_{i} (X^{'}) = - \infty$ .
(iii)
We have intentionally left point (b) in Definition 1 somewhat informal, namely with regard to the meaning of the statement “increasing evidence against $H_{i}$ .” The mathematical definition of this must depend on the statistical model and hypothesis under investigation. When the test statistic and p-value are based on an estimate ${\hat{θ}}_{i}$ of the parameter $θ_{i}$ in $H_{i} : θ_{i} \leq 0$ , then the evidence against $H_{i}$ increases with increasing estimate ${\hat{θ}}_{i}$ .
(iv)
Let us assume that the evidence against $H_{i}$ increases with increasing ${\hat{θ}}_{i}$ . Let us further assume that the conditional distribution of the estimate ${\hat{θ}}_{i}$ given the other estimates ${\hat{θ}}_{j}$ , $i \neq j$ is continuous, that is, has a conditional Lebesgue density. Then $L_{i} = 0$ will occur only with probability zero and we get $L_{i} > 0$ whenever $H_{i}$ is rejected. This is the case, for instance, when the $m$ estimates ${\hat{θ}}_{1}, \dots, {\hat{θ}}_{m}$ are multivariate normal.
(v)
We will see (and explicitly state) below that for our method and theory to apply, we need the existence of p-values $p_{i} (μ_{i})$ for all shifted null hypotheses $H_{i}^{μ_{i}} : θ_{i} \leq μ_{i}$ , which are all strictly decreasing with increasing evidence against $H_{i}$ . This is usually the case when all these p-values are based on the same estimate ${\hat{θ}}_{i}$ for $θ_{i}$ . We will present a typical example in Section 2.
(vi)
The property that $L_{i} > 0$ whenever $H_{i}$ is rejected has been used by Brannath and Schmidt⁴ and Schmidt and Brannath^5,6 as the defining feature of an iSCI. It has been illustrated by simulations with multivariate normal estimates and formally verified by Schmidt and Brannath⁵ under the assumption of continuously distributed p-values. As noted by Schmidt and Brannath,⁵ with non-continuous estimates and p-values, even classical confidence bounds do not meet this definition, that is, they can be equal to the border of the null hypothesis with positive probability. Definition 1 has the advantage of being independent of the (conditional) distribution of the estimates and p-values and of applying (as far as we can see) to all classical confidence bounds, including those of single-step SCIs.

Strassburger and Bretz⁷ and Guilbaud^8,9 have proposed SCIs for a large class of stepwise procedures. These SCIs are not always informative. For example, if not all hypotheses are rejected, then the confidence intervals of the rejected hypotheses equal the whole alternative hypothesis, irrespective of how far the point estimate lies inside the alternative. Hence, they contradict Definition 1 and are only of limited use, because they do not provide any more information than the rejection itself. Guilbaud¹⁰ proposes SCIs that can be more informative than the rejection itself in certain scenarios for some of the hypotheses studied. However, they also do not meet Definition 1. The recommendation of the (draft of the) EMA Guideline on multiplicity issues in clinical trials concerning this conflict of interest is the following: “it is advised to use simple but conservative confidence interval methods, such as Bonferroni-corrected intervals.” This is understandable given the wish to have intervals that do not lead to misinterpretation, which is of greatest importance in practice. On the other hand, there is the need for a compromise, because the recommended intervals are either compatible with the (conservative) hypothesis test, or informative but not compatible with a more complex test. Since SCIs always contain more information than pure rejection decisions, a natural way out of this conflict is to construct the multiple testing procedure by directly defining informative simultaneous confidence bounds $L_{1}, \dots, L_{m}$ . These bounds are naturally consistent with the multiple test, which, by definition, rejects a null hypothesis $H_{i}$ if and only if $L_{i} \geq 0$ .

Based on this idea, Brannath and Schmidt⁴ have constructed SCIs that are always informative and uniformly more powerful than the Bonferroni procedure with regard to the number of rejected hypotheses. They can be seen as a compromise between the powerful Bonferroni-Holm procedure and the informative Bonferroni procedure. Similar procedures were proposed for the hierarchical and the fallback test by Schmidt and Brannath.^5,6 All these procedures belong to the class of graphical test procedures introduced by Bretz et al.,¹¹ which make it possible to account for preferences and hierarchies among the different null hypotheses $H_{i}$ , $i = 1, \dots, m$ . In this article, we extend the previous idea to SCIs that are based on a given graphical test as defined by Bretz et al.¹¹ We will see that the rejections made with the original graph cannot be exactly reproduced by our procedure. In contrast, the proposed confidence bounds will always be informative for all hypotheses which do not have gatekeepers, which, to our knowledge, is not possible for the original procedure. In fact, we can find a trade-off between the number of rejections made by the original graph and the expected size of the confidence bounds by an appropriate choice of the information weight $q$ defined later. The new confidence intervals are defined via a continuous family of graphical tests that are derived from the original one. They can be calculated numerically by an extension of the algorithm by Bretz et al.¹¹

In the following subsection, we will recall the definition of the graphical procedures from Bretz et al.,¹¹ which sets the basic notation for the new approach. In Section 2, we introduce the new SCIs, first by a formal definition, which motivates the approach, and subsequently by an iterative algorithm, which can be implemented numerically. We will see that both perspectives yield the same SCIs. In Section 3, we link the extended setting to our previous SCIs proposed for the hierarchical, fallback, and Bonferroni-Holm procedure (Brannath and Schmidt, 2014; Schmidt and Brannath, 2014, 2015), and we give an example of a more complex graph where the advantages of the new approach are demonstrated. We conclude with a discussion in Section 4.
1.2. Graphical multiple test procedures

A multiple testing procedure by Bretz et al.¹¹ is given by a graph $G^{0}$ which consists of

initial local levels $α_{1}, \dots, α_{m} \geq 0$ for the $m$ hypotheses such that $\sum_{j = 1}^{m} α_{j} = α$ , where $α$ is the predefined significance level,

a transition matrix $(g_{i j})_{i, j = 1, \dots, m}$ , where $g_{i j}$ is the weight by which the level of the $i$ th hypothesis is shifted after its rejection to the $j$ th hypothesis.

An example is given in Figure 1. This graph is given by Bretz et al.¹¹ and is an example for a step-down test without order constraint from Bauer et al.¹² Three treatments are compared with respect to efficacy and safety so that altogether six hypotheses are tested. The significance level is equally split across the three efficacy hypotheses, that is, $α_{E_{1}} = α_{E_{2}} = α_{E_{3}} = α / 3$ and $α_{S_{1}} = α_{S_{2}} = α_{S_{3}} = 0$ . Safety of a treatment is tested only after the respective efficacy assertion has been shown. Hence, for all $i = 1, 2, 3$ , $H_{E_{i}}$ is a gatekeeper for $H_{S_{i}}$ . There is no hierarchy between the three treatments. The corresponding transition matrix is
$\begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0.5 & 0.5 & 0 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 & 0 & 0 \\ 0.5 & 0.5 & 0 & 0 & 0 & 0 \end{pmatrix}$
whereby the first three components belong to the efficacy hypotheses and the last three to safety.

Figure 1.

Example for a complex graphical procedure (Figure 8 from Bretz et al., 2009). $H_{E_{j}}$ is an efficacy hypothesis for the $j$ th treatment and $H_{S_{j}}$ is a safety hypothesis for the $j$ th treatment.

The graphical algorithm is as follows: If a hypothesis $H_{i}$ can be rejected, its level is allocated to the other hypotheses according to the transition weights $g_{i j}$ , that is,
$α_{j}^{new} = α_{j} + α_{i} g_{i j}, \quad j \neq i$
Arrows going from and to $H_{i}$ are deleted and transition weights of the other arrows are updated as follows:
$g_{j l}^{new} = \begin{cases} \frac{g_{j l} + g_{j i} g_{i l}}{1 - g_{j i} g_{i j}} & \text{if } j, l \neq i, \ j \neq l, \ g_{j i} g_{i j} \neq 1 \\ 0 & \text{else} \end{cases}$
(1)

In the next step, the remaining hypotheses are tested with their new local level. The graph is updated upon each rejection, until no more rejections are possible. It has been shown by Bretz et al.¹¹ that the set of rejected hypotheses is independent of the order in which these hypotheses are rejected. It was also proven that the procedure controls the familywise error rate at level $α = \sum_{j = 1}^{m} α_{j}$ .
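The level and weight updates described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Bretz et al.: the function names `update_graph` and `graphical_test` and the p-values in the usage lines are hypothetical, while the initial levels and the transition matrix are those of the Figure 1 example.

```python
def update_graph(alpha, G, i):
    """Reject node i: pass its level along its arrows and rewire as in (1)."""
    n = len(alpha)
    new_alpha = [alpha[j] + alpha[i] * G[i][j] for j in range(n)]
    new_alpha[i] = 0.0
    new_G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for l in range(n):
            denom = 1.0 - G[j][i] * G[i][j]
            if i not in (j, l) and j != l and denom != 0.0:
                new_G[j][l] = (G[j][l] + G[j][i] * G[i][l]) / denom
    return new_alpha, new_G


def graphical_test(p, alpha, G):
    """Return the set of rejected indices and the final local levels."""
    alpha, G = list(alpha), [row[:] for row in G]
    rejected = set()
    while True:
        candidates = [i for i in range(len(p))
                      if i not in rejected and p[i] <= alpha[i]]
        if not candidates:
            return rejected, alpha
        alpha, G = update_graph(alpha, G, candidates[0])
        rejected.add(candidates[0])


# Figure 1 example: indices 0-2 are efficacy, 3-5 are safety hypotheses.
a = 0.025
alpha0 = [a / 3, a / 3, a / 3, 0.0, 0.0, 0.0]
G0 = [[0, 0, 0, 1, 0, 0],
      [0, 0, 0, 0, 1, 0],
      [0, 0, 0, 0, 0, 1],
      [0, 0.5, 0.5, 0, 0, 0],
      [0.5, 0, 0.5, 0, 0, 0],
      [0.5, 0.5, 0, 0, 0, 0]]
p = [0.001, 0.5, 0.5, 0.005, 0.5, 0.5]  # hypothetical p-values
rejected, final_alpha = graphical_test(p, alpha0, G0)
```

With these hypothetical p-values, efficacy and safety of the first treatment are rejected, and the freed level is split equally between the two remaining efficacy hypotheses.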

2. The method

2.1. Definition of confidence bounds

Given observations (e.g. from a clinical trial), we assume that we have local p-values $p_{j}$ , $j = 1, \dots, m$ , for the null hypotheses $H_{j} = H_{j}^{0} : θ_{j} \leq 0$ . Further, our observations also allow us to define p-values $p_{j} (μ_{j})$ for the shifted hypotheses $H_{j}^{μ_{j}} : θ_{j} \leq μ_{j}$ . For example, if the estimate ${\hat{θ}}_{j}$ is (approximately) Gaussian, then $p_{j} = p_{j} (0) = 1 - Φ ({\hat{θ}}_{j} / {SE}_{j})$ for some standard error ${SE}_{j}$ . If the standard error ${SE}_{j}$ is independent of $θ$ , then a natural p-value for testing $H_{j}^{μ_{j}}$ is
$p_{j} (μ_{j}) = 1 - Φ \left( \frac{{\hat{θ}}_{j} - μ_{j}}{{SE}_{j}} \right)$
(2)

Usually, the p-values are strictly increasing and continuous in $μ_{j}$ , such that $\lim_{μ_{j} \to - \infty} p_{j} (μ_{j}) = 0$ and $\lim_{μ_{j} \to \infty} p_{j} (μ_{j}) = 1$ . Moreover, as mentioned earlier, we will assume that for all $μ_{j} \in R$ the p-values $p_{j} (μ_{j})$ strictly decrease with increasing evidence against $H_{j}$ . This is obviously the case for the p-values in (2), since they decrease with increasing ${\hat{θ}}_{j}$ . Note that the mentioned properties apply also to the shifted p-values from $t$ -distributed test statistics when the non-centrality parameter increases with increasing $μ_{j}$ (which is usually the case).
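For the Gaussian case, the shifted p-values in (2) and their inverse in $μ_{j}$ can be sketched as follows. This is only an illustration: the function names `shifted_p_value` and `p_inverse` are assumptions, and the code relies on the identity $1 - Φ (z) = Φ (- z)$ to avoid cancellation for small p-values.

```python
from statistics import NormalDist

_PHI = NormalDist()  # standard normal distribution


def shifted_p_value(theta_hat, mu, se=1.0):
    """p_j(mu_j) from (2); Phi((mu - theta_hat)/se) equals 1 - Phi((theta_hat - mu)/se)."""
    return _PHI.cdf((mu - theta_hat) / se)


def p_inverse(theta_hat, level, se=1.0):
    """Shift mu at which the p-value equals `level` (0 < level < 1),
    i.e. the classical one-sided lower confidence bound at that level."""
    return theta_hat + se * _PHI.inv_cdf(level)
```

The p-value grows with the shift and shrinks with the estimate, and `p_inverse(theta_hat, level, se)` returns the shift at which the p-value equals `level`.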

Our starting point is a graph like the one given in Section 1.2, that is, we have initial levels $α_{1}, \dots, α_{m} \geq 0$ , $\sum_{j = 1}^{m} α_{j} = α$ and a transition matrix $(g_{i j})_{i, j = 1, \dots, m}$ . Our goal is to obtain SCIs that: (i) reflect the structure of the testing procedure given by the graph, and (ii) are always informative (in the sense of Definition 1). Throughout this article, we will assume that the graph is complete in the following sense.

Assumption 1

We assume that for all $i = 1, \dots, m$ , the transition weights starting from $H_{i}$ sum up to one, that is, $\sum_{j = 1}^{m} g_{i j} = 1$ (and that all $g_{i i} = 0$ ).

We will explain in Remark 8 how our procedure can be adapted if the graph is not complete.

Basic projection method

By modifying the given graph, we construct weighted Bonferroni tests for the intersection of shifted hypotheses $H^{μ} = H_{1}^{μ_{1}} \cap \dots \cap H_{m}^{μ_{m}}$ for each $μ = (μ_{1}, \dots, μ_{m}) \in R^{m}$ . We will reject the intersection hypothesis $H^{μ}$ globally if and only if at least one of the hypotheses $H_{j}^{μ_{j}}$ can be rejected at its local level $α_{j}^{μ}$ , where $\sum_{j = 1}^{m} α_{j}^{μ} = α$ for all $μ$ . We can then define the $m$ -dimensional confidence set
$C = \{ μ \in R^{m} : H^{μ} \text{ is not rejected} \} = \{ μ \in R^{m} : \min_{\begin{matrix} j = 1, \dots, m \\ α_{j}^{μ} > 0 \end{matrix}} p_{j} (μ_{j}) / α_{j}^{μ} > 1 \}$
By construction, the coverage probability of $C$ is at least $(1 - α)$ .

We next construct the simultaneous confidence bounds $L_{j}$ , $j = 1, \dots, m$ , by projection, that is, the SCI $(L_{1}, \infty) \times \dots \times (L_{m}, \infty)$ is the smallest rectangle containing the open set $C$ . Since it contains $C$ , its coverage probability is not smaller than $1 - α$ .

Dual graphs and resulting weighted Bonferroni tests

For each $μ \in R^{m}$ we will construct local levels $α_{1}^{μ}, \dots, α_{m}^{μ}$ summing up to $α$ . They will depend on a parameter $q \in (0, 1)$ , which we will call information weight. We explain the significance of $q$ in Section 2.3.

For given $μ \in R^{m}$ , the local levels are constructed in two steps: in the first step, we define a dual graph that contains all shifted null hypotheses $H_{j}^{μ_{j}}$ and some (not necessarily all) initial null hypotheses as nodes; in the second step, we reject all initial null hypotheses in this graph to obtain the levels $α_{j}^{μ}$ for $H_{j}^{μ_{j}}$ , $j = 1, \dots, m$ .

First step. We define a new graph $G^{μ}$ by modifying the given graph $G$ as follows:
for all $j$ with $μ_{j} \leq 0$ :
– delete all paths starting at $H_{j}$ , that is, set $g_{j i}^{μ} = 0$ for all $i$ ;
– replace $H_{j}$ by $H_{j}^{μ_{j}}$ .

for all $j$ with $μ_{j} > 0$ :
– add a node for the hypothesis $H_{j}^{μ_{j}}$ to the graph with local level $0$ ;
– introduce an arrow from $H_{j}$ to $H_{j}^{μ_{j}}$ with transition weight $q^{μ_{j}}$ ;
– change all transition weights starting from $H_{j}$ to $g_{j i}^{μ} = g_{j i} (1 - q^{μ_{j}})$ .

The resulting graph $G^{μ}$ contains $m$ nodes with the starting levels $α_{1}, \dots, α_{m}$ plus up to $m$ nodes with initial level zero. Note that all shifted hypotheses $H_{j}^{μ_{j}}$ are contained in the graph. It is still a valid graph within the framework of Bretz et al.,¹¹ that is, the row sums of the transition matrix are equal to $1$ . Figure 2 shows an example of a graph $G$ and the resulting graph $G^{μ}$ .

Figure 2.
From the original graph (a) to a graph (b), where the shifted hypotheses $H_{1}^{μ_{1}}$ and $H_{2}^{μ_{2}}$ with $μ_{1}, μ_{2} > 0$ are added, while the null hypothesis $H_{3}$ is transformed to the shifted hypothesis $H_{3}^{μ_{3}}$ with $μ_{3} \leq 0$ : (a) original graph $G$ and (b) modified graph $G^{μ}$ .

The rationale for the graph $G^{μ}$ is that we can imagine it as the test for a hypothesis $H^{μ} = H_{1}^{μ_{1}} \cap \dots \cap H_{m}^{μ_{m}}$ that has not yet been rejected. If $μ_{j} \leq 0$ for some $j$ , then also the null hypothesis $H_{j} = H_{j}^{0}$ has not yet been rejected and, therefore, full level has to be given to $H_{j}^{μ_{j}}$ and no transfer to other hypotheses takes place. If, however, $μ_{j} > 0$ , then testing $H_{j}^{μ_{j}}$ increases the information on $μ_{j}$ after the null hypothesis $H_{j}$ has been rejected. To this end, the graph contains $H_{j}$ and $H_{j}^{μ_{j}}$ with a non-zero transition weight (namely $q^{μ_{j}}$ ) from $H_{j}$ to $H_{j}^{μ_{j}}$ .

Second step. We reject in $G^{μ}$ all null hypotheses $H_{j} = H_{j}^{0}$ , performing the update algorithm of Bretz et al.¹¹ The final graph contains only the shifted hypotheses $H_{j}^{μ_{j}}$ with some new levels $α_{j}^{μ}$ and no arrows between these hypotheses. We will explain in the Appendix why the local levels $α_{j}^{μ}$ always satisfy
$\sum_{j = 1}^{m} α_{j}^{μ} = α$
(3)
Basically, this is because we started with a valid graph $G^{μ}$ and performed only rejections according to Bretz et al.,¹¹ and because of Assumption 1. However, some more technical issues have to be discussed for a proof. To summarize, we have obtained a level- $α$ test for each $μ \in R^{m}$ , which rejects $H^{μ}$ if at least one of the local p-values satisfies $p_{j} (μ_{j}) \leq α_{j}^{μ}$ .
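As an illustration of the two steps, consider a worked sketch for a graph that is assumed here only for this purpose: the Holm procedure for $m = 2$ hypotheses with $α_{1} = α_{2} = α / 2$ and $g_{1 2} = g_{2 1} = 1$ . Write $a = q^{μ_{1}}$ and $b = q^{μ_{2}}$ . If $μ_{1} > 0 \geq μ_{2}$ , then $H_{2}$ is replaced by $H_{2}^{μ_{2}}$ without outgoing arrows, and rejecting $H_{1}$ passes the fraction $a$ of its level to $H_{1}^{μ_{1}}$ and the fraction $1 - a$ to $H_{2}^{μ_{2}}$ , so that
$α_{1}^{μ} = a \frac{α}{2}, \quad α_{2}^{μ} = (2 - a) \frac{α}{2}$
If $μ_{1}, μ_{2} > 0$ , then rejecting first $H_{1}$ and then $H_{2}$ with the update rule (1) gives
$α_{1}^{μ} = \frac{a (2 - b)}{a + b - a b} \frac{α}{2}, \quad α_{2}^{μ} = \frac{b (2 - a)}{a + b - a b} \frac{α}{2}$
In both cases the two levels sum up to $α$ , in line with (3), and for $b = 1$ (i.e. $μ_{2} = 0$ ) the second pair of formulas reduces to the first.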

According to the projection method described at the beginning of this section, the new lower simultaneous confidence bounds are then defined as follows.
Definition 3
Given the p-values $p_{j} (μ_{j})$ , $μ \in R^{m}$ and $j = 1, \dots, m$ , the initial graph $G^{0}$ , we define, based on the local levels $α_{j}^{μ}$ resulting from the above two steps with the dual graph $G^{μ}$ , the lower simultaneous bounds
$L_{j} = \max \{ μ_{j} : p_{j} (μ_{j}) \leq α_{j}^{μ^{'}} \text{ for all } μ^{'} \in R^{m} \text{ with } μ_{j}^{'} = μ_{j} \}, \quad j = 1, \dots, m$
(4)
which have simultaneous coverage probability of at least $1 - α$ by construction.
2.2. Numerical algorithm to calculate the confidence bounds

We now prove a property of the local levels $α_{j}^{μ}$ , which allows us to obtain the bounds $L_{j}$ by a simple numerical algorithm without the need to test the continuous family of intersection hypotheses $H^{μ}$ for $μ \in R^{m}$ and calculate the maximum in (4).

Proposition 4
For all $μ = (μ_{1}, \dots, μ_{m}) \in R^{m}$ , the local levels $α_{j}^{μ}$ derived in Section 2.1 are of the form
$α_{j}^{μ} = q^{μ_{j} \lor 0} ν_{j} (μ) α$
where $ν_{j} : R^{m} \to R_{\geq 0}$ is continuous and non-decreasing in each component.
Proof.
We know or can see from Bretz et al.¹¹ and the update algorithm therein that the graphical test algorithm has the following properties:
1. The order in which hypotheses are rejected does not influence the local levels of the final graph.
2. After every rejection step, the new transition weights are continuous and non-decreasing functions of the old transition weights.
3. After every rejection step, the new level of a hypothesis and the new weights starting from this hypothesis are independent of the old levels of any other, non-rejected hypothesis and independent of all old transition weights that go from or to the other, non-rejected hypotheses.
4. The new level of a hypothesis is a linear combination of $α_{1}, \dots, α_{m}$ .
Fix $j$ and $μ$ . As already described, the level $α_{j}^{μ}$ for $H_{j}^{μ_{j}}$ arises from rejecting all null hypotheses $H_{k} = H_{k}^{0}$ in the graph $G^{μ}$ . If $μ_{j} \leq 0$ , then only $H_{k}^{0}$ with $k \neq j$ are rejected. By properties 2. and 3., the final local level for $H_{j}^{μ_{j}}$ is then a continuous and non-decreasing function of one or more $g_{k l} (1 - q^{μ_{k}})$ for $k \neq l$ , and by property 4., it is a multiple of $α$ . Hence, it is of the form $ν_{j} (μ) α$ for a continuous and non-decreasing function $ν_{j}$ .

If $μ_{j} > 0$ , then by property 1., we can w.l.o.g. reject at first all $H_{k}$ with $k \neq j$ and finally reject $H_{j}$ . This last step multiplies the level $ν_{j} (μ) α$ by $q^{μ_{j}}$ so that the desired representation is obtained.

Recall that $H^{μ} = H_{1}^{μ_{1}} \cap \dots \cap H_{m}^{μ_{m}}$ is rejected if, for some $j$ , we have $p_{j} (μ_{j}) \leq α_{j}^{μ}$ , that is, by Proposition 4, if
$p_{j} (μ_{j}) q^{- (μ_{j} \lor 0)} \leq ν_{j} (μ) α$
(5)

Note that both the left-hand side and the right-hand side of (5) are non-decreasing in $μ_{j}$ . Based on this property, we can define an iterative algorithm for the calculation of the lower confidence bounds.
Algorithm 1
1. Find a starting vector $μ^{(0)} \in R^{m}$ such that
$p_{j} (μ_{j}^{(0)}) q^{- (μ_{j}^{(0)} \lor 0)} \leq ν_{j} (μ^{(0)}) α \text{ for all } j = 1, \dots, m$
(6)
2. Given $μ^{(k)}$ , define $μ^{(k + 1)}$ as solution of
$p_{j} (μ_{j}^{(k + 1)}) q^{- (μ_{j}^{(k + 1)} \lor 0)} = ν_{j} (μ^{(k)}) α \text{ for all } j = 1, \dots, m$
(7)
3. Stop if $‖ μ^{(k + 1)} - μ^{(k)} ‖_{2} < ε$ for some predefined threshold $ε > 0$ , where $‖ v ‖_{2} := \sqrt{\sum_{j = 1}^{m} v_{j}^{2}}$ for $v = (v_{1}, \dots, v_{m}) \in R^{m}$ is the Euclidean norm.

By continuity and monotonicity, the first iterated value $μ^{(1)}$ exists and is larger than (or equal to) $μ^{(0)}$ component-wise (i.e. $μ_{j}^{(1)} \geq μ_{j}^{(0)}$ $\forall j = 1, \dots, m$ ). Since (7) is satisfied and $ν_{j}$ is non-decreasing, it follows that (5) holds also for $μ^{(1)}$ . By induction, we obtain that $(μ^{(k)})_{k \geq 1}$ is a non-decreasing sequence of vectors satisfying (5). Because the local levels never exceed $α$ , the sequence $(μ^{(k)})_{k \geq 1}$ is bounded from above by $p_{j}^{- 1} (α)$ , $j = 1, \dots, m$ . Therefore, the sequence converges and the algorithm eventually stops. We denote by $μ^{\infty}$ the limiting value. In particular, if for some $k \in N_{0}$ equality (7) holds for $μ^{(k + 1)} = μ^{(k)}$ , then the algorithm stops with $μ^{\infty} = μ^{(k)}$ .

Our next goal is to show that $μ^{\infty}$ is equal to the bounds (4) of Definition 3. This results from the following theorem, which we prove in the Appendix.
Theorem 5
The following properties are satisfied in Algorithm 1:
(a) If $μ^{(0)} \leq L$ (component-wise), then also $μ^{(k)} \leq L$ for all $k \in N$ .
(b) The limiting value is independent of the starting value of the algorithm.
(c) For any given starting value $μ^{(0)}$ , the limiting value satisfies $μ^{\infty} \geq L$ .
We obtain from (a) to (c) that $μ^{(k)}$ converges to the confidence bounds $L = (L_{1}, \dots, L_{m})$ defined in (4), if we find a valid starting value of Algorithm 1.
Remark 6 How to find a starting value

As a starting value $μ^{(0)} \leq L$ we can choose $μ_{j}^{(0)} = min {0, p_{j}^{- 1} (α_{j})}$ , where we formally put $p_{j}^{- 1} (0) = - \infty$ . This vector obviously satisfies (6) and $μ^{(0)} \in [- \infty, 0]^{m}$ with $μ_{j}^{(0)} > - \infty$ whenever $α_{j} > 0$ .
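Algorithm 1 with the starting value of Remark 6 can be sketched as follows. This is a minimal sketch under explicit assumptions, not a reference implementation: it uses the Gaussian p-values (2), a single information weight $q$ , a complete graph (Assumption 1), and solves (7) by bisection. All function names (`reject`, `local_levels`, `solve_step`, `isci_bounds`), the bracketing constants and the numerical inputs in the usage lines are hypothetical.

```python
from math import inf, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def reject(alpha, G, i):
    """Reject node i: pass its level along its arrows and rewire as in (1)."""
    n = len(alpha)
    new_alpha = [alpha[j] + alpha[i] * G[i][j] for j in range(n)]
    new_alpha[i] = 0.0
    new_G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for l in range(n):
            denom = 1.0 - G[j][i] * G[i][j]
            if i not in (j, l) and j != l and denom != 0.0:
                new_G[j][l] = (G[j][l] + G[j][i] * G[i][l]) / denom
    return new_alpha, new_G


def local_levels(mu, alpha0, G0, q):
    """Levels alpha_j^mu: build the dual graph G^mu, then reject all original H_j."""
    m = len(alpha0)
    alpha = list(alpha0) + [0.0] * m      # nodes m..2m-1: added shifted hypotheses
    G = [[0.0] * (2 * m) for _ in range(2 * m)]
    for j in range(m):
        if mu[j] > 0:                     # keep H_j, add an arrow to H_j^{mu_j}
            w = q ** mu[j]
            G[j][m + j] = w
            for i in range(m):
                G[j][i] = G0[j][i] * (1.0 - w)
        # mu[j] <= 0: node j itself stands for H_j^{mu_j}, no outgoing arrows
    for j in range(m):
        if mu[j] > 0:
            alpha, G = reject(alpha, G, j)
    return [alpha[m + j] if mu[j] > 0 else alpha[j] for j in range(m)]


def solve_step(theta_hat, se, q, c):
    """Solve p(x) * q**(-max(x, 0)) = c for x by bisection, with p from (2)."""
    if c <= 0.0:
        return -inf
    f = lambda x: PHI.cdf((x - theta_hat) / se) * q ** (-max(x, 0.0))
    lo, hi = theta_hat - 40.0 * se, theta_hat + 10.0 * se
    while f(hi) < c:                      # enlarge the bracket if necessary
        hi += 10.0 * se
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < c else (lo, mid)
    return lo


def isci_bounds(theta_hat, se, alpha0, G0, q, eps=1e-9, max_iter=1000):
    """Iterate (7) from the starting value of Remark 6 until the change is below eps."""
    m = len(alpha0)
    mu = [min(0.0, theta_hat[j] + se[j] * PHI.inv_cdf(alpha0[j]))
          if alpha0[j] > 0 else -inf for j in range(m)]
    for _ in range(max_iter):
        lev = local_levels(mu, alpha0, G0, q)
        # lev[j] / q**(mu_j v 0) equals nu_j(mu) * alpha by Proposition 4
        new = [solve_step(theta_hat[j], se[j], q, lev[j] / q ** max(mu[j], 0.0))
               for j in range(m)]
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(new, mu) if a != b))
        mu = new
        if dist < eps:
            break
    return mu


# Hypothetical usage: Holm graph for two hypotheses, alpha = 0.025, q = 0.5.
L = isci_bounds([4.0, 2.1], [1.0, 1.0], [0.0125, 0.0125], [[0, 1], [1, 0]], 0.5)
```

In this hypothetical example the second hypothesis is not rejected by the Bonferroni test at level $α / 2$ , but it receives a positive lower bound, while the bound for the first hypothesis stays below its Bonferroni bound.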

2.3. Properties of the confidence bounds

The main property of the new confidence bounds is that they are always informative in the sense of Definition 1. This is an essential advantage over the SCIs proposed by Strassburger and Bretz.⁷

Proposition 7
The simultaneous confidence bounds obtained by Algorithm 1 (or equivalently, by projection as described in Section 2.1) satisfy the conditions (a) and (b) of Definition 1, that is, they are always informative.
Proof.
We start showing (a) of Definition 1. As discussed in Remark 6, there exists a starting value $μ_{j}^{(0)} > - \infty$ if $α_{j} > 0$ , hence $L_{j} > - \infty$ by Theorem 5.

If $α_{i} = 0$ , the starting value was set to $μ_{i}^{(0)} = - \infty$ . If for some $k \geq 1$ and $j \neq i$ , a positive level is shifted from $H_{j}$ to $H_{i}^{μ_{i}^{(k)}}$ in the graph $G^{μ^{(k)}}$ , then $α_{i}^{μ^{(k)}} = ν_{i} (μ^{(k)}) α > 0$ . By solving (7) in Algorithm 1 we obtain $μ_{i}^{(k + 1)} > - \infty$ and by (a) of Theorem 5 that also $L_{i} > - \infty$ . Hence, whenever at least one $H_{j}$ , $j \neq i$ , is rejected and, as a consequence, positive level is passed to $H_{i}$ , then we will obtain $L_{i} > - \infty$ . This shows property (a).

We now consider property (b). Let $L$ be the vector of confidence bounds defined in (4) for an arbitrary, but given data set. By Definition 3 and Proposition 4, we have $p_{j} (L_{j}) \leq α_{j}^{L} = q^{L_{j} \lor 0} ν_{j} (L) α$ for all $j = 1, \dots, m$ . Assume now that, by a change in the data, the evidence against $H_{i}$ is increased while it remains constant or also increases for all $H_{j}$ with $j \neq i$ . By our assumptions on the individual shifted p-values, this implies that $p_{i} (L_{i})$ is strictly decreased while all other $p_{j} (L_{j})$ remain constant or are decreased as well. Hence, with the new data, we obtain $p_{i} (L_{i}) < α_{i}^{L} = q^{L_{i} \lor 0} ν_{i} (L) α$ and $p_{j} (L_{j}) \leq α_{j}^{L} = q^{L_{j} \lor 0} ν_{j} (L) α$ for all $j \neq i$ . As a consequence, $μ^{(0)} = L$ can serve as starting point for the calculation of the new confidence bounds $L^{'}$ for the changed data with Algorithm 1. According to (7) and the monotonicity assumption of $p_{i} (μ_{i})$ in $μ_{i}$ , the first step of the algorithm results in $μ_{i}^{(1)} > μ_{i}^{(0)} = L_{i}$ , and by (a) of Theorem 5, the new confidence bound $L_{i}^{'}$ must be strictly larger than $L_{i}$ . Therefore, $L_{i}$ increases with increasing evidence against $H_{i}$ when the evidence against the other $H_{j}$ remains the same or increases as well.

To ensure the informativeness of the SCIs, we pay a price in terms of a slightly reduced expected number of rejections compared with the underlying original graphical procedure. We can, however, control the power through the choice of the information weight $q \in (0, 1)$ . Larger values of $q$ favor sharper confidence bounds, while smaller values of $q$ lead to more rejections. This effect will be illustrated in more detail for some examples in Section 3. The boundary case $q = 1$ yields the weighted Bonferroni intervals with weights $α_{1}, \dots, α_{m}$ .

One can also generalize the approach by choosing individual information weights $q_{1}, \dots, q_{m} \in (0, 1)$ , depending on the importance of a large bound $L_{j}$ compared to rejecting as many hypotheses as possible. The method may even be applied with any positive, non-increasing, continuous function $Q_{j} (μ_{j})$ (replacing $q^{μ_{j} \lor 0}$ ) for $j = 1, \dots, m$ that is equal to $1$ for $μ_{j} \leq 0$ and tends to $0$ for $μ_{j} \to \infty$ . All arguments concerning the construction and properties of the SCIs work for functions $Q_{j} (μ_{j})$ with these characteristics, as well.
Remark 8
Assumption 1 can be weakened so that the approach is also applicable for graphical procedures with non-complete graphs. We only need to adapt the transition weights in $G^{μ}$ for arrows from $H_{j}$ to $H_{j}^{μ_{j}}$ if $μ_{j} > 0$ . Instead of $q^{μ_{j}}$ , we define the weight
$1 - (1 - q^{μ_{j}}) \sum_{i = 1}^{m} g_{j i}$
which reduces to $q^{μ_{j}}$ if the original graph is complete. It can be easily seen that $G^{μ}$ is a complete graph so that (3) is satisfied. The expression $q^{- (μ_{j} \lor 0)}$ in Algorithm 1 then has to be modified accordingly. As stated above, all arguments concerning the properties of the resulting SCIs remain valid with this modified weight function.

Alternatively to this modification, one could of course also add arrows in the original graph so that Assumption 1 is satisfied. This can be done if one wants to increase power for certain hypotheses rather than improve confidence assertions. However, the latter is not always possible, for example, in hierarchical testing.
3. Examples

3.1. Bonferroni-Holm procedure

The graph in Figure 2(a) represents the weighted Bonferroni-Holm procedure for three hypotheses. If $g_{j k} = 1 / 2$ for $j, k = 1, 2, 3$ , $j \neq k$ , then the unweighted Holm test results. A generalization to $m$ instead of three hypotheses is straightforward. The SCIs of Strassburger and Bretz⁷ compatible with this procedure have the drawback that the bounds for rejected hypotheses are only informative if all hypotheses are rejected. Simple SCIs are possible for the Bonferroni procedure which, however, is not as powerful as the Holm procedure. The SCIs presented here offer a compromise, which is more powerful than Bonferroni and always produces informative confidence bounds.

The same features hold for the approach of penalized SCIs introduced by Brannath and Schmidt,⁴ where attention is restricted to the unweighted Holm procedure and specific union intersection tests. The penalized SCIs for the Holm test are constructed via dual weighted Bonferroni tests, where the weights are based on a so-called “penalization function.” The penalization function $λ_{i} (μ_{i}) = \exp (a (μ_{i} \lor 0))$ was proposed, with $a > 0$ as adjusting parameter for the interpolation between the importance of sharp confidence bounds ( $a = 0$ corresponds to Bonferroni) and high power ( $a \to \infty$ corresponds to Holm). The information weight $q \in (0, 1)$ of the new approach has a similar meaning as $- \log (a)$ . Due to the different weighting scheme, the penalized SCIs are not exactly equal to the intervals introduced here. However, their properties are closely related, as illustrated by a simulation next.

The scenario of the simulation is the following: five null hypotheses (e.g. the effects of different treatments compared to placebo) are tested with equal weights. We assume a scenario where all true effects are equally large. Other scenarios led to similar results. The significance level is $α = 2.5\%$ and the standard errors in (2) are assumed to be $1$ . We further assume that the study is powered such that the probability to reject any individual hypothesis at significance level $α / 5$ is $80\%$ . We performed $10{,}000$ simulation runs and determined the confidence bounds for both approaches with several values of the parameters $q$ and $a$ , respectively. Figure 3 compares the two procedures with respect to their trade-off between the mean confidence bound (which is high for larger information weight $q$ resp. smaller $a$ ) and the average number of rejected hypotheses (which is high for smaller $q$ resp. larger $a$ ). We see that the trade-off curves are almost the same for both approaches.
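The two end points of this trade-off can be reproduced with a few lines of code. The sketch below is not the simulation used for Figure 3: it only covers the Bonferroni bounds (the case $q = 1$ ) and the rejection counts of the Bonferroni and unweighted Holm tests, and it assumes independent, normally distributed test statistics, which is an assumption made here for illustration. All names are hypothetical.

```python
import random
from statistics import NormalDist

ALPHA = 0.025
M = 5
N_RUNS = 10_000

z = NormalDist().inv_cdf
# Common effect giving 80% power for a single test at level ALPHA / M (standard error 1).
c = z(1 - ALPHA / M) + z(0.80)


def holm_rejections(stats, alpha=ALPHA):
    """Number of hypotheses rejected by the unweighted Holm step-down test."""
    m = len(stats)
    rejected = 0
    for k, t in enumerate(sorted(stats, reverse=True)):
        if t >= z(1 - alpha / (m - k)):
            rejected += 1
        else:
            break
    return rejected


def simulate(seed=1):
    """Return average Bonferroni rejections, average Holm rejections, mean Bonferroni bound."""
    rng = random.Random(seed)
    crit = z(1 - ALPHA / M)
    bonf_rej = holm_rej = 0
    bound_sum = 0.0
    for _ in range(N_RUNS):
        stats = [rng.gauss(c, 1.0) for _ in range(M)]  # independence is an assumption
        bonf_rej += sum(t >= crit for t in stats)
        holm_rej += holm_rejections(stats)
        bound_sum += sum(t - crit for t in stats) / M  # Bonferroni lower bounds
    return bonf_rej / N_RUNS, holm_rej / N_RUNS, bound_sum / N_RUNS
```

Holm rejects every hypothesis that Bonferroni rejects, so its average number of rejections can never be smaller; the informative intervals with $q < 1$ lie between these two extremes.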

Figure 3.

Trade-off between mean confidence bound $L_{i}$ and average number of rejected hypotheses. Comparison between the new simultaneous confidence intervals (SCIs) for graphical procedures and the penalized SCIs with exponential penalization function for different values of the information weight $q$ resp. the parameter $a$ . The true effects are $θ_{i} = c$ , $i = 1, \dots, 5$ , with $c > 0$ such that the Bonferroni test has power $80\%$ for each separate hypothesis at significance level $α = 0.025$ . The value $q = 1$ corresponds to the Bonferroni intervals.

3.2. Fixed sequence and fallback procedure

In Schmidt and Brannath,⁵,⁶ we introduced iSCIs for the hierarchical and the fallback procedure, respectively. We show in the Appendix that Algorithm 1 produces exactly the same SCIs. We have discussed properties of these SCIs in our previous works. In particular, the SCIs are always informative. This holds for all graphical iSCIs, as shown in Proposition 7. The case of gatekeepers discussed in Definition 1 and Remark 6 is of particular relevance for the fixed sequence procedure: if some $H_{i}$ with $i < j$ is accepted, then no level is shifted to the test of $H_{j}$ and hence $L_{j} = - \infty$ .

A nice feature of the iSCIs is that they respect the ordering in the hierarchical and the fallback procedure, in the sense that there is no power loss compared to the original procedure for the first hypothesis, which is normally the most important one. The price to be paid for more information is a slight power loss for $H_{2}, \dots, H_{m}$ . As for the more general iSCIs introduced here, one has some control over the information–power trade-off through the choice of the information weight $q$ (see Section 2.3).

3.3. A clinical trial example

We now consider a hypothetical clinical trial example that is in line with the RELY trial, reported by Connolly et al.¹³ In this trial, two doses of the thrombin inhibitor dabigatran were compared to warfarin (active control) in a randomized and semi-blinded, multi-arm clinical trial. The primary treatment goal is the risk reduction of strokes or systemic embolisms in patients with atrial fibrillation. There are a primary efficacy and a safety parameter (both hazard ratios). The data indicated that the lower dabigatran dose is non-inferior and the higher dose even superior to warfarin with regard to efficacy (i.e. hazard for a stroke or systemic embolism), and that the low dabigatran dose seems to be superior to warfarin with regard to safety (hazard for major bleeding). Since the multiplicity adjustments anticipated in the trial were only with regard to the two non-inferiority null hypotheses in efficacy, superiority claims with regard to efficacy and claims on safety are not strictly confirmatory. iSCIs based on a graph as shown in Figure 1, but with two doses instead of three, permit strictly confirmatory claims also with regard to superiority in the efficacy and safety endpoints. We will illustrate below how the study could have been planned to include such a procedure. We will assume, for simplicity, that the estimates for efficacy and safety are normally distributed with known standard errors (which is in line with the common asymptotic approximation for the log-hazard ratios).

As indicated in Figure 1, effectiveness of the doses is primarily tested by non-inferiority. Accordingly, the hypotheses are $H_{E_{j}} = H_{j}^{- δ_{j}} : θ_{E_{j}} \leq - δ_{j}$ with given non-inferiority margins $δ_{j} > 0$ for $j = 1, 2$ , where $θ_{E_{j}}$ are the efficacy parameters. Here $j = 1$ represents the low dose treatment and $j = 2$ the high dose treatment. After non-inferiority of a treatment $j$ has been shown, its safety is investigated by the superiority hypothesis $H_{S_{j}}$ , $j = 1, 2$ , for major bleeding.

Of course, it would be valuable to also show superiority over the active comparator for the efficacy endpoint if the effect estimate is large. To this end, the simultaneous confidence bounds would need to be informative, such that $L_{j} > 0$ is a possibility whenever non-inferiority has been shown. With the new approach, we obtain informative confidence bounds for all hypotheses and thus also have the chance to prove superiority. In contrast, the SCIs of Strassburger and Bretz⁷ would give, for example, $L_{1} = - δ_{1}$ if $H_{E_{1}}$ is rejected but not $H_{E_{2}}$ .

3.3.1. Trial planning

We now discuss how the trial could have been planned with our iSCIs. To this end, we fix the one-sided overall level $α = 0.025$ and assign the two non-inferiority null hypotheses the initial levels $α_{1} = α_{2} = α / 2 = 0.0125$ . The non-inferiority margin for the log-hazard ratio is chosen, as in the RELY study, as $δ_{1} = δ_{2} = δ_{n} := \log (1.46) = 0.378$ for both doses. For a power of $80\%$ , that is, $β = 0.2$ , (with the initial levels) we need to recruit patients until the total information $I_{n} = (z_{α_{j}} + z_{β})^{2} / δ_{n}^{2} = 66.37$ is reached for the efficacy endpoint, where $z_{u}$ is the $(1 - u)$ -quantile of the standard normal distribution. Assuming that superiority in efficacy is powered at $δ_{e} = 1.3 δ_{n} = 0.492$ (compare Brannath and Schmidt⁴), we can assign the smaller level $α_{e} = 1 - Φ (δ_{e} \sqrt{I_{n}} - z_{β}) = 0.00077$ to achieve the power $80\%$ with the information $I_{n}$ . From this, we derive the corresponding $q = q_{E_{j}}$ for the calculation of the SCIs for the hypotheses $H_{E_{j}}$ , $j = 1, 2$ . We have:

$$α_{E_{j}} = q_{E_{j}}^{0 + δ_{n}} α / 2 \Leftrightarrow q_{E_{j}} \approx 0.00063$$

With this choice of $q_{E_{j}}$ we ensure that $L_{E_{j}} \geq 0$ with probability $80\%$ for $θ_{E_{j}} = δ_{e}$ . The values of the information weight $q$ for the two safety hypotheses $H_{S_{j}}$ , $j = 1, 2$ , remain to be chosen. We want to explore the behavior of the SCI bounds in dependence of these parameters in a simulation. Given the symmetry of the study design, an equal choice of the information weight $q$ for both dose groups seems reasonable. Gaining knowledge of the size of the effect in the safety endpoints is clearly most important, in case of no superiority in the efficacy endpoints. This leads to the scenario where $θ_{E_{1}} = θ_{E_{2}} = 0$ , $θ_{S_{1}} = θ_{S_{2}} = δ_{e} = 0.492$ . We want to investigate the influence of $q_{S_{1}} = q_{S_{2}}$ and vary it over $(0, 1)$ . For each choice of $q_{S_{1}} = q_{S_{2}}$ , we simulated 100,000 trials with the above parameters. For each of the simulation replicas, we calculated the iSCIs as well as the compatible SCIs (cSCIs) by Strassburger and Bretz.⁷ Since by Connolly et al.¹³ no correlation between the test statistics for $H_{E_{j}}$ and $H_{S_{j}}$ , $j = 1, 2$ , was given, we set this to 0. Following Di Scala and Glimm,¹⁴ the correlation between the test statistics for $H_{E_{j}}$ , $j = 1, 2$ , as well as the correlation between the test statistics for $H_{S_{j}}$ , $j = 1, 2$ , were each set to $1 / 2$ . The resulting relation between the probability to reject $H_{S_{j}}$ and the magnitude of the confidence bound $L_{S_{j}}$ is shown in Figure 4.
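The planning quantities for the efficacy hypotheses can be recomputed as a plausibility check. The sketch below assumes that the standard error of the estimate equals $1 / \sqrt{I_{n}}$ , so that the standardized effect at $δ_{e}$ is $δ_{e} \sqrt{I_{n}}$ , and it solves the level equation for the information weight of the efficacy hypotheses. Variable names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

norm = NormalDist()


def z(u):
    """(1 - u)-quantile of the standard normal distribution."""
    return norm.inv_cdf(1 - u)


alpha, beta = 0.025, 0.2
alpha_j = alpha / 2            # initial level of each non-inferiority hypothesis
delta_n = log(1.46)            # non-inferiority margin on the log-hazard ratio scale
delta_e = 1.3 * delta_n        # effect at which superiority is powered

# Information needed for 80% power of the non-inferiority test at level alpha_j.
info = (z(alpha_j) + z(beta)) ** 2 / delta_n ** 2

# Level at which a test of superiority still has 80% power at delta_e.
alpha_e = 1 - norm.cdf(delta_e * sqrt(info) - z(beta))

# Information weight solving alpha_e = q ** delta_n * alpha / 2.
q_e = (alpha_e / alpha_j) ** (1 / delta_n)
```

The computed values agree with the reported $I_{n} = 66.37$ , $α_{e} = 0.00077$ and $q_{E_{j}} \approx 0.00063$ up to rounding of intermediate quantities.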

Figure 4.

Probability to reject $H_{S_{j}}$ over the mean informative simultaneous confidence interval (SCI) bound for $H_{S_{j}}$ based on the simulation to calibrate the information weight $q_{S_{j}}$ .

For the simulation, we used our R package informativeSCI, which is available from CRAN or GitHub. The package can be used to run Algorithm 1, that is, to calculate the lower informative confidence bounds. Additionally, it can be used to explore the behavior of the SCI bounds as a function of the information weights by performing simulations.
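Independently of the package (whose interface is not reproduced here), the inputs of such a simulation could be drawn as in the following rough sketch. It assumes that all four estimates share the same information, which the text states only for the efficacy endpoint, and it induces the correlation $1 / 2$ between the two doses of an endpoint through a common random component. The function name and arguments are hypothetical.

```python
import random
from math import sqrt


def draw_estimates(theta_e, theta_s, info, rho=0.5, rng=random):
    """Draw estimates (E1, E2, S1, S2): correlation rho within an endpoint, 0 between."""
    se = 1 / sqrt(info)

    def pair(theta):
        shared = rng.gauss(0, 1)  # component common to both doses of one endpoint
        return [
            theta + se * (sqrt(rho) * shared + sqrt(1 - rho) * rng.gauss(0, 1))
            for _ in range(2)
        ]

    return pair(theta_e) + pair(theta_s)


# One simulated trial under the planning scenario described above.
example = draw_estimates(theta_e=0.0, theta_s=0.492, info=66.37, rng=random.Random(42))
```

Each draw would then be passed to the routine computing the confidence bounds, and the rejection indicator and the bound for $H_{S_{j}}$ would be averaged over the replicas.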

From the simulation results, we can assess the trade-off between higher power and higher information of the SCIs. For example, we could choose the value of the information weight $q$ that ensures that the probability to reject $H_{S_{j}}$ with the iSCIs is still $81\%$ (which amounts to $q_{S_{j}} = 0.00041$ in this setup) or $80\%$ (which amounts to $q_{S_{j}} = 0.59$ in this setup). For planning a specific clinical trial, one would then investigate the power and magnitude of the SCI bounds for those values of $q$ under different scenarios and determine the value of $q_{S_{j}}$ to be used in the trial based on the overall performance of the SCIs in those scenarios. Results of such considerations are given in Table 2 in the Appendix. We observe that the loss in power compared to the cSCIs is rather small in all scenarios, whereas the information gain can be quite substantial. The choice of $q = 0.00041$ seems to be favorable, given that the power loss for $q = 0.59$ amounts to up to $4\%$ , depending on the scenario.

3.3.2. Application to trial data

As a next step, we apply the proposed method to the data reported by Connolly et al.¹³ The iSCIs can be calculated with our R package informativeSCI using the code provided in the Supplemental Material. We compare the iSCIs to the cSCIs proposed by Strassburger and Bretz.⁷ The results can be found in Table 1.

We can see that with the cSCIs, the two non-inferiority hypotheses for the efficacy endpoints $H_{E_{1}}$ and $H_{E_{2}}$ , as well as the superiority hypothesis $H_{S_{1}}$ can be rejected, while we cannot reject $H_{S_{2}}$ . As a consequence, the confidence bounds for the first three parameters stick to the boundary of the respective null hypothesis. In comparison, the iSCI bounds exceed their compatible counterparts for the parameters tested in $H_{E_{1}}$ , $H_{E_{2}}$ , and $H_{S_{1}}$ , conveying more information, while being only slightly smaller for the last hypothesis, where close to no information is lost in comparison to the compatible bounds. The information gain for testing the efficacy endpoint in the high dose group ( $H_{E_{2}}$ ) is so large that with the iSCIs, we can even claim superiority of dabigatran over warfarin with regard to the hazard ratio for a stroke or systemic embolism, which was not possible with the cSCIs. This underlines the great potential for information gain that the proposed iSCIs offer.

While in this setup one-sided SCIs are sufficient to answer the trial question, in many trials, two-sided intervals are desired. Since there is no hierarchy amongst the hypotheses regarding the upper limits in our example, we use Bonferroni-adjusted SCIs to derive upper limits. These are then intersected with the iSCIs to obtain two-sided intervals. This approach gives the following two-sided 95% SCIs: ( $- 0.182, 0.345$ ) for $θ_{E_{1}}$ , (0.086, 0.666) for $θ_{E_{2}}$ , (0.047, 0.400) for $θ_{S_{1}}$ , and ( $- 0.082, 0.250$ ) for $θ_{S_{2}}$ .
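The construction of the two-sided intervals can be sketched as follows. The function below pairs a given lower simultaneous bound with a Bonferroni-adjusted upper bound at level $α / m$ , assuming normally distributed estimates with known standard errors. The standard error used in the usage line is a hypothetical value chosen for illustration only, since standard errors are not listed in this section.

```python
from statistics import NormalDist


def two_sided_sci(lower_bound, estimate, se, alpha=0.025, m=4):
    """Combine a lower simultaneous bound with a Bonferroni-adjusted upper bound."""
    upper = estimate + NormalDist().inv_cdf(1 - alpha / m) * se
    return lower_bound, upper


# iSCI bound and estimate for H_E1 from Table 1; se=0.10 is a hypothetical value.
interval = two_sided_sci(lower_bound=-0.182, estimate=0.09, se=0.10)
```

Because the lower and the upper bounds each hold simultaneously at level $α = 0.025$ , the intersection has simultaneous coverage of at least $95\%$ .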

Table 1.
Comparison of the compatible and informative lower 97.5% SCI bounds on the RELY data.

Hypothesis	Estimated log-hazard ratio	cSCI bound	iSCI bound
$H_{E_{1}}$	0.09	−0.378	−0.182
$H_{E_{2}}$	0.42	−0.378	0.086
$H_{S_{1}}$	0.22	0.000	0.047
$H_{S_{2}}$	0.07	−0.067	−0.082

cSCI: compatible simultaneous confidence interval; iSCI: informative simultaneous confidence interval.

4. Discussion

The approach of informative graphical SCIs offers a way out of the conflict between wishing to reject as many hypotheses as possible and at the same time obtaining relevant information on the parameters of interest by confidence intervals. The SCIs can be constructed for all graphical procedures from Bretz et al.¹¹ and will always give more information than the pure (non-)rejection of the hypotheses, except in the case where gatekeepers are not rejected and thus the hypothesis test and interval estimation of the parameter are considered to be of no interest. As usual, there is no gain without costs. Compared to the original graphical procedures, the iSCIs usually pay a price in terms of rejections. The SCIs can be adapted to trial-specific priorities concerning information and/or power via the choice of the information weight $q$ . High values of $q$ (i.e. close to $1$ ) give more informative SCIs, while small values (close to $0$ ) lead to a higher probability to reject more hypotheses. The choice of the information weight $q$ can also be made separately for each hypothesis. An example of how $q$ can be determined in a clinical trial has been given in Section 3.3. In that example, we also showed a simple approach to obtain valid two-sided SCIs. While, for illustrative purposes, we used a simple Bonferroni adjustment for the upper SCI limits, this procedure could also be generalized. In principle, any procedure delivering upper bounds that maintain the coverage probability and do not fall below the unadjusted lower bounds could be used to determine upper bounds (including methods accounting for possible correlation between the test statistics). The resulting two-sided intervals can then be obtained by intersection with the lower SCIs proposed in this article. This intersection is non-empty, since by (3) the proposed lower SCI bounds always fall below the unadjusted SCI bounds.

Extension to $ε$ -graphs

So far, we have not explicitly considered graphs that use the $ε$ -notation introduced by Bretz et al.,¹¹ which makes it possible to shift level between families of hypotheses. For example, one may define a procedure where the significance level $α$ is first shifted between $H_{1}$ and $H_{2}$ , and only if both hypotheses are rejected, the level is passed to $H_{3}$ (see Figure 9 by Bretz et al.¹¹). When applying our procedure to this graph, the modified graph contains the shifted hypotheses $H_{1}^{μ_{1}}$ and $H_{2}^{μ_{2}}$ . Since these hypotheses remain after rejecting the null hypotheses, the level kept for $H_{1}$ and $H_{2}$ will never be shifted to $H_{3}$ . Hence, the resulting SCIs will always be the same as if $ε$ equals zero. This means that the construction of SCIs comes at the cost of not being able to exploit the improvements in power that the introduction of $ε$ -edges may yield. It is certainly possible to mitigate this effect by modifying the graph $G^{μ}$ appropriately.

Extension to group sequential designs

It is straightforward to extend the SCIs presented in this article to group sequential designs (GSDs). This can be done, for example, by splitting the total level $α$ across the stages of the GSD and calculating the bounds in each stage based on the new stage-wise level. Valid simultaneous confidence bounds can then be obtained for each parameter by taking the maximum of the stage-wise bounds. While this procedure will produce valid SCIs, it does not take into account the correlation of the stage-wise test statistics for the individual hypotheses and might therefore leave room for improvement.
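The stage-wise construction can be sketched as follows. To keep the example self-contained, the stage-wise bounds are plain Bonferroni bounds, used here as a stand-in for the informative bounds of Algorithm 1; the function name, the inputs and the equal split of $α$ in the usage lines are assumptions made for illustration.

```python
from statistics import NormalDist


def gsd_lower_bounds(stage_estimates, stage_ses, stage_alphas):
    """Combine stage-wise simultaneous lower bounds by the maximum per parameter.

    Each stage uses Bonferroni bounds at its stage-wise level (a stand-in for
    the informative bounds). The stage-wise levels should sum to alpha.
    """
    z = NormalDist().inv_cdf
    m = len(stage_estimates[0])
    bounds = [float("-inf")] * m
    for estimates, ses, level in zip(stage_estimates, stage_ses, stage_alphas):
        crit = z(1 - level / m)
        stage_bounds = [e - crit * s for e, s in zip(estimates, ses)]
        bounds = [max(b, sb) for b, sb in zip(bounds, stage_bounds)]
    return bounds


# Two stages, two parameters, alpha = 0.025 split equally (hypothetical numbers).
final_bounds = gsd_lower_bounds(
    stage_estimates=[[0.2, 0.5], [0.3, 0.4]],
    stage_ses=[[0.2, 0.2], [0.1, 0.1]],
    stage_alphas=[0.0125, 0.0125],
)
```

Validity follows from a union bound: all stage-wise bounds hold simultaneously with probability at least one minus the sum of the stage-wise levels, so their maximum is a valid lower bound as well.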

Restrictions and outlook

As we have seen in Section 3.1, the new method is more flexible than the penalized intervals from Brannath and Schmidt⁴ in that all graphical procedures of Bretz et al.¹¹ can be adapted to informative graphical SCIs. However, it has to be recognized that an advantage of our previous approach was that it could be generalized to other step-down tests like the Dunnett procedure, which accounts for correlations between the test statistics and thus gains in power. A future aspect of work may, therefore, be to incorporate correlations also in the informative graphical SCIs, similar to the work of Bretz et al.¹⁵ for the graphical procedure.

Supplemental Material

sj-R-1-smm-10.1177_09622802251393666 - Supplemental material for Informative simultaneous confidence intervals for graphical test procedures

Supplemental material, sj-R-1-smm-10.1177_09622802251393666 for Informative simultaneous confidence intervals for graphical test procedures by Werner Brannath, Liane Kluge and Martin Scharpenberg in Statistical Methods in Medical Research

Footnotes

Acknowledgements

We thank Serhat Günay for his help in the development of R programs for our simulation study. This research was supported by the DFG, grant BR 3737/1-1.

ORCID iDs

Werner Brannath

Liane Kluge

Martin Scharpenberg

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors gratefully acknowledge the support of the Leibniz Science Campus Bremen Digital Public Health (www.digital-public-health.de), which is jointly funded by the Leibniz Association (W72/2022), the Federal State of Bremen, and the Leibniz Institute for Prevention Research and Epidemiology – BIPS.

Declaration of conflicting interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

Appendix A. Proof of (3)

We will explain that (3) holds for any starting graph $G$ . First, as argued in the main article, the modified graph $G^{μ}$ is complete with local levels summing up to $α$ by construction and because of Assumption 1. How can level get lost while rejecting the null hypotheses in $G^{μ}$ ? The answer can be derived from (1). If all updated transition weights are computed by the first case of (1), then no level gets lost. Indeed, it can be easily verified that $\sum_{l = 1}^{m} α_{l}^{new} = α$ and $\sum_{l \neq j} g_{j l}^{new} = 1$ if $\sum_{l = 1}^{m} α_{l} = α$ and $\sum_{l \neq j} g_{j l} = 1$ . The only way to lose level is a situation where $H_{i}$ is rejected and for some $j$ the updated transition weights $g_{j l}^{new}$ are given by the second case of (1), that is, equal zero, because $g_{j i} g_{i j} = 1$ . Intuitively, $H_{i}$ and $H_{j}$ form a loop passing all level to each other without connection to the other hypotheses. If then both $H_{i}$ and $H_{j}$ are rejected, their level cannot be transferred further.

In the modified graph $G^{μ}$ , however, every hypothesis $H_{i}$ is connected to its shifted hypothesis $H_{i}^{μ_{i}}$ with the positive transition weight $q^{μ_{i}}$ . Since the shifted hypotheses are not rejected, the above situation cannot occur. If two null hypotheses form a loop with no connection to another null hypothesis and both are rejected, then all remaining level will be transferred to their respective shifted hypotheses. Therefore, no level can get lost and (3) is satisfied.
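The bookkeeping behind this argument can be checked numerically. The sketch below assumes that (1) is the standard update rule of the graphical approach of Bretz et al.¹¹ (levels are passed along the outgoing edges of the rejected hypothesis, and transition weights are renormalized unless two hypotheses form a closed loop). The function name and the two small graphs are illustrative and do not reproduce the modified graph $G^{μ}$ .

```python
def reject(alpha, g, i):
    """Update local levels and transition weights after rejecting hypothesis i.

    alpha maps hypothesis index -> local level; g maps (j, l) -> transition weight.
    The update follows the assumed form of rule (1).
    """
    rest = [l for l in alpha if l != i]

    def w(j, l):
        return g.get((j, l), 0.0)

    new_alpha = {l: alpha[l] + alpha[i] * w(i, l) for l in rest}
    new_g = {}
    for j in rest:
        loop = w(j, i) * w(i, j)
        for l in rest:
            if l == j:
                continue
            # First case of (1) if the loop weight is below 1, otherwise zero.
            new_g[(j, l)] = (w(j, l) + w(j, i) * w(i, l)) / (1 - loop) if loop < 1 else 0.0
    return new_alpha, new_g


# Holm graph for three hypotheses: the total level is conserved after a rejection.
holm_alpha = {1: 0.025 / 3, 2: 0.025 / 3, 3: 0.025 / 3}
holm_g = {(j, l): 0.5 for j in (1, 2, 3) for l in (1, 2, 3) if j != l}
holm_alpha_new, holm_g_new = reject(holm_alpha, holm_g, 1)

# H1 and H2 pass all level to each other; H3 only points into the loop.
loop_alpha = {1: 0.0125, 2: 0.0125, 3: 0.0}
loop_g = {(1, 2): 1.0, (2, 1): 1.0, (3, 1): 1.0}
a1, g1 = reject(loop_alpha, loop_g, 1)
a2, g2 = reject(a1, g1, 2)
```

In the Holm graph the levels still sum to $α$ after the rejection, whereas in the second graph the level of $H_{1}$ and $H_{2}$ is not passed on to $H_{3}$ once both are rejected. This is the situation that the edges to the shifted hypotheses rule out in $G^{μ}$ .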

References

1. Hochberg, Tamhane. Multiple comparison procedures. New York, Chichester, Brisbane, Toronto, Singapore: Wiley, 1987.

2. Dickhaus. Simultaneous statistical inference. Berlin, Heidelberg: Springer, 2014. DOI: 10.1007/978-3-642-45182-9.

3. European Medicines Agency. Guideline on multiplicity issues in clinical trials. Technical Report EMA/CHMP/44762/2017, European Medicines Agency, 2017. https://www.ema.europa.eu/documents/scientific-guideline/draft-guideline-multiplicity-issues-clinical-trials_en.pdf.

4. Brannath, Schmidt. A new class of powerful and informative simultaneous confidence intervals. Stat Med 2014; 33: 3365–3386.

5. Schmidt, Brannath. Informative simultaneous confidence intervals in hierarchical testing. Methods Inf Med 2014; 53: 278–283.

6. Schmidt, Brannath. Informative simultaneous confidence intervals for the fallback procedure. Biometr J 2015; 57: 712–719.

7. Strassburger, Bretz. Compatible simultaneous lower confidence bounds for the Holm procedure and other Bonferroni-based closed tests. Stat Med 2008; 27: 4914–4927.

8. Guilbaud. Simultaneous confidence regions corresponding to Holm’s step-down procedure and other closed-testing procedures. Biometr J 2008; 50: 678–692.

9. Guilbaud. Simultaneous confidence regions for closed tests, including Holm-, Hochberg-, and Hommel-related procedures. Biometr J 2012; 54: 317–342.

10. Guilbaud. Simultaneous confidence intervals compatible with sequentially rejective graphical procedures. Stat Biopharm Res 2018; 10: 220–232.

11. Bretz, Maurer, Brannath, et al. A graphical approach to sequentially rejective multiple test procedures. Stat Med 2009; 28: 586–604.

12. Bauer, Brannath, Bretz. Multiple testing for identifying effective and safe treatments. Biometr J 2001; 43: 605–616.

13. Connolly, Ezekowitz, Salim, et al. Dabigatran versus warfarin in patients with atrial fibrillation. New Engl J Med 2009; 1139–1151. DOI: 10.1056/NEJMoa0905561.

14. Di Scala, Glimm. Time-to-event analysis with treatment arm selection at interim. Stat Med 2011; 3067–3081. DOI: 10.1002/sim.4342.

15. Bretz, Posch, Glimm, et al. Graphical approaches for multiple comparison procedures using weighted Bonferroni, Simes, or parametric tests. Biometr J 2011; 53: 894–913.
