Abstract
Multiple response criteria, multiple treatment groups, and multiple subgroups in clinical trials require careful statistical attention, both to avoid an inappropriately high prevalence of chance findings and to avoid unsatisfactorily low power to detect real treatment differences. An underlying goal is to use the 0.050 significance level as often as possible for separate assessments while maintaining the 0.050 level for all assessments taken together, so that statistical power is not compromised. For multiple response criteria, a useful strategy is to assess a composite ranking as a single criterion first and then to assess its individual components. Multiple treatment comparisons can often be addressed effectively with closed testing procedures that use hierarchical evaluation. The hierarchy must be well specified, since significance at one stage is required before testing is allowed at the next stage.
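The stage-by-stage rule for a testing hierarchy can be illustrated with a minimal sketch of a fixed-sequence procedure, one simple instance of hierarchical closed testing. The function name, the comparison labels, and the p-values are hypothetical and serve only as illustration; the sketch assumes each comparison is tested at the full 0.050 level in a prespecified order and that testing stops at the first non-significant result.

```python
def fixed_sequence_test(ordered_p_values, alpha=0.05):
    """Return the labels of hypotheses rejected under a fixed-sequence hierarchy.

    ordered_p_values: list of (label, p_value) pairs in the prespecified order.
    Each hypothesis is tested at the full alpha, but only if every earlier
    hypothesis in the sequence was rejected.
    """
    rejected = []
    for label, p_value in ordered_p_values:
        if p_value > alpha:
            break  # the hierarchy stops here; later comparisons are not tested
        rejected.append(label)
    return rejected


# Hypothetical p-values, listed in the prespecified order of the hierarchy.
comparisons = [
    ("high dose vs placebo", 0.012),
    ("low dose vs placebo", 0.048),
    ("high dose vs low dose", 0.200),
    ("high dose vs active control", 0.010),
]

print(fixed_sequence_test(comparisons))
```

In this sketch the fourth comparison is not declared significant despite its small p-value, because the third comparison failed and closed the hierarchy before it.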
In most clinical trials, subgroups are of supportive interest after statistical significance for all patients is shown. A subgroup hierarchy, however, permits primary evaluation of subgroups in conjunction with all patients through significance level spending function methods like those used in interim analyses, for example, the O'Brien-Fleming method. The rationale is the analogy between a subgroup hierarchy and the patient hierarchy at successive interim analyses. With this method, the significance level for the evaluation of all patients typically ranges between 0.040 and 0.045, and that for subgroups ranges between 0.005 and 0.020.
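The interim-analysis analogy can be sketched with the Lan-DeMets spending function of O'Brien-Fleming type, alpha(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t)). This is a sketch under stated assumptions, not necessarily the exact calculation the authors use: it treats the subgroup's share of the patients as the information fraction t, and it assigns the all-patients test the simple remainder 0.050 - alpha(t), a conservative Bonferroni-style split. Exact boundaries would account for the correlation between the subgroup and all-patients statistics. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def obf_alpha_spent(fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at an information fraction in (0, 1],
    using the Lan-DeMets O'Brien-Fleming-type spending function."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - std_normal.cdf(z / sqrt(fraction)))


def split_alpha(subgroup_fraction, alpha=0.05):
    """Assumed split: the subgroup gets the alpha spent at its fraction, and
    the all-patients test gets the remainder (conservative simplification)."""
    subgroup_level = obf_alpha_spent(subgroup_fraction, alpha)
    return {"subgroup": subgroup_level, "all_patients": alpha - subgroup_level}


# Hypothetical subgroup containing half of the patients.
levels = split_alpha(0.5)
print(f"subgroup: {levels['subgroup']:.4f}, all patients: {levels['all_patients']:.4f}")
```

Under these assumptions, a subgroup containing half of the patients receives a level near 0.0056 and the all-patients test a level near 0.044; larger subgroups receive a larger share of the 0.050 total.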
The methods outlined here for multiple response criteria, multiple treatment groups, and subgroups, or related counterparts, should be prespecified in the protocol for a clinical trial. If not in the protocol, they should be incorporated in the analysis plan prior to study unmasking.
