An Assessment of the Effectiveness of Multiple Hypothesis Testing for Geographical Anomaly Detection

Abstract

The practice of multiple significance testing is reviewed, and an alternative to the frequently used Bonferroni correction is considered. Rather than controlling the family-wise error rate (FWER), the probability of at least one false positive among all of the significance tests, this alternative, due to Benjamini and Hochberg, controls the false discovery rate (FDR): the expected proportion of tests reporting a significant result that are actually ‘false alarms’. The methods (and some variants) are demonstrated on a procedure to detect clusters of full-time unpaid carers based on UK census data, and are also assessed using simulation. Simulation results show that the FDR-based corrections are typically more powerful than FWER-based ones, and that FWER-based procedures are extremely conservative: the standard Bonferroni procedure, intended to constrain the FWER to be below 0.05, actually has an FWER of around 6 × 10⁻⁵. We conclude that when scanning for anomalies, the extreme conservatism of FWER-based approaches results in a lack of power, and that FDR-based approaches are more appropriate.
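The two corrections contrasted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the default level of 0.05 and the ten p-values are hypothetical, and the Benjamini–Hochberg rule shown is the standard step-up form from Benjamini and Hochberg (1995).

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Bonferroni correction: reject H0_i when p_i <= alpha / m.

    Controls the family-wise error rate (FWER) at level alpha.
    """
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Sort the p-values so that p_(1) <= ... <= p_(m), find the largest rank k
    with p_(k) <= k * q / m, and reject the k hypotheses with the smallest
    p-values (even if some smaller-ranked p-values failed their own threshold).
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank  # remember the largest rank that passes
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for ten local tests (e.g. ten candidate zones).
    p = [0.3, 0.012, 0.003, 0.8, 0.015, 0.5, 0.013, 0.9, 0.6, 0.7]
    print("Bonferroni rejections:", sum(bonferroni(p)))  # 1
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p)))  # 4
```

The step-up rule is what makes the FDR procedure less conservative here: the second-smallest p-value (0.012) fails its own threshold of 0.010, yet it is still rejected because the fourth-smallest (0.015) passes its threshold of 0.020. This form of the procedure assumes independent or positively dependent test statistics; for arbitrary dependence, Benjamini and Yekutieli (2001) show that replacing q by q / (1 + 1/2 + … + 1/m) restores FDR control.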

References

Anselin, 1995, “Local indicators of spatial association—LISA” Geographical Analysis 27 93–115

Anselin, Bera, Florax, Yoon, 1996, “Simple diagnostic tests for spatial dependence” Regional Science and Urban Economics 26 77–104

Benjamini, Hochberg, 1995, “Controlling the false discovery rate: a practical and powerful approach to multiple testing” Journal of the Royal Statistical Society Series B 57 289–300

Benjamini, Hochberg, 2000, “On the adaptive control of the false discovery rate in multiple testing with independent statistics” Journal of Educational and Behavioral Statistics 25 60–83

Benjamini, Yekutieli, 2001, “The control of the false discovery rate in multiple testing under dependency” The Annals of Statistics 29 1165–1188

Benjamini, Krieger, Yekutieli, 2006, “Adaptive linear step-up procedures that control the false discovery rate” Biometrika 93 491–507

Bonferroni C E, 1935, “Il calcolo delle assicurazioni su gruppi di teste”, in Studi in Onore del Professore Salvatore Ortu Carboni (Rome) pp 13–60

Caldas de Castro, Singer, 2006, “Controlling the false discovery rate: a new application to account for multiple and dependent tests in local statistics of spatial association” Geographical Analysis 38 180–208

Chung, Bohme, Mecklenbrauker, Hero, 2005, “Multiple signal detection using the Benjamini–Hochberg procedure”, in Computational Advances in Multi-sensor Adaptive Processing 2005 (IEEE, New York) pp 209–212

Doran, Drever, Whitehead, 2003, “Health of young and elderly informal carers: analysis of UK census data” British Medical Journal 327 1388

Dunn, 1961, “Multiple comparisons among means” Journal of the American Statistical Association 56 52–64

Games P A, 1977, “An improved t table for simultaneous control on g contrasts” Journal of the American Statistical Association 72 531–534

General Register Office for Scotland, 2001, “Scotland's Census. Edinburgh: GROS, 2001”, http://www.gro-scotland.gov.uk/census

Getis, Ord, 1992, “The analysis of spatial association by use of distance statistics” Geographical Analysis 24 189–206

Greve, Gaston K J, van Rensburg B J, Chown S L, 2008, “Environmental factors, regional body size distributions and spatial variation in body size of local avian assemblages” Global Ecology and Biogeography 17 514–523

Holm, 1979, “A simple sequentially rejective multiple test procedure” Scandinavian Journal of Statistics 6 65–70

NHS: The Information Centre, 2007, “2006/2007 general practice workload survey”, http://www.ic.nhs.uk/webfiles/publications/gp/GPWorkloadReport.pdf

Northern Ireland Statistics and Research Agency, 2001, “Northern Ireland Census 2001 Output. Belfast, NISRA 2001”, http://www.nisranew.nisra.gov.uk/census/start.html

ONS, 2003, “Census 2001: [CD supplement to the national report for England and Wales and key statistics for local authorities in England and Wales]”, Office for National Statistics, http://www.ons.gov.uk/census/index.html

Openshaw, Charlton, Wymer, Craft, 1987, “A Mark 1 geographical analysis machine for the automated analysis of point data sets” International Journal of Geographical Information Systems 1 335–358

Ord J K, Getis, 1995, “Local spatial autocorrelation statistics: distributional issues and an application” Geographical Analysis 27 286–306

Rogerson, Yamada, 2009, Statistical Detection and Surveillance of Geographical Clusters (Chapman and Hall/CRC, Boca Raton, FL)

Schweder, Spjøtvoll, 1982, “Plots of p-values to evaluate many tests simultaneously” Biometrika 69 492–502

Šidák, 1967, “Rectangular confidence region for the means of multivariate normal distributions” Journal of the American Statistical Association 62 626–633

Yamada, Rogerson P A, Lee, 2009, “GeoSurveillance: a GIS-based system for the detection and monitoring of spatial clusters” Journal of Geographical Systems 11 155–173