
Abstract

It has been observed that statistical tests are infrequently applied in analysing differences in the performance of different retrieval methods. We believe this is explained on the one hand by the complexity of the subject, and on the other hand by the desire to avoid misleading conclusions. Because practical retrieval methods cannot be explained by simple models, parametric statistical tests are generally not suitable. Some non-parametric tests require a symmetry in the null hypothesis that seems inappropriate to the required task. A second class of non-parametric tests comprises the bootstrap methods. Here, the null hypothesis seems appropriate to practical testing, but the bootstrap assumption (that the sample may adequately represent the whole population) may be in question. If the bootstrap assumption is false, one may be led to erroneous conclusions (type I or type II errors). Here, by use of a mathematical model [11] which approximates the behaviour of practical retrieval systems, we show that bootstrap methods perform well in performance comparisons based on actual test sets used in practice. Type I error is appropriately predictable and the power loss of the tests, when compared with the theoretically most powerful test in the most realistic setting, may not exceed ten percentage points. We conclude that the bootstrap methods provide a practical approach to statistical testing in the field of retrieval performance analysis.
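
The abstract does not spell out a test procedure, so the following is only a minimal sketch of one common bootstrap test for comparing two retrieval methods. It assumes that each method has a per-query score (for example, average precision) on the same set of queries, and it tests the null hypothesis that the mean per-query difference is zero. The function name, the number of resamples, and the choice to impose the null hypothesis by centring the differences are illustrative assumptions, not the authors' method.

```python
import random


def paired_bootstrap_p_value(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided bootstrap test of a zero mean per-query difference.

    Hypothetical helper for illustration only. Returns the observed mean
    difference (A minus B) and an estimated p-value.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    # Impose the null hypothesis by shifting the differences to mean zero.
    # Resampling from this centred sample relies on the bootstrap
    # assumption that the sample adequately represents the population.
    centred = [d - observed for d in diffs]

    extreme = 0
    for _ in range(n_resamples):
        # Draw n query differences with replacement.
        resample = [centred[rng.randrange(n)] for _ in range(n)]
        if abs(sum(resample) / n) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value
```

A small p-value suggests that the observed difference would be unlikely if the two methods performed equally well on average. If the query sample is small or unrepresentative, the bootstrap assumption noted in the abstract may fail and the p-value may mislead.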


References

1. C. Buckley, Implementation of the SMART Information Retrieval System, Technical Report 85-686 (Department of Computer Science, Cornell University, 1985).

2. C. Buckley and A.F. Lewit, Optimization of inverted vector searches. In: Proceedings of the Eighth International ACM Conference on Research and Development in Information Retrieval (Montreal, Quebec, 1985), pp. 97-110.

3. W.B. Croft, Experiments with Representation in a Document Retrieval System, Technical Report 82-21 (COINS, University of Massachusetts, Amherst, MA, 1982).

4. B. Efron, Bootstrap methods: another look at the jack-knife, Annals of Statistics 7 (1979) 1-26.

5. E.A. Fox (ed.), Virginia Disc One (Nimbus Records, Virginia Polytechnic Institute and State University, 1990).

6. D. Hull, Using statistical testing in the evaluation of retrieval experiments. In: Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, edited by R. Korfhage, E. Rasmussen and P. Willett (ACM Press, Pittsburgh, PA, 1993), pp. 329-338.

7. E.M. Keen, Presenting results of experimental retrieval comparisons, Information Processing and Management 28(4) (1992) 491-502.

8. M. Kendall and A. Stuart, The Advanced Theory of Statistics: Vol. 2. 4th ed. (Charles Griffin & Co. Ltd, London, 1979).

9. H.J. Larson, Introduction to Probability Theory and Statistical Inference. 3rd ed. (John Wiley & Sons, New York, 1982).

10. D. Lucarella, A document retrieval system based on nearest neighbor searching, Journal of Information Science 14 (1988) 25-33.

11. D.B. McCarn and C.M. Lewis, A mathematical model of retrieval system performance, Journal of the American Society for Information Science 41(7) (1990) 495-500.

12. E.W. Noreen, Computer Intensive Methods for Testing Hypotheses (John Wiley & Sons, New York, 1989).

13. J.W. Pratt and J.D. Gibbons, Concepts of Nonparametric Theory (Springer-Verlag, New York, 1981).

14. S.J. Press, Bayesian Statistics: Principles, Models and Applications (John Wiley & Sons, New York, 1989).

15. C.J. van Rijsbergen, Information Retrieval (Butterworths, London, 1979).

16. G. Salton, Automatic Text Processing (Addison-Wesley, Reading, MA, 1989).

17. G. Salton, The state of retrieval system evaluation, Information Processing and Management 28(4) (1992) 441-449.

18. G. Salton and M. McGill, Introduction to Modern Information Retrieval (McGraw-Hill, New York, 1983).

19. T. Saracevic, Individual differences in organizing, searching and retrieving information, Proceedings of the 54th Annual Meeting of the American Society of Information Science 28 (1991) 82-86.

20. J. Schmuller, Editorial: on the brink, PC AI (July/August 1993).

21. S. Siegel and N.J. Castellan, Nonparametric Statistics for the Behavioral Sciences. 2nd ed. (McGraw-Hill, New York, 1988).

22. A. Swets, Effectiveness of Information Retrieval Methods (Bolt, Beranek and Newman, Cambridge, MA, 1967).

23. J.M. Tague, The pragmatics of information retrieval experimentation. In: Information Retrieval Experiment, edited by K. Sparck Jones (Butterworths, London, 1981), pp. 59-104.

24. J. Tague-Sutcliffe, The pragmatics of information retrieval experimentation, revisited, Information Processing and Management 28(4) (1992) 467-490.

25. W.J. Wilbur, An information measure of retrieval performance, Information Systems 17(4) (1992) 283-298.

Non-parametric significance tests of retrieval performance comparisons
