The characteristics of conflation algorithms are discussed, and examples are given of some algorithms that have been used in information retrieval systems. Comparative experiments with a range of keyword dictionaries and with the Cranfield document test collection suggest that there is relatively little difference in the performance of the algorithms, despite the widely disparate means by which they have been developed and by which they operate.
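As a rough illustration of what conflation means in practice, the following minimal sketch groups word variants under a shared stem by stripping the longest matching suffix. The suffix list, the minimum stem length, and the function names `conflate` and `group_by_stem` are assumptions made for this example only; it is not one of the algorithms compared in the paper.

```python
# Toy longest-match suffix stripper, for illustration only.
# SUFFIXES and MIN_STEM are hypothetical choices, not taken from any published algorithm.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "ed", "es", "s")
MIN_STEM = 3  # do not strip a suffix if fewer than this many letters would remain


def conflate(word):
    """Return a crude stem: the word with its longest matching suffix removed."""
    word = word.lower()
    # Try longer suffixes first so that "ings" is preferred over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the list of input words that conflate to it."""
    groups = {}
    for w in words:
        groups.setdefault(conflate(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(group_by_stem(["indexing", "indexed", "indexes", "index", "retrieval"]))
```

In a retrieval setting, a query term and the document terms would both be passed through the same conflation function before matching, so that variants such as "indexing" and "indexed" are treated as the same index term.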