A new technique is described for retrieving infor mation by finding the best match or matches between a textual 'query' and a textual database. The technique uses principles of beam search with a measure of probability to guide the search and prune the search tree. Unlike many methods for comparing strings, the method gives a set of alternative matches, graded by the 'quality' of the matching achieved.
Get full access to this article
View all access options for this article.
References
1.
A.V. Aho, Algorithms for finding patterns in strings. In: Handbook of Theoretical Computer Science, edited by J. van Leeuwen, (Elsevier Science Publishers, Amsterdam, 1990) 255-300. (Chapter 5).
2.
P.M.G. Apers , C.A. van den Berg, J. Flokstra, P.W.P.J. Grefen, M.L. Kersten and A.N. Wilschut, PRIMA/DB: a parallel main memory relational DBMS, IEEE Transactions on Knowledge and Data Engineering4(6) (1992) 541-554.
3.
J. Ashford and P. Willett, Text Retrieval and Document Databases (Chartwell-Bratt , Bromley, UK, 1988).
G.J. Barton , Protein multiple sequence alignment and flexible pattern matching, Methods in Enzymology183 (1990) 403-428.
6.
R. Bellman and S. Dreyfus, Applied Dynamic Programming (Princeton University Press , Princeton, NJ, 1962).
7.
A.A. Bertossi , F Luccio, L. Pagli and E. Lodi, A parallel solution to the Approximate String Matching problem, Computer Journal35(5) ( 1992) 524-526.
8.
H. Boral, W. Alexander, L. Clay, G. Copeland, S. Danforth, M. Franklin, B. Hart, M. Smith and P. Valduriez, Prototyping Bubba, a highly parallel database system, IEEE Transactions on Knowledge and Data Engineering2(1) (1990) 4-24.
9.
D.M. Carroll , C.A. Pogue and P. Willett, Bibliographic pattern matching using the ICL Distributed Array Processor, Journal of the American Society for Information Science39(6) (1988) 390-399.
10.
S.C. Chan , A.K.C. Wong and D.K.Y. Chiu, A survey of multiple sequence comparison methods, Bulletin of Mathematical Biology54(4) (1992) 563-598.
11.
J.K. Cringean , R. England, G.A. Manson and P. Willett, Network design for the implementation of text searching using a multicomputer, Information Processing and Management27(4) (1991) 265-283.
12.
J.K. Cringean , R. England, G.A. Manson and P. Willett, Parallel text searching in serial files using a processor farm. In Proceedings of the 13th International Conference on Research and Development in Information Retrieval , Brussels, September 1990, 429-454.
13.
J.K. Cringean , G.A. Manson, P. Willett and G.A. Wilson.Efficiency of text scanning in bibliographic databases using microprocessor-based, multiprocessor networks, Journal of Information Science14 ( 1988) 335-345.
14.
W.H.E. Day and F.R. McMorris, Critical comparison of consensus methods for molecular sequences, Nucleic Acids Research20(5) (1992) 1093-1099.
15.
Z. Galil and R. Giancarlo, Data structures and algorithms for approximate string matching, Journal of Complexity4 (1988) 33-72.
16.
G.H. Gonnet and R. Baeza-Yates, Text algorithms. In: Handbook of Algorithms and Data Structures in Pascal and C2nd ed. (Addison-Wesley , Wokingham , UK, 1991) 251-288. (Chapter 7).
17.
D.S. Hirshberg , A linear space algorithm for computing maximal common subsequences, Communications of the ACM18(6) (1975) 341-343.
18.
J.W. Hunt and T.G. Szymanski, A fast algorithm for computing longest common subsequences, Communications of the ACM20(5) (1977) 350-353.
19.
G. Jacobson and K-P. Vo , Heaviest increasing/common subsequence problems . In: Proceedings of the Combinatorial Matching Conference, Tucson, Arizona, April 1992.
20.
S. Lavington , Parallel associative processing for knowledge bases, Lecture Notes in Computer Science 503 (1991) 112-123.
21.
D.L. Lee , ALTEP - a cellular processor for high-speed pattern matching, New Generation Computing4(3) (1986) 225-244.
22.
D. Maier , The complexity of some problems on subsequences and supersequences, Journal of the ACM22(2) (1978) 322-336.
23.
V. W-K. Mak, K.C. Lee and O. Frieder.Exploiting parallelism in pattern matching: an information retrieval application, ACM Transactions on Information Systems9(1) (1991) 52-74.
24.
W.J. Masek and M.S. Paterson, How to compute string-edit distances quickly. In [27].
25.
G.D. Oosthuizen , The use of a lattice for fast pattern matching , South African Computer Journal1 (1990) 31-35.
26.
K. Pirklbauer , A study of pattern-matching algorithms, Structured Programming13 ( 1992 ) 89-98.
27.
D. Sankoff and J.B. Kruskall (eds), Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparisons (Addison-Wesley , Reading, MA, 1983).
28.
C.E. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana, IL, 1949).
29.
C. Stanfill and R. Thau, Information retrieval on the Connection Machine: 1 to 8192 gigabytes, Information Processing and Management27(4) (1991) 285-310.
30.
W.R. Taylor , Pattern matching methods in protein sequence comparison and structure prediction , Protein Engineering2(2) ( 1988) 77-86.
31.
R.A. Wagner and M.J. Fischer, The string-to-string correction problem , Journal of the ACM21(1) (1974) 168-173.
32.
P. Willett , Software and hardware techniques for string searching in serial document databases, World Patent Information10(2) ( 1988) 120-129.