Abstract
We have developed simple algorithms that allow adjacency and proximity searching in Google and the Science Citation Index (SCI). The SCI algorithm exploits the fact that SCI stopwords in a search phrase function as placeholders. Such a phrase serves effectively as a fixed adjacency condition determined by the number n of adjacent stopwords (i.e. retrieve all records where word A and word B are separated by n words in at least one location). The algorithm integrates over search phrases with different numbers of adjacent stopwords to provide a flexible adjacency or proximity capability (i.e. retrieve all records where word A and word B are separated by n or fewer words in at least one location, where n is the maximum separation desired between A and B). The Google algorithm exploits the fact that asterisks separating words in a Google search phrase function like word wildcards. The difference between two such phrases (the first phrase containing one fewer asterisk than the second phrase) serves effectively as a fixed adjacency or proximity condition, with the number of separating words equal to the number of asterisks in the first phrase. The algorithm integrates over these phrase differentials to provide the same flexible adjacency or proximity capability.
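The two constructions described in the abstract can be sketched as query builders. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the choice of "the" as the placeholder stopword, the use of OR to combine SCI phrases, the use of a leading minus sign to express the Google phrase difference, and the inclusion of the zero-asterisk case are all assumptions made for this sketch. The helper `within` simply restates the target condition (A followed by B with n or fewer words between them) on a local list of tokens.

```python
def sci_phrases(a, b, n, stopword="the"):
    """Phrases with 0..n placeholder stopwords between a and b.

    Assumption: `stopword` is any word on the SCI stopword list.
    """
    return [" ".join([a] + [stopword] * k + [b]) for k in range(n + 1)]


def sci_proximity_query(a, b, n, stopword="the"):
    """Combine the fixed-adjacency phrases into one query.

    Assumption: quoted phrases joined with OR.
    """
    return " OR ".join(f'"{p}"' for p in sci_phrases(a, b, n, stopword))


def google_fixed_query(a, b, k):
    """Phrase differential: k asterisks minus the phrase with k + 1.

    Assumption: the difference is expressed with a leading minus sign.
    """
    first = " ".join([a] + ["*"] * k + [b])
    second = " ".join([a] + ["*"] * (k + 1) + [b])
    return f'"{first}" -"{second}"'


def google_proximity_queries(a, b, n):
    """One differential query per separation 0..n; results would be merged."""
    return [google_fixed_query(a, b, k) for k in range(n + 1)]


def within(tokens, a, b, n):
    """True if a precedes b with n or fewer words between them somewhere."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(0 < j - i <= n + 1 for i in pos_a for j in pos_b)


if __name__ == "__main__":
    print(sci_proximity_query("gene", "therapy", 2))
    for query in google_proximity_queries("gene", "therapy", 2):
        print(query)
```

Both builders keep A before B, as in the phrases described above; covering the reverse order would need a second set of queries with the words swapped.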
