Abstract
A study is reported in which the rankings assigned by a search engine (Google) to the top 100 hits for seven queries are compared with the relevance ratings of human judges (six per query). As assessed by correlations, agreement among the judges was relatively good across all the queries, with a mean (Pearson) correlation of .475. In contrast, the correlations between Google's rankings and the human judges' ratings were substantially lower, with a mean of .153. This study suggests that a high priority should be placed on finding improved methods for ranking the initial set of search engine results, so that they are more in line with human judgments of relevance.
