A graph-based clustering algorithm in large transaction databases

Abstract

Clustering in transaction databases can uncover potentially useful patterns that help improve product profit. Unfortunately, most clustering algorithms based on metric distances are not appropriate for transaction data. In this paper, we study the problem of item clustering in large transaction databases. We first define a similarity measure between items based on the large itemsets present in a transaction database; this measure not only captures the co-occurrence relationship of items but also remains insensitive to noise. We then represent the similarity relationship as an undirected graph and transform the clustering problem into that of discovering the connected components of the graph. We also discuss how to evaluate clustering quality and develop an automatic optimizer that searches for the optimal thresholds, yielding the item clustering that optimizes the quality.
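The graph-based pipeline summarized above can be illustrated with a minimal sketch. The abstract does not give the exact similarity measure, so the sketch assumes a simplified stand-in: two items are linked whenever the pair forms a large 2-itemset, meaning it appears together in at least `min_support` transactions. The function name `cluster_items`, the parameter `min_support`, and the sample data are hypothetical, and the threshold optimizer is not modeled.

```python
from collections import Counter
from itertools import combinations


def cluster_items(transactions, min_support):
    """Cluster items as connected components of a co-occurrence graph.

    Assumption (not from the paper): an edge joins two items when the
    pair occurs together in at least `min_support` transactions.
    """
    pair_counts = Counter()
    items = set()
    for transaction in transactions:
        unique = sorted(set(transaction))  # ignore duplicate items in a basket
        items.update(unique)
        pair_counts.update(combinations(unique, 2))

    # Union-find over items; each item starts in its own component.
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the components of every sufficiently frequent pair.
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    # Collect the items of each component into one cluster.
    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(cluster) for cluster in clusters.values())


transactions = [
    ["bread", "milk"],
    ["bread", "milk", "beer"],
    ["beer", "chips"],
    ["beer", "chips"],
    ["milk", "bread"],
]
result = cluster_items(transactions, min_support=2)
print(result)
```

In this toy data, "beer" co-occurs with "bread" and "milk" only once, so with `min_support=2` that weak link is dropped and the items split into two clusters. Raising or lowering the threshold changes which edges survive, which is the kind of parameter the abstract's optimizer is said to tune.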

Keywords

clustering, large itemset, connected component
