Abstract
Mining richly structured heterogeneous datasets represents a key emerging challenge for data mining. When traditional clustering methods are applied, heterogeneous networks consisting of multiple entities must first be converted to homogeneous networks consisting of entities of only one type, causing information loss. This paper proposes a three-phase general framework to directly handle the information contained in extended star-structured heterogeneous data. First, entity resolution of objects of all types is conducted based on their attribute values. Then, central objects are clustered in terms of their relation strengths. Finally, groups of attribute objects are detected according to the clustering assignment of their connected central objects. A numerical example is provided to illustrate the modeling idea and working principle of the proposed method, and experiments on a real-world dataset show the effectiveness of the three proposed algorithms.
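As a rough illustration of how the three phases fit together, the following minimal sketch runs them on a toy star-structured dataset in which papers are the central objects and authors are the attribute objects. It is not the paper's algorithms, which the abstract does not specify. Every technique here is a simple stand-in chosen for illustration: name normalization for entity resolution, the count of shared attribute objects as the relation strength, connected components for clustering, and a majority vote for grouping attribute objects. All function names, the `min_shared` parameter, and the data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def resolve_entities(names):
    """Phase 1 (stand-in): map each raw name to a normalized canonical form."""
    return {name: " ".join(name.lower().split()) for name in names}


def cluster_central(links, min_shared=1):
    """Phase 2 (stand-in): join central objects that share at least
    `min_shared` attribute objects, then label the connected components."""
    parent = {c: c for c in links}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(links, 2):
        if len(links[a] & links[b]) >= min_shared:
            parent[find(a)] = find(b)

    roots = sorted({find(c) for c in links})
    label = {root: i for i, root in enumerate(roots)}
    return {c: label[find(c)] for c in links}


def group_attributes(links, central_cluster):
    """Phase 3 (stand-in): give each attribute object the cluster that most
    of its connected central objects belong to."""
    votes = defaultdict(Counter)
    for central, attrs in links.items():
        for attr in attrs:
            votes[attr][central_cluster[central]] += 1
    return {attr: counts.most_common(1)[0][0] for attr, counts in votes.items()}


# Hypothetical toy data: paper id -> raw author names (with duplicates
# that differ only in case and spacing).
raw_links = {
    "p1": ["J. Smith", "A. Lee"],
    "p2": ["j.  smith", "B. Chen"],
    "p3": ["C. Diaz", "D. Kim"],
    "p4": ["d. kim"],
}

canon = resolve_entities(n for names in raw_links.values() for n in names)
links = {p: {canon[n] for n in names} for p, names in raw_links.items()}
central_cluster = cluster_central(links)
attr_group = group_attributes(links, central_cluster)
```

In this toy run, the two spellings of each duplicated author collapse to one object in phase 1, which is what lets p1 and p2 (and likewise p3 and p4) be linked in phase 2. The attribute objects then inherit their groups from the papers they connect to, without the network ever being flattened into a single entity type.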
