Abstract
Data is shared among different organizations for mutual benefit. Data mining techniques are utilized to discover valuable knowledge for decision-making. However, data mining poses a threat to disclose the sensitive information. Thus, the sensitive knowledge should be concealed before releasing data. The pervious works either address the association rule or utility itemsets hiding problem. This paper focuses on preserving the sensitive utility and frequent itemsets, and a sanitization approach named HUFI is presented. The sensitive itemsets are hidden by reducing their support or utility below the minimum thresholds. For a sensitive itemset, the concept of maximum boundary value is introduced to determine the hidden strategy. Then, a transaction supporting minimal number of non-sensitive itemsets is selected to be sanitized. In such a transaction, a weight is assigned to each item contained in the sensitive itemset, and an item with the highest weight is selected to be modified. We compared HUFI with the state of the art algorithms on various databases. The experiment results show that HUFI outperforms the other algorithms in minimizing the side effects on non-sensitive knowledge and maintaining the database quality after the sanitization process. In addition, the impact of database density on sanitization approaches is observed.
Get full access to this article
View all access options for this article.
