Automated error detection using association rules

Abstract

High data quality is important for every application. Inaccurate or inadequate data can lead to inappropriate assumptions, misleading results, bias, and ultimately poor policy and decision making. Finding errors and cleaning data is a time-consuming process. This paper presents a framework for automatically detecting unusual and erroneous data values in datasets. The main idea is to generate association rules with very high confidence and then to identify the cases that are exceptions to these rules. Experimental results show that the proposed framework successfully identifies erroneous values in large datasets.
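The main idea can be illustrated with a minimal sketch. This is not the framework from the paper. It assumes a simplified setting in which each rule has a single attribute-value pair on each side, and the function name `find_rule_exceptions`, the thresholds `min_confidence` and `min_support`, and the toy city/country records are all hypothetical. A rule A=a -> B=b is kept when at least `min_support` records have A=a and the fraction of those that also have B=b is at least `min_confidence`; records that match the antecedent but not the consequent are reported as candidate errors.

```python
from collections import Counter
from itertools import permutations


def find_rule_exceptions(records, min_confidence=0.9, min_support=5):
    """Flag records that violate high-confidence one-to-one rules.

    records: list of dicts that all share the same keys.
    Returns a list of (record_index, rule) pairs, where rule is
    ((attr_a, value_a), (attr_b, value_b), confidence).
    Illustrative only: thresholds and rule shape are assumptions.
    """
    if not records:
        return []
    attributes = list(records[0].keys())
    exceptions = []
    # Consider every ordered attribute pair as antecedent -> consequent.
    for attr_a, attr_b in permutations(attributes, 2):
        antecedent_counts = Counter(r[attr_a] for r in records)
        pair_counts = Counter((r[attr_a], r[attr_b]) for r in records)
        for (value_a, value_b), pair_count in pair_counts.items():
            total = antecedent_counts[value_a]
            confidence = pair_count / total
            # Keep only frequent, high-confidence rules.
            if total < min_support or confidence < min_confidence:
                continue
            rule = ((attr_a, value_a), (attr_b, value_b), confidence)
            # Exceptions match the antecedent but not the consequent.
            for index, r in enumerate(records):
                if r[attr_a] == value_a and r[attr_b] != value_b:
                    exceptions.append((index, rule))
    return exceptions


# Hypothetical toy data: one record pairs Paris with the wrong country.
records = (
    [{"city": "Paris", "country": "France"}] * 19
    + [{"city": "Paris", "country": "Germany"}]
    + [{"city": "Berlin", "country": "Germany"}] * 10
)
flagged = find_rule_exceptions(records)
suspect_indices = sorted({index for index, _ in flagged})
print(suspect_indices)
```

In this toy dataset the same record is reported by two rules (city=Paris -> country=France and country=Germany -> city=Berlin), which shows that a flagged record identifies a suspicious combination rather than which of its values is wrong. Rules with confidence exactly 1 produce no exceptions, so only rules that are almost always true contribute candidates.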

Keywords

Data quality, data cleaning, error detection, outlier detection, association rules, data mining, market basket
