Abstract
Data editing is the process of checking and correcting data. In practice, these processes are often automated, and many applications must handle a large number of constraints (edit rules). This article shows that data editing can benefit from automated constraint simplification techniques. Simplification can improve performance, which widens the range of problems to which automatic data editing can be applied. It may also reveal flaws in the formulation of edit rules, which improves the quality of automatically edited data.
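To make "constraint simplification" more concrete, here is a minimal Python sketch. It covers only a hypothetical, deliberately restricted class of edit rules: univariate bounds of the form `x >= c` or `x <= c`. For these rules, a bound is redundant when a tighter bound on the same variable exists, and a variable's rules contradict each other when its tightest lower bound exceeds its tightest upper bound. The names `Rule` and `simplify_bounds` are illustrative and are not taken from the article. General edit rules, such as linear balance edits over several variables, need solver-based checks that this sketch does not attempt.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical univariate edit rule `var op bound`, op in {">=", "<="}.
    This representation is assumed for illustration only."""
    var: str
    op: str
    bound: float


def simplify_bounds(rules):
    """Return (kept, redundant, infeasible_vars) for univariate bound rules.

    A lower (upper) bound is redundant if another lower (upper) bound on the
    same variable is at least as tight; exactly one tightest bound is kept.
    A variable is infeasible if its tightest lower bound exceeds its
    tightest upper bound, i.e. no value can satisfy all of its rules.
    """
    lower = defaultdict(list)  # var -> rules of the form var >= c
    upper = defaultdict(list)  # var -> rules of the form var <= c
    for r in rules:
        if r.op == ">=":
            lower[r.var].append(r)
        elif r.op == "<=":
            upper[r.var].append(r)
        else:
            raise ValueError(f"unsupported operator: {r.op!r}")

    kept, redundant, infeasible = [], [], []
    for var in sorted(set(lower) | set(upper)):
        # Largest lower bound and smallest upper bound are the tightest ones.
        for group, pick in ((lower[var], max), (upper[var], min)):
            if not group:
                continue
            # On ties, max/min return the first index, so duplicates keep one copy.
            best = pick(range(len(group)), key=lambda i: group[i].bound)
            for i, r in enumerate(group):
                (kept if i == best else redundant).append(r)
        if lower[var] and upper[var]:
            lo = max(r.bound for r in lower[var])
            hi = min(r.bound for r in upper[var])
            if lo > hi:
                infeasible.append(var)
    return kept, redundant, infeasible


if __name__ == "__main__":
    rules = [
        Rule("age", ">=", 0),
        Rule("age", ">=", 15),
        Rule("age", "<=", 120),
        Rule("turnover", ">=", 0),
        Rule("turnover", ">=", 100),
        Rule("turnover", "<=", 50),
    ]
    kept, redundant, infeasible = simplify_bounds(rules)
    print(redundant)   # age >= 0 and turnover >= 0 are implied by tighter bounds
    print(infeasible)  # ['turnover']: 100 <= turnover <= 50 is impossible
```

Removing the redundant rules shrinks the rule set that later steps, such as error localization, have to process. The contradiction on `turnover` is the kind of formulation flaw that would otherwise go unnoticed until no record could pass the edits.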
