Automatic Editing for Business Surveys: An Assessment of Selected Algorithms
de Waal, Ton; Coutinho, Wieger
Internat. Statist. Rev., Vol. 73 (2005) no. 1, p. 73-102 / Harvested from Project Euclid
Statistical offices are responsible for publishing accurate statistical information about many different aspects of society. This task is complicated considerably by the fact that data collected by statistical offices generally contain errors. These errors have to be corrected before reliable statistical information can be published. This correction process is referred to as statistical data editing. Traditionally, data editing was mainly an interactive activity with the aim of correcting all data in every detail. For that reason the data editing process was both expensive and time-consuming. To improve the efficiency of the editing process it can be partly automated. One often divides the statistical data editing process into the error localisation step and the imputation step. In this article we restrict ourselves to discussing the former step, and provide an assessment, based on personal experience, of several selected algorithms for automatically solving the error localisation problem for numerical (continuous) data. Our article can be seen as an extension of the overview article by Liepins, Garfinkel & Kunnathur (1982). All algorithms we discuss are based on the (generalised) Fellegi-Holt paradigm, which states that the data of a record should be made to satisfy all edits by changing the fewest possible (weighted) number of fields. The error localisation problem may have several optimal solutions for a record. In contrast to what is common in the literature, most of the algorithms we describe aim to find all optimal solutions rather than just one. As numerical data mostly occur in business surveys, the described algorithms are mainly suitable for business surveys and less so for social surveys. For four algorithms we compare the computing times on six realistic data sets as well as their complexity.
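As an illustration of the (generalised) Fellegi-Holt paradigm mentioned in the abstract, the following is a minimal brute-force sketch. It is not one of the algorithms assessed in the article (those rely on techniques such as branch-and-bound, vertex generation and cutting planes). The function names, the example record, the edits, the weights and the finite grid of candidate values are all assumptions made for illustration; in particular, real numerical data are continuous, so a practical feasibility check would solve a linear programme rather than search a grid.

```python
from itertools import combinations, product


def can_repair(record, subset, edits, candidates):
    """Check whether changing only the fields in `subset` can satisfy all edits.

    Assumption: feasibility is tested over a finite grid of candidate values,
    which is a simplification of the continuous problem.
    """
    for values in product(*(candidates[f] for f in subset)):
        trial = dict(record)
        trial.update(zip(subset, values))
        if all(edit(trial) for edit in edits):
            return True
    return False


def error_localisation(record, edits, weights, candidates):
    """Return the minimum total weight and ALL field sets attaining it."""
    fields = list(record)
    # Enumerate every subset of fields together with its total weight.
    subsets = []
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            subsets.append((sum(weights[f] for f in subset), subset))
    subsets.sort(key=lambda pair: pair[0])

    best_weight, solutions = None, []
    for weight, subset in subsets:
        if best_weight is not None and weight > best_weight:
            break  # every remaining subset is heavier than the optimum
        if can_repair(record, subset, edits, candidates):
            best_weight = weight
            solutions.append(set(subset))
    return best_weight, solutions


# Hypothetical business-survey record that violates the balance edit.
record = {"turnover": 100, "costs": 60, "profit": 10}
edits = [
    lambda r: r["profit"] == r["turnover"] - r["costs"],  # balance edit
    lambda r: r["turnover"] >= 0,
    lambda r: r["costs"] >= 0,
]
# Higher weight = field considered more reliable (assumed values).
weights = {"turnover": 2, "costs": 1, "profit": 1}
grid = range(-100, 201, 10)
candidates = {field: grid for field in record}

min_weight, solutions = error_localisation(record, edits, weights, candidates)
print(min_weight, solutions)
```

In this toy example both `costs` and `profit` are optimal single-field changes, which shows why a record may have several optimal solutions. Enumerating all subsets is exponential in the number of fields, so the sketch only conveys the objective, not a practical method.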
Published: 2005-04-14
Classification: Branch-and-bound, Cutting planes, Error localisation, Fellegi-Holt method, Fellegi-Holt paradigm, Fourier-Motzkin elimination, Integer programming, Statistical data editing, Vertex generation
@article{1112304813,
     author = {de Waal, Ton and Coutinho, Wieger},
     title = {Automatic Editing for Business Surveys: An Assessment of Selected Algorithms},
     journal = {Internat. Statist. Rev.},
     volume = {73},
     number = {1},
     year = {2005},
     pages = {73-102},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1112304813}
}
de Waal, Ton; Coutinho, Wieger. Automatic Editing for Business Surveys: An Assessment of Selected Algorithms. Internat. Statist. Rev., Vol. 73 (2005) no. 1, pp. 73-102. http://gdmltest.u-ga.fr/item/1112304813/