Data Mining for Fun and Profit
Hand, David J. ; Blunt, Gordon ; Kelly, Mark G. ; Adams, Niall M.
Statist. Sci., Vol. 15 (2000) no. 1, pp. 111-131 / Harvested from Project Euclid
Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.
Published: 2000-05-01
Classification: Data mining, knowledge discovery, large data sets, computers, databases
@article{1009212753,
     author = {Hand, David J. and Blunt, Gordon and Kelly, Mark G. and Adams, Niall M.},
     title = {Data Mining for Fun and Profit},
     journal = {Statist. Sci.},
     volume = {15},
     number = {1},
     year = {2000},
     pages = {111-131},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1009212753}
}
Hand, David J.; Blunt, Gordon; Kelly, Mark G.; Adams, Niall M. Data Mining for Fun and Profit. Statist. Sci., Vol. 15 (2000) no. 1, pp. 111-131. http://gdmltest.u-ga.fr/item/1009212753/