Mining Large Data Sets on Grids: Issues and Prospects
David Skillicorn ; Domenico Talia
Computing and Informatics, Volume 28 (2012) no. 1, p. 347-362 / Harvested from Computing and Informatics
When data mining and knowledge discovery techniques must be used to analyze large amounts of data, high-performance parallel and distributed computers can help to provide better computational performance and, as a consequence, deeper and more meaningful results. Recently grids, composed of large-scale, geographically distributed platforms working together, have emerged as effective architectures for high-performance decentralized computation. It is natural to consider grids as tools for distributed data-intensive applications such as data mining, but the underlying patterns of computation and data movement in such applications are different from those of more conventional high-performance computation. These differences require a different kind of grid, or at least a grid with significantly different emphases. This paper discusses the main issues, requirements, and design approaches for the implementation of grid-based knowledge discovery systems. Furthermore, some prospects and promising research directions in datacentric and knowledge-discovery oriented grids are outlined.
Published: 2012-01-26
Classification: Grid computing; data mining; distributed knowledge discovery; datacentric models; high-performance computing; data-intensive systems
@article{cai488,
     author = {David Skillicorn and Domenico Talia},
     title = {Mining Large Data Sets on Grids: Issues and Prospects},
     journal = {Computing and Informatics},
     volume = {28},
     number = {1},
     year = {2012},
     pages = {347--362},
     language = {en},
     url = {http://dml.mathdoc.fr/item/cai488}
}
David Skillicorn; Domenico Talia. Mining Large Data Sets on Grids: Issues and Prospects. Computing and Informatics, Vol. 28 (2012) no. 1, pp. 347-362. http://gdmltest.u-ga.fr/item/cai488/