A Global Approach to the Comparison of Clustering Results
Osvaldo Silva ; Helena Bacelar-Nicolau ; Fernando C. Nicolau
Biometrical Letters, Volume 49 (2012), p. 135-147 / Harvested from The Polish Digital Mathematics Library

The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data.
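As a rough illustration of the kind of similarity measure mentioned in the abstract, the sketch below computes the basic affinity coefficient between two frequency profiles (the sum of the square roots of the products of the relative frequencies) and a weighted average of it over several variables. This is a minimal sketch under stated assumptions, not the authors' definition of the generalized affinity coefficient: the function names `affinity` and `weighted_affinity`, the list-of-counts input layout, and the weighting scheme are all hypothetical choices made for illustration.

```python
import math


def affinity(p, q):
    """Basic affinity coefficient between two non-negative count vectors.

    Each vector is normalised to relative frequencies first, so the result
    lies in [0, 1]: 1 for identical profiles, 0 for disjoint supports.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of categories")
    total_p, total_q = sum(p), sum(q)
    if total_p == 0 or total_q == 0:
        raise ValueError("profiles must have a positive total count")
    return sum(math.sqrt((a / total_p) * (b / total_q)) for a, b in zip(p, q))


def weighted_affinity(x, y, weights):
    """Weighted average of per-variable affinities (assumed generalisation).

    x and y each hold one count vector per variable; weights are assumed
    to be non-negative and to sum to 1.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w * affinity(p, q) for w, p, q in zip(weights, x, y))


if __name__ == "__main__":
    # Two units described by two variables, each given as category counts.
    unit_a = [[1, 1], [1, 0]]
    unit_b = [[2, 2], [0, 3]]
    print(weighted_affinity(unit_a, unit_b, [0.5, 0.5]))
```

In this toy example the first variable has identical relative profiles and the second has disjoint ones, so the equally weighted value is halfway between full similarity and none.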

Published: 2012-01-01
EUDML-ID : urn:eudml:doc:268732
@article{bwmeta1.element.doi-10_2478_bile-2013-0010,
     author = {Osvaldo Silva and Helena Bacelar-Nicolau and Fernando C. Nicolau},
     title = {A Global Approach to the Comparison of Clustering Results},
     journal = {Biometrical Letters},
     volume = {49},
     year = {2012},
     pages = {135-147},
     zbl = {1286.62060},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.doi-10_2478_bile-2013-0010}
}
Osvaldo Silva; Helena Bacelar-Nicolau; Fernando C. Nicolau. A Global Approach to the Comparison of Clustering Results. Biometrical Letters, Vol. 49 (2012), pp. 135-147. http://gdmltest.u-ga.fr/item/bwmeta1.element.doi-10_2478_bile-2013-0010/
