Center-based l₁-clustering method
Kristian Sabo
International Journal of Applied Mathematics and Computer Science, Tome 24 (2014), p. 151-163 / Harvested from The Polish Digital Mathematics Library

In this paper, we consider the l₁-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function does not have to be either convex or differentiable, and generally it may have many local or global minima. Therefore, it becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved and the corresponding algorithm is given. The method is illustrated by and compared with some other clustering methods, especially with the l₂-clustering method, which is also known in the literature as a smooth k-means method, on a few typical situations, such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show in this case that the proposed l₁-clustering algorithm is faster and gives significantly better results than the l₂-clustering algorithm.

Publié le : 2014-01-01
EUDML-ID : urn:eudml:doc:271908
@article{bwmeta1.element.bwnjournal-article-amcv24i1p151bwm,
     author = {Kristian Sabo},
     title = {Center-based l1-clustering method},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {24},
     year = {2014},
     pages = {151-163},
     zbl = {1292.62097},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv24i1p151bwm}
}
Kristian Sabo. Center-based l₁-clustering method. International Journal of Applied Mathematics and Computer Science, Tome 24 (2014) pp. 151-163. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv24i1p151bwm/

[000] Angulo, J. and Serra, J. (2007). Modelling and segmentation of colour images in polar representations, Image and Vision Computing 25(4): 475-495.

[001] Äyrämö, S. (2006). Knowledge Mining Using Robust Clustering, Ph.D. thesis, University of Jyväskylä, Jyväskylä.

[002] Bagirov, A.M. and Ugon, J. (2005). An algorithm for minimizing clustering functions, Optimization 54(4-5): 351-368. | Zbl 1122.90059

[003] Bagirov, A.M., Ugon, J. and Webb, D. (2011). Fast modified global k-means algorithm for incremental cluster construction, Pattern Recognition 44(4): 886-876. | Zbl 1213.68514

[004] Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer Academic Publishers, Norwell, MA. | Zbl 0503.68069

[005] Boyd, D.L. and Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press, Cambridge. | Zbl 1058.90049

[006] Chaovalitwongse, W.A., Butenko, S. and Pardalos, P.M., (Eds.) (2009). Clustering Challenges in Biological Networks, World Scientific, London.

[007] Choulakian, V. (2001). Robust q-mode principal component analysis in L₁, Computational Statistics & Data Analysis, 37(2): 135-150. | Zbl 1030.62050

[008] Clarke, F. H., (1990). Optimization and Nonsmooth Analysis, SIAM, Philadelphia, PA. | Zbl 0696.49002

[009] Cominetti, R. and Michelot, C. (1997). Sufficient conditions for coincidence in l₁-minisum multifacility location problems, Operations Research Letters 20(4): 179-185. | Zbl 0879.90131

[010] Cord, A., Ambroise, C. and Cocquerez, J.-P. (2006). Feature selection in robust clustering based on Laplace mixture, Pattern Recognition Letters 27(6): 627-635.

[011] Cupec, R., Grbić, R., Sabo, K. and Scitovski, R. (2009). Three points method for searching the best least absolute deviations plane, Applied Mathematics and Computation 215(3): 983-994. | Zbl 1176.65017

[012] Duda, R., Hart, P. and Stork, D. (2001). Pattern Classification, Wiley, New York, NY. | Zbl 0968.68140

[013] Finkel, D.E. and Kelley, C.T. (2006). Additive scaling and the DIRECT algorithm, Journal of Global Optimization 36(4): 597-608. | Zbl 1142.90488

[014] Floudas, C.A. and Gounaris, C.E. (2009). A review of recent advances in global optimization, Journal of Global Optimization 45(4): 3-38. | Zbl 1180.90245

[015] Frąckiewicz, M. and Palus, H. (2011). KHM clustering techique as a segmentation method for endoscopic colour images, International Journal of Applied Mathematics and Computer Science 21(1): 203-209, DOI: 10.2478/v10006-011-0015-0.

[016] Gan, G., Ma, C. and Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications, SIAM, Philadelphia, PA. | Zbl 1185.68274

[017] Grbić, R., Nyarko, E.K. and Scitovski, R. (2012). A modification of the direct method for Lipschitz global optimization for a symmetric function, Journal of Global Optimization, 57(4): 1193-1212, DOI: 10.1007/s10898-012-0020-3. | Zbl 1279.65076

[018] Grbić, R., Scitovski, K., Sabo, K. and Scitovski, R. (2013). Approximating surfaces by the moving least absolute deviations method, Applied Mathematics and Computation 219(9): 4387-4399. | Zbl 06447249

[019] Gurwitz, C. (1990). Weighted median algorithms for l₁ approximation, BIT 30(2): 301-310. | Zbl 0704.65044

[020] Hathaway, R.J. and Bezdek, J.C. (2001). Fuzzy c-means clustering of incomplete data, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31(5): 735-744.

[021] Hubert, L. and Arabie, P. (1985). Comparing partitions, Journal of Classification 2(1): 193-218. | Zbl 0587.62128

[022] Jain, A. (2010). 50 years beyond k-means, Pattern Recognition Letters 31(8): 651-666.

[023] Jajuga, K. (1987). A clustering method based on the L₁-norm, Computational Statistics & Data Analysis 5(4): 357-371. | Zbl 0624.62058

[024] Jajuga, K. (1991). L₁-norm based fuzzy clustering, Fuzzy Sets and Systems 39(1): 43-50. | Zbl 0714.62052

[025] Iyigun, C. (2007). Probabilistic Distance Clustering, Ph.D. thesis, Graduate School, Rutgers, New Brunswick, NJ.

[026] Jones, D.R., Perttunen, C.D. and Stuckman, B.E. (1993). Lipschitzian optimization without the Lipschitz constant, Journal of Optimization Theory and Applications 79(1): 157-181. | Zbl 0796.49032

[027] Jörnsten, R. (2004). Clustering and classification based on the L₁ data depth, Journal of Multivariate Analysis 90(1): 67-89. | Zbl 1047.62064

[028] Kogan, J. (2007). Introduction to Clustering Large and High-Dimensional Data, Cambridge University Press, Cambridge. | Zbl 1183.62106

[029] Leisch, F. (2006). A toolbox for k-centroids cluster analysis, Computational Statistics & Data Analysis 51(2): 526-544. | Zbl 1157.62439

[030] Li, X. Hu, W., Wang, H. and Zhang, Z. (2010). Linear discriminant analysis using rotational invariant L₁ norm, Neurocomputing 73(13-15): 2571-2579.

[031] Scitovski, R. and Scitovski, S. (2013). A fast partitioning algorithm and its application to earthquake investigation, Computers and Geosciences 59(1): 124-131.

[032] Simiński, K. (2012). Neuro-rough-fuzzy approach for regression modelling from missing data, International Journal of Applied Mathematics and Computer Science 22(2): 461-476, DOI: 10.2478/v10006-012-0035-4. | Zbl 1283.93165

[033] Späth, H. (1976). L₁-cluster analysis, Computing 16(4): 379-387. | Zbl 0322.65008

[034] Späth, H. (1987). Using the L₁-norm within cluster analysis, in Y. Dodge (Ed.), Proceedings of the First International Conference on Statistical Data Analysis Based on the L₁-Norm and Related Methods, University of Neuchatel/Switzerland, August 31-September 04, 1987, Elsevier, Amsterdam, pp. 427-434.

[035] Malinen, M.I. and Fränti, P. (2012). Clustering by analytic functions, Information Sciences 217(1): 31-38.

[036] Meng, D., Zhao, Q and Xu, Z. (2012). Improve robustness of sparse PCA by L₁-norm maximization, Pattern Recognition 45(1): 487-497. | Zbl 1225.68202

[037] Pintér, J.D. (1996). Global Optimization in Action (Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications), Kluwer Academic Publishers, Dordrecht. | Zbl 0842.90110

[038] Ruszczynski, A (2006). Nonlinear Optimization, Princeton University Press, Princeton/Oxford, NJ. | Zbl 1108.90001

[039] Sabo, K. and Scitovski, R. (2008). The best least absolute deviations line-properties and two efficient methods, ANZIAM Journal 50(2): 185-198. | Zbl 1182.65023

[040] Sabo, K., Scitovski, R. and Vazler, I. (2011). Searching for a best LAD-solution of an overdetermined system of linear equations motivated by searching for a best LAD-hyperplane on the basis of given data, Journal of Optimization Theory and Applications 149(2): 293-314. | Zbl 1219.90125

[041] Sabo, K., Scitovski, R. and Vazler, I. (2012). One-dimensional center-based l₁-clustering method, Optimization Letters 7(1): 5-22 | Zbl 1283.90034

[042] Sabo, K., Scitovski, R., Vazler, I. and Zekić-Sušac, M. (2011). Mathematical models of natural gas consumption, Energy Conversion and Management 52(3): 1721-1727.

[043] Teboulle, M. (2007). A unified continuous optimization framework for center-based clustering methods, Journal of Machine Learning Research 8(1): 65-102. | Zbl 1222.68318

[044] Vardi, Y., Zhang, C. H. (2000). The multivariate L₁-median and associated data depth, Proceedings of the National Academy of Sciences, United States of America 97(4): 1423-1426. | Zbl 1054.62067

[045] Vazler, I., Sabo, K. and Scitovski, R. (2012). Weighted median of the data in solving least absolute deviations problems, Communications in Statistics-Theory and Methods 41(8): 1455-1465. | Zbl 1319.62141

[046] Zhang, J., Peng, L., Zhao, X. and Kuruoglu E.E. (2012). Robust data clustering by learning multi-metric lq-norm distances, Expert Systems with Applications 39(1): 335-349.