Effect of choice of dissimilarity measure on classification efficiency with nearest neighbor method
Tomasz Górecki
Discussiones Mathematicae Probability and Statistics, Tome 25 (2005), p. 217-239 / Harvested from The Polish Digital Mathematics Library

In this paper we analyze in detail the nearest neighbor method for different dissimilarity measures, both classical and weighted, for which weight-selection methods have been worked out. We propose searching for the weights in the space of discriminant coordinates. Experimental results based on a number of real data sets are presented and analyzed to illustrate the benefits of the proposed methods. As classical dissimilarity measures we use the Euclidean metric, the Manhattan metric and the post office metric. After introducing weights into the first two metrics, the resulting measures are no longer metrics, because the triangle inequality does not hold; however, this does not make them useless for nearest neighbor classification. Additionally, we analyze different methods of tie-breaking.
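To illustrate the kind of measure the abstract describes, here is a minimal sketch of 1-NN classification with a Manhattan dissimilarity carrying class-dependent feature weights (in the spirit of Paredes and Vidal, reference [9] below). All function names and the toy data are hypothetical, not taken from the paper; the point is only that class-dependent weights break symmetry and the triangle inequality, so the measure is a dissimilarity rather than a metric.

```python
import numpy as np

def class_weighted_manhattan(x, t, c, W):
    """Manhattan dissimilarity with class-dependent feature weights:
    d(x, t) = sum_j W[c, j] * |x_j - t_j|, where c is the class of the
    training point t. Because the weights depend on c, the triangle
    inequality need not hold, so this is not a metric."""
    return np.sum(W[c] * np.abs(x - t))

def nn_classify(x, train_X, train_y, W):
    """1-NN: assign the label of the training point with the smallest
    weighted dissimilarity to x."""
    d = [class_weighted_manhattan(x, t, c, W) for t, c in zip(train_X, train_y)]
    return int(train_y[int(np.argmin(d))])

# Toy example (illustrative data only).
train_X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
train_y = np.array([0, 0, 1])
W = np.array([[1.0, 1.0],   # weights used toward class-0 points
              [0.5, 0.5]])  # weights used toward class-1 points
print(nn_classify(np.array([2.5, 2.5]), train_X, train_y, W))  # -> 1
```

With uniform weights this reduces to the ordinary Manhattan metric; the weighting only changes which training point is "nearest", not the 1-NN decision rule itself.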

Publié le : 2005-01-01
EUDML-ID : urn:eudml:doc:287745
@article{bwmeta1.element.bwnjournal-article-doi-10_7151_dmps_1070,
     author = {Tomasz G\'orecki},
     title = {Effect of choice of dissimilarity measure on classification efficiency with nearest neighbor method},
     journal = {Discussiones Mathematicae Probability and Statistics},
     volume = {25},
     year = {2005},
     pages = {217-239},
     zbl = {1102.62066},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-doi-10_7151_dmps_1070}
}
Tomasz Górecki. Effect of choice of dissimilarity measure on classification efficiency with nearest neighbor method. Discussiones Mathematicae Probability and Statistics, Tome 25 (2005) pp. 217-239. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-doi-10_7151_dmps_1070/

[1] C. Blake, E. Keogh and C. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Irvine, Department of Information and Computer Sciences.

[2] T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Trans. Information Theory 13 (1) (1967), 21-27. | Zbl 0154.44505

[3] L. Devroye, L. Györfi and G. Lugosi, Probabilistic Theory of Pattern Recognition, Springer, New York 1996.

[4] R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations, 2nd edition, John Wiley & Sons, New York 1997.

[5] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, New Jersey 1982. | Zbl 0499.62002

[6] W.J. Krzanowski and F.H.C. Marriott, Multivariate Analysis, Part 1: Distributions, Ordination and Inference, Edward Arnold, London 1994. | Zbl 0855.62036

[7] W.J. Krzanowski and F.H.C. Marriott, Multivariate Analysis, Part 2: Classification, Covariance Structures and Repeated Measurements, Edward Arnold, London 1995. | Zbl 0949.62537

[8] D.F. Morrison, Multivariate Statistical Analysis, PWN, Warszawa 1990.

[9] R. Paredes and E. Vidal, A class-dependent weighted dissimilarity measure for nearest neighbor classification problems, Pattern Recognition Letters 21 (2000), 1027-1036. | Zbl 0967.68143

[10] G.A.F. Seber, Multivariate Observations, John Wiley & Sons, New York 1984. | Zbl 0627.62052