On the order equivalence relation of binary association measures

Mariusz Paradowski

International Journal of Applied Mathematics and Computer Science, Vol. 25 (2015), pp. 645-657. Harvested from The Polish Digital Mathematics Library.

Abstract

Over a century of research has produced more than a hundred binary association measures, many of which share similar properties. This paper presents an overview of binary association measures, focusing on their order equivalences. The measures are grouped according to their relations, and the transformations between them are shown both formally and visually. A generalization coefficient based on the joint probability and the marginal probabilities is proposed. Combining association measures is a recent trend in computer science: measures are combined in linear and nonlinear discrimination models and in automated feature selection or construction. Knowing how the measures relate is particularly important for avoiding meaningless results, zero generalized variances, and the curse of dimensionality, or simply for saving time.
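The abstract uses order equivalence without defining it. The following is a minimal Python sketch under stated assumptions, not the paper's own code or notation: two measures are treated as order equivalent when they order every pair of 2×2 contingency tables the same way, ties included. The cell names a, b, c, d (joint presences, the two mismatch counts, joint absences), the textbook definitions of the Jaccard, Dice and simple matching coefficients, and all function names are illustrative assumptions. Dice equals 2J/(1+J), a strictly increasing function of Jaccard on [0, 1], so the two should come out order equivalent; simple matching also counts joint absences and should not.

```python
# Illustrative sketch only: hypothetical helper names, standard textbook measures.
from fractions import Fraction
from itertools import combinations, product


def contingency(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences:
    a = both 1, b = x only, c = y only, d = both 0."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    a = sum(1 for u, v in zip(x, y) if u and v)
    b = sum(1 for u, v in zip(x, y) if u and not v)
    c = sum(1 for u, v in zip(x, y) if v and not u)
    return a, b, c, len(x) - a - b - c


def jaccard(a, b, c, d):
    # J = a / (a + b + c); joint absences d are ignored. Undefined if a + b + c == 0.
    return Fraction(a, a + b + c)


def dice(a, b, c, d):
    # D = 2a / (2a + b + c), which equals 2J / (1 + J).
    return Fraction(2 * a, 2 * a + b + c)


def simple_matching(a, b, c, d):
    # SM = (a + d) / n; unlike J and D it counts joint absences.
    return Fraction(a + d, a + b + c + d)


def sign(v):
    return (v > 0) - (v < 0)


def order_equivalent(m1, m2, tables):
    """True if m1 and m2 order every pair of tables the same way (ties included)."""
    return all(
        sign(m1(*t) - m1(*u)) == sign(m2(*t) - m2(*u))
        for t, u in combinations(tables, 2)
    )


# All 2x2 tables with n = 6 and a + b + c > 0 (so every measure is defined).
n = 6
tables = [
    (a, b, c, n - a - b - c)
    for a, b, c in product(range(n + 1), repeat=3)
    if 0 < a + b + c <= n
]

print(jaccard(*contingency([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])))  # 1/2
print(order_equivalent(jaccard, dice, tables))             # True
print(order_equivalent(jaccard, simple_matching, tables))  # False
print(all(dice(*t) == 2 * jaccard(*t) / (1 + jaccard(*t)) for t in tables))  # True
```

The check enumerates every table with n = 6, so it demonstrates the relation on a finite grid rather than proving it; exact `Fraction` arithmetic is used so that tied values compare as equal instead of differing by floating-point rounding.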

Published: 2015-01-01

Zbl 1322.62176

EUDML-ID: urn:eudml:doc:271770

@article{bwmeta1.element.bwnjournal-article-amcv25i3p645bwm,
     author = {Mariusz Paradowski},
     title = {On the order equivalence relation of binary association measures},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {25},
     year = {2015},
     pages = {645-657},
     zbl = {1322.62176},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv25i3p645bwm}
}

Mariusz Paradowski. On the order equivalence relation of binary association measures. International Journal of Applied Mathematics and Computer Science, Vol. 25 (2015), pp. 645-657. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv25i3p645bwm/
