Multi-label classification using error correcting output codes

Tomasz Kajdanowicz; Przemysław Kazienko

International Journal of Applied Mathematics and Computer Science, Volume 22 (2012), pp. 829-840 / Harvested from The Polish Digital Mathematics Library

Abstract

This article introduces and empirically examines a framework for multi-label classification extended by Error Correcting Output Codes (ECOCs). The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs to recover from the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and RAkEL classifiers require the same computation time irrespective of the coding utilized; (iv) in general, these two classifiers are not suitable for ECOCs because they cannot benefit from the codes' error-correcting abilities; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.
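The noisy-channel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming(7,4) code is used here only as a small stand-in for the codes studied in the article, and the names `G`, `encode`, `decode`, and `CODEBOOK` are hypothetical. A 4-bit label vector is encoded into a 7-bit codeword, each codeword bit is assumed to be predicted by its own binary classifier, and a single wrong prediction is corrected by nearest-codeword decoding.

```python
from itertools import product

# Generator matrix of a systematic Hamming(7,4) code (illustrative choice):
# the first 4 bits copy the label vector, the last 3 bits are parity checks.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(labels):
    """Map a 4-bit label vector to a 7-bit codeword (arithmetic mod 2)."""
    return [sum(l * g for l, g in zip(labels, col)) % 2 for col in zip(*G)]


# All 16 valid codewords, mapped back to the label vectors they encode.
CODEBOOK = {tuple(encode(m)): list(m) for m in product([0, 1], repeat=4)}


def hamming_distance(a, b):
    """Count the positions in which two bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits):
    """Return the label vector whose codeword is nearest to the predictions."""
    nearest = min(CODEBOOK, key=lambda cw: hamming_distance(cw, predicted_bits))
    return CODEBOOK[nearest]


true_labels = [1, 0, 1, 1]
codeword = encode(true_labels)   # targets for 7 hypothetical bit classifiers
noisy = list(codeword)
noisy[2] ^= 1                    # simulate one base classifier making an error
print(decode(noisy))             # [1, 0, 1, 1]
```

Because this code has minimum distance 3, any single erroneous bit prediction is corrected; two or more errors may decode to a wrong label vector.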

Published: 2012-01-01

Zbl 1251.68180

EUDML-ID : urn:eudml:doc:244515

@article{bwmeta1.element.bwnjournal-article-amcv22z4p829bwm,
     author = {Tomasz Kajdanowicz and Przemys\l aw Kazienko},
     title = {Multi-label classification using error correcting output codes},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {22},
     year = {2012},
     pages = {829-840},
     zbl = {1251.68180},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv22z4p829bwm}
}

Tomasz Kajdanowicz; Przemysław Kazienko. Multi-label classification using error correcting output codes. International Journal of Applied Mathematics and Computer Science, Volume 22 (2012) pp. 829-840. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv22z4p829bwm/

References

[000] Boutell, M.R., Luo, J., Shen, X. and Brown, C.M. (2004). Learning multi-label scene classification, Pattern Recognition 37(9): 1757-1771.

[001] Clare, A. and King, R.D. (2001). Knowledge discovery in multi-label phenotype data, in L.D. Raedt and A. Siebes (Eds.), PKDD: 5th European Conference on Machine Learning and Knowledge Discovery, Lecture Notes in Computer Science, Vol. 2168, Springer, Berlin/Heidelberg, pp. 42-53. | Zbl 1009.68730

[002] Crammer, K. and Singer, Y. (2003). A family of additive online algorithms for category ranking, Journal of Machine Learning Research 3: 1025-1058. | Zbl 1061.68543

[003] Dietterich, T.G. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research 2: 263-286. | Zbl 0900.68358

[004] Diplaris, S., Tsoumakas, G., Mitkas, P. and Vlahavas, I. (2005). Protein classification with multiple algorithms, in P. Bozanis and E.N. Houstis (Eds.), 10th Panhellenic Conference on Informatics (PCI 2005), Lecture Notes in Computer Science, Vol. 3746, Springer-Verlag, Berlin/Heidelberg, pp. 448-456.

[005] Duan, K., Keerthi, S.S., Chu, W., Shevade, S.K. and Poo, A.N. (2003). Multi-Category Classification by Soft-Max Combination of Binary Classifiers, Lecture Notes in Computer Science, Vol. 2709, Springer, Berlin/Heidelberg. | Zbl 1040.68617

[006] Elisseeff, A. and Weston, J. (2001). A kernel method for multi-labelled classification, in T.G. Dietterich, S. Becker and Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14, MIT Press, Cambridge, MA, pp. 681-687.

[007] Ferng, C.-S. and Lin, H.-T. (2011). Multi-label classification with error-correcting codes, Journal of Machine Learning Research: Workshop and Conference Proceedings 20: 281-295.

[008] Ghamrawi, N. and McCallum, A. (2005). Collective multi-label classification, in O. Herzog, H.-J. Schek, N. Fuhr, A. Chowdhury and W. Teiken (Eds.), International Conference on Information and Knowledge Management, CIKM, ACM, New York, NY, pp. 195-200.

[009] Hong, J., Min, J., Cho, U. and Cho, S. (2008). Fingerprint classification using one-vs-all support vector machines dynamically ordered with naive Bayes classifiers, Pattern Recognition 41(2): 662-671. | Zbl 1131.68513

[010] Hüllermeier, E., Fürnkranz, J., Cheng, W. and Brinker, K. (2008). Label ranking by learning pairwise preferences, Artificial Intelligence 172(16-17): 1897-1916. | Zbl 1184.68403

[011] Jankowski, N. (2012). Graph-based generation of a meta-learning search space, International Journal of Applied Mathematics and Computer Science 22(3): 647-667, DOI: 10.2478/v10006-012-0049-y.

[012] Kajdanowicz, T. and Kazienko, P. (2009a). Hybrid repayment prediction for debt portfolio, in N.T. Nguyen, R. Kowalczyk and S.-M. Chen (Eds.), Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems, Lecture Notes in Artificial Intelligence, Vol. 5796, Springer, Berlin/Heidelberg, pp. 850-857.

[013] Kajdanowicz, T. and Kazienko, P. (2009b). Prediction of sequential values for debt recovery, in E. Bayro-Corrochano and J.-O. Eklundh (Eds.), Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Lecture Notes in Computer Science, Vol. 5856, Springer, Berlin/Heidelberg, pp. 337-344.

[014] Kajdanowicz, T., Woźniak, M. and Kazienko, P. (2011). Multiple classifier method for structured output prediction based on error correcting output codes, in N. Nguyen, C.-G. Kim and A. Janiak (Eds.), Intelligent Information and Database Systems, Lecture Notes in Computer Science, Vol. 6592, Springer, Berlin/Heidelberg, pp. 333-342.

[015] Kuncheva, L.I. (2005). Using diversity measures for generating error-correcting output codes in classifier ensembles, Pattern Recognition Letters 26(1): 83-90.

[016] Kuriata, E. (2008). Creation of unequal error protection codes for two groups of symbols, International Journal of Applied Mathematics and Computer Science 18(2): 251-257, DOI: 10.2478/v10006-008-0023-x. | Zbl 1245.94111

[017] Loza Mencía, E. and Fürnkranz, J. (2008). Pairwise learning of multilabel classifications with perceptrons, Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN-08), Hong Kong, China, pp. 2900-2907.

[018] MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms, Cambridge University Press, Cambridge. | Zbl 1055.94001

[019] Morelos-Zaragoza, R. (2006). The Art of Error Correcting Coding, Wiley, West Sussex.

[020] Pestian, J., Brew, C., Matykiewicz, P., Hovermale, D., Johnson, N., Bretonnel Cohen, K. and Duch, W. (2007). A shared task involving multi-label classification of clinical free text, Proceedings of ACL BioNLP, Association for Computational Linguistics, Stroudsburg, PA.

[021] Read, J., Pfahringer, B., Holmes, G. and Frank, E. (2009). Classifier chains for multi-label classification, 13th European Conference on Principles and Practice of Knowledge Discovery in Databases/20th European Conference on Machine Learning, Bled, Slovenia, pp. 254-269.

[022] Read, J., Pfahringer, B., Holmes, G. and Frank, E. (2011). Classifier chains for multi-label classification, Machine Learning 85(3): 333-359.

[023] Reed, I.S. and Chen, X. (1999). Error-Control Coding for Data Networks, Kluwer Academic Publishers, Norwell, MA.

[024] Sammut, C. and Webb, G.I. (2011). Encyclopedia of Machine Learning, Springer, Berlin/Heidelberg. | Zbl 1211.68001

[025] Schapire, R.E. and Singer, Y. (2000). Boostexter: A boosting-based system for text categorization, Machine Learning 39(2/3): 135-168. | Zbl 0951.68561

[026] Trohidis, K., Tsoumakas, G., Kalliris, G. and Vlahavas, I. (2008). Multilabel classification of music into emotions, 9th International Conference on Music Information Retrieval (ISMIR 2008), Philadelphia, PA, USA, pp. 325-330.

[027] Tsoumakas, G., Katakis, I. and Vlahavas, I. (2011). Random k-labelsets for multilabel classification, IEEE Transactions on Knowledge and Data Engineering 23(7): 1079-1089.

[028] Tsoumakas, G. and Vlahavas, I. (2007). Random k-labelsets: An Ensemble Method for Multilabel Classification, Lecture Notes in Artificial Intelligence, Vol. 4701, Springer, Berlin/Heidelberg.

[029] Zhang, M.-L. and Zhou, Z.-H. (2006). Multilabel neural networks with applications to functional genomics and text categorization, IEEE Transactions on Knowledge and Data Engineering 18(10): 1338-1351.

[030] Zhang, M.-L. and Zhou, Z.-H. (2007). ML-KNN: A lazy learning approach to multi-label learning, Pattern Recognition 40(7): 2038-2048. | Zbl 1111.68629

[031] Zhang, Y. and Schneider, J. (2011). Multi-label output codes using canonical correlation analysis, Journal of Machine Learning Research: Workshop and Conference Proceedings 15: 873-882.