The use of information and information gain in the analysis of attribute dependencies
Krzysztof Moliński; Anita Dobek; Kamila Tomaszyk
Biometrical Letters, Volume 49 (2012), pp. 149-158 / Harvested from the Polish Digital Mathematics Library

This paper demonstrates the conclusions that can be drawn from an analysis of entropy and information. Owing to its universality, entropy can be applied widely across different subjects, especially in biomedicine. Using simulated data, the similarities and differences between the grouping of attributes and the testing of their independence are shown; it follows that a complete exploration of a data set requires both of these elements. A new concept introduced in this paper is normed information gain, which allows any logarithm base to be used in the definition of entropy.
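
As an illustration only (the abstract does not state the paper's exact formula), the Python sketch below computes entropy, information gain (mutual information) and one commonly used base-free normalization, I(X;Y)/H(X,Y), which is related to Rajski's (1961) distance cited in the reference list; the function names, the chosen normalization and the example table are assumptions made for this sketch, not the authors' notation.

import numpy as np

def entropy(p, base=2.0):
    # Shannon entropy of a probability vector; zero entries contribute nothing
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

def information_gain(table, base=2.0):
    # Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) from a two-way contingency table
    joint = np.asarray(table, dtype=float) / np.sum(table)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(px, base) + entropy(py, base) - entropy(joint.ravel(), base)

def normed_information_gain(table):
    # I(X;Y) / H(X,Y): bounded by [0, 1] and independent of the logarithm base;
    # one possible normalization, not necessarily the one defined in the paper
    joint = np.asarray(table, dtype=float) / np.sum(table)
    h_xy = entropy(joint.ravel())
    return information_gain(table) / h_xy if h_xy > 0 else 0.0

# Example: counts for two binary attributes
counts = [[30, 10],
          [ 5, 55]]
print(information_gain(counts))         # in bits (base 2)
print(information_gain(counts, np.e))   # in nats; the value changes with the base
print(normed_information_gain(counts))  # unchanged whatever base is used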

Published: 2012-01-01
EUDML-ID : urn:eudml:doc:268905
@article{bwmeta1.element.doi-10_2478_bile-2013-0011,
     author = {Krzysztof Moli\'nski and Anita Dobek and Kamila Tomaszyk},
     title = {The use of information and information gain in the analysis of attribute dependencies},
     journal = {Biometrical Letters},
     volume = {49},
     year = {2012},
     pages = {149-158},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.doi-10_2478_bile-2013-0011}
}
Krzysztof Moliński; Anita Dobek; Kamila Tomaszyk. The use of information and information gain in the analysis of attribute dependencies. Biometrical Letters, Volume 49 (2012), pp. 149-158. http://gdmltest.u-ga.fr/item/bwmeta1.element.doi-10_2478_bile-2013-0011/

Bezzi M. (2007): Quantifying the information transmitted in a single stimulus. Biosystems 89: 4-9.

Brunsell N.A. (2010): A multiscale information theory approach to assess spatial-temporal variability of daily precipitation. Journal of Hydrology 385: 165-172.

Jakulin A. (2005): Machine learning based on attribute interactions. PhD Dissertation, University of Ljubljana.

Jakulin A., Bratko I., Smrke D., Demsar J., Zupan B. (2003): Attribute interactions in medical data analysis. In: 9th Conference on Artificial Intelligence in Medicine in Europe (AIME 2003), October 18-22, Protaras, Cyprus.

Jakulin A., Bratko I. (2003): Analyzing attribute dependencies. In: 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003), September 22-26, Cavtat, Croatia.

Jakulin A., Bratko I. (2004a): Quantifying and visualizing attribute interactions: an approach based on entropy. http://arxiv.org/abs/cs.AI/0308002v3.

Jakulin A., Bratko I. (2004b): Testing the significance of attribute interaction. Proc. 21st International Conference on Machine Learning. Banff, Canada.

Kang G., Yue W., Zhang J., Cui Y., Zuo Y., Zhang D. (2008): An entropy-based approach for testing genetic epistasis underlying complex diseases. Journal of Theoretical Biology 250: 362-374.

Kullback S., Leibler R.A. (1951): On information and sufficiency. Annals of Mathematical Statistics 22(1): 79-86. | Zbl 0042.38403

Matsuda H. (2000): Physical nature of higher-order mutual information. Intrinsic correlation and frustration. Physical Review E 62: 3096-3102.

McGill W.J. (1954): Multivariate information transmission. Psychometrika 19(2): 97-116. | Zbl 0058.35706

Moniz L.J., Cooch E.G., Ellner S.P., Nichols J.D., Nichols J.M. (2007): Application of information theory methods to food web reconstruction. Ecological Modelling 208: 145-158.

Moore J.H., Gilbert J.C., Tsai C.-T., Chiang F.-T., Holden T., Barney N., White B.C. (2006): A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. Journal of Theoretical Biology 241: 252-261.

Rajski C. (1961): A metric space of discrete probability distributions. Information and Control 4: 373-377. | Zbl 0103.35805

Shannon C. (1948): A mathematical theory of communication. Bell System Technical Journal 27: 379-423, 623-656. | Zbl 1154.94303

Yan Z., Wang Z., Xie H. (2008): The application of mutual information-based feature selection and fuzzy LS-SVM-based classifier in motion classification. Computer Methods and Programs in Biomedicine 90: 275-284.