In this paper we propose very efficient itemset representation for frequent itemset mining from transactional databases. The combinatorial number system is used to uniquely represent frequent k-itemset with just one integer value, for any k ≥ 2. Experiments show that memory requirements can be reduced up to 300 %, especially for very low minimal support thresholds. Further, we exploit combinatorial number schema for representing candidate itemsets during iterative join-based approach. The novel algorithm maintains one-dimensional array rank, starting from k = 2nd iteration. At the index r of the array, the proposed algorithm stores unique integer representation of the r-th candidate in lexicographic order. The rank array provides joining of two candidate k-itemsets to be O(1) instead of O(k) operation. Additionally, the rank array provides faster determination which candidates are contained in the given transaction during the support count and test phase. Finally, we believe that itemset ranking by combinatorial number system can be effectively integrated into pattern-growth algorithms, that are state-of-the-art in frequent itemset mining, and additionally improve their performances.
Publié le : 2018-11-07
Classification:  Knowledge and Information Engineering; other areas of Computing and Informatics,  Frequent itemset mining, Apriori algorithm, combinatorial number system,  68P20; 68P30; 68T30
@article{cai2018_4_894,
     author = {Savo Tomovi\'c; University of Montenegro, Faculty of Mathematics and Natural Sciences, 81 000 Podgorica and Predrag Stani\v si\'c; University of Montenegro, Faculty of Mathematics and Natural Sciences, 81 000 Podgorica},
     title = {An Efficient Itemset Representation for Mining Frequent Patterns in Transactional Databases},
     journal = {Computing and Informatics},
     volume = {36},
     number = {6},
     year = {2018},
     language = {en},
     url = {http://dml.mathdoc.fr/item/cai2018_4_894}
}
Savo Tomović; University of Montenegro, Faculty of Mathematics and Natural Sciences, 81 000 Podgorica; Predrag Stanišić; University of Montenegro, Faculty of Mathematics and Natural Sciences, 81 000 Podgorica. An Efficient Itemset Representation for Mining Frequent Patterns in Transactional Databases. Computing and Informatics, Tome 36 (2018) no. 6, . http://gdmltest.u-ga.fr/item/cai2018_4_894/