Ubiquitous spam messages waste considerable time and resources. This paper addresses the practical spam filtering problem and proposes a universal approach to combating various kinds of spam messages. The proposed active multi-field learning approach rests on two observations: 1) obtaining a label is costly for a real-world spam filter, which suggests active learning; and 2) different messages often share a similar multi-field text structure, which suggests multi-field learning. The multi-field learning framework combines the results predicted by the field classifiers using a novel compound weight, and each field classifier computes the arithmetic mean of multiple conditional probabilities predicted from feature strings, according to a string-frequency index data structure. By comparing the current variance of the field classification results with the historical variance, the active learner evaluates the classification confidence and treats the more uncertain messages as the more informative samples for which to request labels. The experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements in both email spam filtering and short text spam filtering. Measured by the standard (1-ROCA)% metric, the performance of our active multi-field learning even exceeds the full-feedback performance of some advanced individual classification algorithms.
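The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the compound weight, the feature strings, or the exact variance test, so the sketch assumes uniform weights, overlapping character 4-grams, and a rule that requests a label when the current variance of the field scores is at least the mean of the past variances. All class and method names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


class FieldClassifier:
    """Scores one text field from a string-frequency index (assumed layout)."""

    def __init__(self, n=4):
        self.n = n  # assumed feature strings: overlapping character n-grams
        self.counts = defaultdict(lambda: [0, 0])  # feature -> [ham, spam]

    def features(self, text):
        if len(text) <= self.n:
            return {text} if text else set()
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def train(self, text, is_spam):
        for f in self.features(text):
            self.counts[f][int(is_spam)] += 1

    def score(self, text):
        # Arithmetic mean of P(spam | feature) over features seen in training.
        probs = []
        for f in self.features(text):
            if f in self.counts:
                ham, spam = self.counts[f]
                probs.append(spam / (ham + spam))
        return mean(probs) if probs else 0.5  # 0.5 when nothing is known


class ActiveMultiFieldFilter:
    def __init__(self, fields):
        self.classifiers = {name: FieldClassifier() for name in fields}
        self.history = []  # variances of field scores for past messages

    def predict(self, message):
        scores = [clf.score(message.get(name, ""))
                  for name, clf in self.classifiers.items()]
        # Assumption: uniform weights stand in for the paper's compound weight.
        return mean(scores), pvariance(scores)

    def wants_label(self, message):
        # Assumption: field disagreement at or above the historical mean
        # variance marks the message as uncertain, hence informative.
        _, var = self.predict(message)
        uncertain = not self.history or var >= mean(self.history)
        self.history.append(var)
        return uncertain

    def train(self, message, is_spam):
        for name, clf in self.classifiers.items():
            clf.train(message.get(name, ""), is_spam)
```

In use, a filter would call `wants_label` on each incoming message and call `train` only when a label is actually obtained, which is where the reduction in label requests would come from under these assumptions.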
Published: 2015-02-11
Classification: Spam filtering, active multi-field learning, email spam, short message service spam, TREC spam track, 68T50, 68Q32, 62H30, 68T30
@article{cai2819,
     author = {Wuying Liu and Lin Wang and Mianzhu Yi and Nan Xie},
     affiliation = {School of Foreign Language, Linyi University, 276005 Linyi, Shandong},
     title = {Active Multi-Field Learning for Spam Filtering},
     journal = {Computing and Informatics},
     volume = {33},
     number = {3},
     year = {2015},
     language = {en},
     url = {http://dml.mathdoc.fr/item/cai2819}
}
Wuying Liu, Lin Wang, Mianzhu Yi, Nan Xie (School of Foreign Language, Linyi University, 276005 Linyi, Shandong). Active Multi-Field Learning for Spam Filtering. Computing and Informatics, Volume 33 (2015), no. 3. http://gdmltest.u-ga.fr/item/cai2819/