In text categorization, a document is usually represented with the vector space model, which is adequate for the classification task but cannot handle synonymy and polysemy in Chinese. This paper presents a novel approach that takes both semantic and statistical information into account to improve the accuracy of text classification. The proposed approach computes semantic information based on HowNet and statistical information based on a kernel function with class-based weighting. According to our experimental results, the proposed approach achieves state-of-the-art or competitive results compared with traditional approaches such as k-Nearest Neighbor (KNN) and Naive Bayes, and with deep learning models such as convolutional networks.
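To make the idea of a kernel that mixes term similarity with term weighting concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `semantic_kernel`, the three-word vocabulary, the similarity matrix `SIM` (a stand-in for HowNet-derived similarities) and the vector `WEIGHTS` (a stand-in for class-based weights) are all hypothetical values chosen for illustration.

```python
# Toy sketch only. The similarity values and weights are made up for
# illustration; they are not taken from HowNet or from the paper.

def semantic_kernel(x, y, sim, weights):
    """Compute k(x, y) = (W x)^T S (W y).

    W is a diagonal matrix of per-term weights (given as a list) and
    S is a symmetric term-to-term similarity matrix.
    """
    wx = [w * xi for w, xi in zip(weights, x)]
    wy = [w * yi for w, yi in zip(weights, y)]
    return sum(
        wx[i] * sim[i][j] * wy[j]
        for i in range(len(wx))
        for j in range(len(wy))
    )

# Hypothetical vocabulary: ["car", "automobile", "banana"]
SIM = [
    [1.0, 0.9, 0.0],  # "car" is assumed to be close to "automobile"
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
WEIGHTS = [1.0, 1.0, 0.5]  # assumed per-term weights

# Bag-of-words vectors for three one-word documents.
doc_car = [1, 0, 0]
doc_auto = [0, 1, 0]
doc_banana = [0, 0, 1]

# A plain dot product sees no overlap between the two synonyms.
linear = sum(a * b for a, b in zip(doc_car, doc_auto))

# The similarity-aware kernel gives them a non-zero score.
semantic = semantic_kernel(doc_car, doc_auto, SIM, WEIGHTS)
unrelated = semantic_kernel(doc_car, doc_banana, SIM, WEIGHTS)
```

In this toy setting the plain dot product of the two synonym documents is zero, while the similarity-aware kernel scores them through the off-diagonal entry of `SIM`. For such a function to be a valid kernel for a support vector machine, the similarity matrix has to be positive semidefinite, which holds for the example values used here.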
Published: 2018-11-07
Classification: Theoretical Foundations, Text categorization, semantic information, statistical information, support vector machine, 68T50
@article{cai2018_4_992,
     author = {Haipeng Yao and Bo Zhang and Peiying Zhang and Maozhen Li},
     affiliation = {Haipeng Yao, Bo Zhang, Peiying Zhang: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing; Maozhen Li: Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, UB8 3PH},
     title = {A Novel Kernel for Text Classification Based on Semantic and Statistical Information},
     journal = {Computing and Informatics},
     volume = {36},
     number = {6},
     year = {2018},
     language = {en},
     url = {http://dml.mathdoc.fr/item/cai2018_4_992}
}
Haipeng Yao; School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing; Bo Zhang; School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing; Peiying Zhang; School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing; Maozhen Li; Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, UB8 3PH. A Novel Kernel for Text Classification Based on Semantic and Statistical Information. Computing and Informatics, Vol. 36 (2018), no. 6. http://gdmltest.u-ga.fr/item/cai2018_4_992/