Abstract
To address feature selection in text categorization, a feature selection algorithm that accounts for interactions among features, called Max-Interaction, was proposed. Firstly, an information-theoretic feature selection model for text categorization was established based on Joint Mutual Information (JMI). Secondly, the assumptions of existing feature selection algorithms were relaxed, and the feature selection problem was transformed into an interaction optimization problem. Thirdly, a max-min (maximum of the minimum) method was employed to avoid overestimating high-order interactions. Finally, a text categorization feature selection algorithm based on sequential forward search and high-order interactions was proposed. In the comparison experiments, Max-Interaction improved the average classification accuracy by 5.5% over Interaction Weight Feature Selection (IWFS) and by 6% over the Chi-square statistic, and it achieved higher classification accuracy than the compared methods in 93% of the experiments. Therefore, Max-Interaction can effectively exploit feature interactions to improve the performance of feature selection in text categorization.
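The abstract names two ingredients, a sequential forward search and a max-min rule over joint mutual information, without giving the scoring function itself. The following is a minimal sketch of how such a search could look, assuming discrete features and a pairwise joint-MI score; it is not the authors' Max-Interaction algorithm, and the names `mutual_information`, `joint_mi`, and `forward_select_maxmin`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def joint_mi(f, g, labels):
    """I((F, G); Y): information the feature pair (F, G) carries about the labels."""
    return mutual_information(list(zip(f, g)), labels)


def forward_select_maxmin(features, labels, k):
    """Greedy forward search with a max-min joint-MI score (illustrative only).

    features: dict mapping a feature name to its list of discrete values.
    Ties are broken by feature name so the result is deterministic.
    """
    remaining = set(features)
    # Seed with the single feature that is most informative on its own.
    first = max(sorted(remaining),
                key=lambda name: mutual_information(features[name], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        # Score a candidate by its weakest pairing with the selected features,
        # so one strong pairing cannot inflate the estimate.
        def score(name):
            return min(joint_mi(features[name], features[s], labels)
                       for s in selected)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): the label is a XOR b, and n is unrelated to the label.
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "n": [0, 0, 0, 0, 1, 1, 1, 1],
}
labels = [0, 1, 1, 0, 0, 1, 1, 0]
selected = forward_select_maxmin(features, labels, 2)
print(selected)
```

In this toy setup every feature has zero mutual information with the label on its own, so a purely univariate ranking could not separate `b` from `n`. The pair (`a`, `b`) determines the label completely, which is the kind of interaction a joint-MI score can pick up.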
Authors
TANG Xiaochuan (唐小川)
QIU Xiwei (邱曦伟)
LUO Liang (罗亮)
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2018, No. 7, pp. 1857-1861 (5 pages)
Funding
National Natural Science Foundation of China (61602094)
Keywords
feature selection
text categorization
interaction
Mutual Information (MI)
information measure