
Interaction based algorithm for feature selection in text categorization (cited by: 2)
Abstract: To address feature selection in text categorization, a feature selection algorithm that accounts for interactions between features, called Max-Interaction, is proposed. First, an information-theoretic feature selection model is established based on Joint Mutual Information (JMI). Second, the assumptions of existing feature selection algorithms are relaxed, and the feature selection problem is transformed into an interaction optimization problem. Third, a max-min method is employed to avoid overestimating higher-order interactions. Finally, a text categorization feature selection algorithm based on sequential forward search and higher-order interactions is proposed. In the comparison experiments, Max-Interaction improved average classification accuracy by 5.5% over Interaction Weight Feature Selection (IWFS) and by 6% over the Chi-square statistic, and achieved higher classification accuracy than the compared methods in 93% of the experiments. Max-Interaction can therefore effectively exploit interactions to improve the performance of feature selection in text categorization.
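The forward-search idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the Max-Interaction scoring formula, so the code below is not the authors' algorithm: it substitutes a simpler pairwise max-min joint mutual information criterion (each candidate is scored by its worst-case joint mutual information with the already selected features). All function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def joint_mutual_information(xa, xb, ys):
    """I(Xa, Xb; Y): treat the pair (Xa, Xb) as one discrete variable."""
    return mutual_information(list(zip(xa, xb)), ys)


def max_min_forward_selection(columns, labels, k):
    """Greedy forward search with a pairwise max-min JMI score.

    Assumption: this pairwise score stands in for the paper's
    higher-order interaction measure, which the abstract does not specify.
    """
    remaining = list(range(len(columns)))
    # Seed the subset with the single most informative feature.
    first = max(remaining, key=lambda j: mutual_information(columns[j], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(j):
            # Worst-case joint information with any already selected feature.
            return min(
                joint_mutual_information(columns[j], columns[s], labels)
                for s in selected
            )
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label is the XOR of two binary "term present" features,
# so neither term is informative alone but the pair determines the class.
term_a = [0, 0, 1, 1]
term_b = [0, 1, 0, 1]
constant = [0, 0, 0, 0]
label = [0, 1, 1, 0]

print(max_min_forward_selection([term_a, term_b, constant], label, 2))
```

On this toy data each term has zero mutual information with the label on its own, which is why a selection criterion that ignores interactions could not distinguish `term_b` from the constant column, while the joint score can.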
Authors: TANG Xiaochuan (唐小川); QIU Xiwei (邱曦伟); LUO Liang (罗亮) (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China)
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2018, No. 7, pp. 1857-1861 (5 pages)
Funding: National Natural Science Foundation of China (61602094)
Keywords: feature selection; text categorization; interaction; Mutual Information (MI); information measure
