
Feature Selection Combining Category Concentration with Minimal Set Covering
Abstract: Feature selection is one of the core research topics in text categorization. This paper briefly analyzes word frequency and document frequency and, on that basis, proposes a category concentration measure. It introduces the idea of set covering into rough set theory and gives an attribute reduction algorithm based on minimal set covering. Combining this reduction algorithm with category concentration yields a new feature selection method: category concentration performs an initial selection that filters out some terms and so reduces the sparsity of the feature space, and the reduction algorithm then eliminates redundancy, producing a more representative feature subset. Experimental results show that the method performs well.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2011, No. 28, pp. 124-127 (4 pages)
Funding: Basic and Frontier Technology Research Program of Henan Province (No. 102300410266)
Keywords: feature selection; text categorization; word frequency; document frequency; rough sets; attribute reduction
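The abstract does not spell out the reduction algorithm, so the following is only a minimal sketch of the general idea of treating attribute reduction as a set covering problem, not the authors' method. It assumes documents are binary term-presence vectors, builds one set of distinguishing terms per pair of documents from different categories, and applies the standard greedy set-cover heuristic. The function names `discernibility_sets` and `greedy_reduct` are hypothetical.

```python
from itertools import combinations


def discernibility_sets(rows, labels):
    """For each pair of objects with different labels, collect the
    indices of the attributes on which the two objects differ."""
    sets = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if labels[i] != labels[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:  # an empty set means the pair cannot be told apart
                sets.append(diff)
    return sets


def greedy_reduct(rows, labels):
    """Greedy set cover (an assumed heuristic, not the paper's algorithm):
    repeatedly pick the attribute that appears in the most uncovered
    discernibility sets. Ties go to the lowest attribute index."""
    uncovered = discernibility_sets(rows, labels)
    chosen = []
    while uncovered:
        counts = {}
        for s in uncovered:
            for k in s:
                counts[k] = counts.get(k, 0) + 1
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        uncovered = [s for s in uncovered if best not in s]
    return sorted(chosen)


# Toy data: four documents, three candidate terms, two categories.
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = ["A", "A", "B", "B"]
reduct = greedy_reduct(rows, labels)
print(reduct)  # [0, 1]
```

In this toy example term 2 is redundant: terms 0 and 1 already distinguish every pair of documents from different categories. The greedy heuristic gives a small cover but is not guaranteed to find the minimum one.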
