
Feature Selection Method Based on Optimized Document Frequency and Rough Sets
(Cited by: 5)
Abstract: Feature selection is a core research topic in text categorization. First, a document frequency (DF) method based on minimum word frequency is presented. Next, rough sets are introduced and an attribute reduction algorithm is proposed. Finally, this attribute reduction algorithm is combined with the minimum-word-frequency DF method to form a comprehensive feature selection method. The comprehensive method first applies the minimum-word-frequency DF method for a preliminary selection of features, filtering out some terms to reduce the sparsity of the feature space, and then uses the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset.
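The abstract does not define exactly how the minimum word frequency enters the DF computation. The sketch below assumes one plausible reading: a document counts toward a term's DF only if the term occurs at least `min_tf` times in that document, and terms whose resulting DF is below `min_df` are filtered out. The thresholds, the function names `df_min_tf` and `select_features`, and the toy corpus are all hypothetical illustrations, not the authors' code.

```python
from collections import Counter


def df_min_tf(docs, min_tf):
    """Count, for each term, the documents in which it occurs at least min_tf times.

    With min_tf=1 this is ordinary document frequency (DF). The min_tf threshold
    is an assumed reading of the paper's "minimum word frequency".
    """
    df = Counter()
    for tokens in docs:
        for term, count in Counter(tokens).items():
            if count >= min_tf:
                df[term] += 1
    return df


def select_features(docs, min_tf=2, min_df=2):
    """Preliminary selection: keep terms whose thresholded DF is at least min_df."""
    df = df_min_tf(docs, min_tf)
    return sorted(term for term, n in df.items() if n >= min_df)


# Hypothetical toy corpus of pre-tokenized documents.
docs = [
    "rough set rough set reduction".split(),
    "rough set feature feature selection".split(),
    "text text categorization feature feature".split(),
]
print(select_features(docs, min_tf=1, min_df=2))  # ['feature', 'rough', 'set']
print(select_features(docs, min_tf=2, min_df=2))  # ['feature']
```

With `min_tf=1` the filter reduces to classical DF; raising `min_tf` stops a term from earning DF credit in documents where it appears only incidentally, which is one way such a threshold could thin out a sparse feature space.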
Authors: 朱颢东 (Zhu Haodong), 钟勇 (Zhong Yong)
Source: Journal of Natural Science of Hunan Normal University (《湖南师范大学自然科学学报》), 2009, no. 3, pp. 27-31 (5 pages); indexed in CAS and the Peking University Core Journals list
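Funding: Sichuan Province Science and Technology Plan Project (2008GZ0003); Sichuan Provincial Department of Science and Technology Key Technologies R&D Project (07GG006-014)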
Keywords: text categorization; minimum word frequency; document frequency; attribute reduction; rough set
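The abstract does not spell out the proposed attribute reduction algorithm. As a stand-in, the sketch below uses the well-known dependency-based QuickReduct heuristic (greedy forward selection on the rough-set positive region), followed by a backward pass that drops attributes whose removal leaves the positive region unchanged. It assumes the terms kept by the DF pre-filter have already been turned into a decision table: one row per document, one discretized value per term (here 0/1 presence), and the document's class as the decision attribute. This is not the authors' algorithm; `quick_reduct`, `positive_region_size`, and the toy table are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs):
    two objects share a class when they agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def positive_region_size(rows, labels, attrs):
    """|POS_attrs(D)|: number of objects whose equivalence class has a single label."""
    return sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )


def quick_reduct(rows, labels):
    """Greedy dependency-based reduct (QuickReduct, a stand-in technique)
    plus a backward pass that removes redundant attributes."""
    all_attrs = list(range(len(rows[0])))
    target = positive_region_size(rows, labels, all_attrs)
    reduct = []
    # Forward step: add the attribute that enlarges the positive region most.
    while positive_region_size(rows, labels, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates, key=lambda a: positive_region_size(rows, labels, reduct + [a]))
        reduct.append(best)
    # Backward step: drop any attribute whose removal keeps the full dependency.
    for a in list(reduct):
        rest = [b for b in reduct if b != a]
        if positive_region_size(rows, labels, rest) == target:
            reduct = rest
    return sorted(reduct)


# Hypothetical decision table: 0/1 presence of four pre-selected terms per document.
terms = ["t0", "t1", "t2", "t3"]
rows = [
    (1, 0, 1, 1),
    (1, 0, 0, 1),
    (0, 1, 1, 0),
    (0, 1, 0, 0),
    (1, 1, 1, 1),
    (0, 0, 1, 0),
]
labels = ["A", "A", "B", "B", "A", "B"]
print([terms[a] for a in quick_reduct(rows, labels)])  # ['t0']
```

In this toy table `t3` duplicates `t0`, and either one alone determines the class. The greedy step keeps the first maximal candidate (`t0`), so `t3`, `t1`, and `t2` are all discarded as redundant with respect to the decision attribute.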