Feature selection based on feature distinguish ability and meta-information (基于特征辨别能力和元信息的特征选择)
Abstract: Feature selection is a key step in text categorization, and the quality of the selected feature subset directly affects the classification results. After analyzing the shortcomings of the term-frequency and document-frequency methods, the paper proposes a measure called feature distinguishing ability, introduces meta-information into rough sets, and presents an attribute reduction algorithm based on meta-information. These are combined into a comprehensive feature selection method. The method first uses feature distinguishing ability for a preliminary selection that filters out some terms and reduces the sparsity of the feature space, then applies the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. The experimental results indicate that the proposed method has certain advantages.
Authors: Wang Xing (王兴), Zhang Wenpeng (张文鹏)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2012, Issue 7, pp. 128-131 (4 pages)
Funding: Henan Province Basic and Frontier Technology Research Program (No. 112300410118)
Keywords: text categorization; feature selection; meta-information; rough set; attribute reduction
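
The two-stage pipeline described in the abstract (filter terms by a discriminative score, then remove redundant ones by attribute reduction) can be illustrated with a rough sketch. This record does not give the paper's definitions, so everything below is an assumption: the score in `discriminative_scores` is a hypothetical stand-in that combines term frequency and document frequency per class, and `greedy_reduct` uses the classical dependency-degree greedy reduction from rough set theory in place of the paper's meta-information-based algorithm. All function and parameter names are illustrative only.

```python
from collections import defaultdict


def discriminative_scores(docs, labels):
    """Score each term (hypothetical measure, not the paper's definition).

    docs: list of dicts mapping term -> count; labels: one class label per doc.
    Per class, strength = (share of class term occurrences) * (share of class docs).
    The score is the spread of that strength across classes.
    """
    classes = sorted(set(labels))
    n_docs = defaultdict(int)
    tf = {c: defaultdict(int) for c in classes}
    df = {c: defaultdict(int) for c in classes}
    for doc, c in zip(docs, labels):
        n_docs[c] += 1
        for term, count in doc.items():
            tf[c][term] += count
            df[c][term] += 1
    total_tf = {c: sum(tf[c].values()) or 1 for c in classes}
    vocab = sorted({t for d in docs for t in d})
    scores = {}
    for t in vocab:
        strengths = [
            (tf[c].get(t, 0) / total_tf[c]) * (df[c].get(t, 0) / n_docs[c])
            for c in classes
        ]
        scores[t] = max(strengths) - min(strengths)
    return scores


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of objects whose equivalence
    class under `attrs` contains a single decision label."""
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(y)
        sizes[key] += 1
    positive = sum(sizes[k] for k, ys in seen_labels.items() if len(ys) == 1)
    return positive / len(rows)


def greedy_reduct(rows, labels, attrs):
    """Greedily add the attribute that raises the dependency degree most,
    stopping once the subset matches the dependency of the full set."""
    target = dependency(rows, labels, attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        remaining = [a for a in attrs if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen


def select_features(docs, labels, keep):
    """Stage 1: keep the `keep` highest-scoring terms.
    Stage 2: reduce them over a binary term-presence table."""
    scores = discriminative_scores(docs, labels)
    top = sorted(scores, key=lambda t: (-scores[t], t))[:keep]
    rows = [{t: int(t in doc) for t in top} for doc in docs]
    return greedy_reduct(rows, labels, top)
```

In this sketch the first stage shrinks the vocabulary before the term-presence table is built, so the reduction step runs over a much smaller and less sparse table, which mirrors the motivation given in the abstract.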