Mixed feature dimension reduction strategy for text categorization
Abstract Feature dimensionality reduction has long been an important research topic in text classification, and designing efficient feature selection methods is an effective way to achieve it. Existing feature selection methods commonly delete features with strong class-discriminating ability by mistake while retaining features with weak class-discriminating ability. To address this, the paper proposes a feature reduction strategy that first defines and quantifies features, builds a retained set of unisource features, forcibly removes the common features shared by all classes, and then adjusts the weights of the multisource features, thereby reducing the feature set and improving classification performance. Comparative experiments on the Reuters-21578 and NewsGroups corpora indicate that the new strategy is effective and feasible.
Author Wang Dong (王东)
Source Journal of Guizhou Education University (《贵州师范学院学报》), 2012, No. 6, pp. 6-10 (5 pages)
Keywords text categorization; unisource feature; multisource feature; feature dimensionality reduction; feature selection
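
The abstract names the steps of the strategy but gives neither the formal feature definitions nor the weighting formula. The following is therefore only a minimal sketch under stated assumptions, not the paper's algorithm: a term is treated as unisource if it occurs in exactly one class, as common if it occurs in every class, and as multisource otherwise. The function name `partition_features` and the log-based weight for multisource terms are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def partition_features(docs):
    """Split terms into unisource, multisource, and common sets.

    docs: iterable of (class_label, list_of_tokens) pairs.
    Assumed definitions (not taken from the paper):
      - unisource: term occurs in exactly one class (always retained)
      - common: term occurs in every class (removed)
      - multisource: everything else, weighted by log(n_classes / k),
        where k is the number of classes containing the term
    """
    classes = set()
    term_classes = defaultdict(set)
    for label, tokens in docs:
        classes.add(label)
        for tok in set(tokens):
            term_classes[tok].add(label)

    n_classes = len(classes)
    unisource, multisource, common = set(), {}, set()
    for term, seen in term_classes.items():
        k = len(seen)
        if k == 1:
            unisource.add(term)
        elif k == n_classes:
            common.add(term)
        else:
            # Hypothetical adjustment: fewer source classes, larger weight.
            multisource[term] = math.log(n_classes / k)
    return unisource, multisource, common


# Toy corpus with three classes (labels and tokens are made up).
docs = [
    ("grain", ["wheat", "export", "price"]),
    ("crude", ["oil", "export", "price"]),
    ("trade", ["tariff", "price", "deficit"]),
]
unisource, multisource, common = partition_features(docs)
```

In this toy corpus, "price" appears in all three classes and would be dropped, "export" appears in two classes and receives a reduced weight, and the remaining terms are retained as unisource features.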