An Improved Feature Selection Algorithm Based on Category Distinguished Words (original title: 一种改进的类别区分词特征选择算法)

Cited by: 3
Abstract: The traditional category distinguished words (CDW) feature selection algorithm uses inter-class dispersion and intra-class importance as its metrics, but ignores the fact that the two indicators often contribute to the feature scoring function with different weights, which degrades feature selection to some extent. Building on CDW, an improved algorithm (ICDW) introduces a balance factor. Adjusting the balance factor changes the contribution weights of the two indicators to the feature scoring function, which yields more effective feature selection and, in turn, better text classification. With a Naïve Bayes classifier used for text categorization, experiments show that ICDW outperforms not only CDW but also ECE, IG, and CHI, which are commonly used for feature selection, with considerable gains in classification accuracy, precision, recall, and F1.
Authors: LI Fu-xing (李富星); MENG Zu-qiang (蒙祖强) (School of Computer and Electronic Information, Guangxi University, Nanning 530004, China)
Source: Computer and Modernization (《计算机与现代化》), 2019, No. 3, pp. 73-77 (5 pages)
Funding: Guangxi Natural Science Foundation project (2015GXNSFAA139292)
Keywords: text categorization; feature selection; balance factor; category distinguished words
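
The abstract describes a scoring function that weighs two indicators against each other through a balance factor, but it does not give the formulas. The sketch below is only an illustration of that idea and not the paper's method. It assumes a convex combination `balance * dispersion + (1 - balance) * importance`, takes the dispersion of a term to be the standard deviation of its per-class document-frequency rates, and takes its importance to be the highest of those rates. The names `score_terms`, `select_features`, and `balance` are hypothetical.

```python
from math import sqrt


def score_terms(docs, labels, balance=0.5):
    """Score every term; docs is a list of token lists, labels is parallel.

    ASSUMPTION: the combination rule and both indicator definitions are
    illustrative stand-ins, not the formulas used in the paper.
    """
    classes = sorted(set(labels))
    class_sizes = {c: labels.count(c) for c in classes}

    # Document frequency of each term within each class.
    df = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df.setdefault(term, dict.fromkeys(classes, 0))[label] += 1

    scores = {}
    for term, counts in df.items():
        # Fraction of each class's documents that contain the term.
        rates = [counts[c] / class_sizes[c] for c in classes]
        mean = sum(rates) / len(rates)
        # Assumed inter-class dispersion: population std. dev. of the rates.
        dispersion = sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))
        # Assumed intra-class importance: the highest per-class rate.
        importance = max(rates)
        scores[term] = balance * dispersion + (1 - balance) * importance
    return scores


def select_features(docs, labels, k, balance=0.5):
    """Return the k best-scoring terms (ties broken alphabetically)."""
    scores = score_terms(docs, labels, balance)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under these assumptions, a term that appears in every document of every class has zero dispersion, so raising `balance` toward 1 pushes it down the ranking, while lowering `balance` toward 0 favors terms that are simply frequent within some class. In practice the balance factor would be tuned against classification performance on held-out data.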