An improved CHI feature selection algorithm for unbalanced data sets
Abstract: Unbalanced data sets are common in text classification. Approaching the problem through feature selection, this paper analyzes how feature terms are distributed within and between classes, and how differences among documents in unbalanced data sets affect CHI (chi-square) feature selection. It introduces three factors into the traditional chi-square statistical model: an intra-class term frequency probability factor, an inter-class document probability concentration factor, and an intra-class uniformity factor. On that basis it proposes an improved CHI feature selection method. Experimental results show that, compared with the method before improvement, the proposed method achieves better classification performance on unbalanced data sets.
Author: LUO Kuiyong (骆魁永), School of Information Engineering, Xinyang Agriculture and Forestry University, Xinyang 464000, China
Source: Journal of Shangqiu Normal University (《商丘师范学院学报》), CAS, 2021, No. 6, pp. 9-13 (5 pages)
Funding: University-level Youth Fund project (20200115)
Keywords: unbalanced data set; CHI; feature selection
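
For orientation, the traditional chi-square (CHI) score that the abstract takes as its starting point can be sketched as below. This is only the standard textbook statistic computed from a 2x2 term/class contingency table. The three correction factors named in the abstract are not defined on this page, so they are deliberately not reproduced here. The function name `chi_square` and its parameter names are illustrative assumptions, not taken from the paper.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score of a term t with respect to a class k.

    a: documents in class k that contain t
    b: documents outside class k that contain t
    c: documents in class k that do not contain t
    d: documents outside class k that do not contain t
    (Parameter names are illustrative, not the paper's notation.)
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # A term concentrated in a small class of 10 documents out of 100.
    print(chi_square(9, 1, 1, 89))
    # A term spread evenly across both classes scores zero.
    print(chi_square(5, 5, 5, 5))
```

Because the score depends only on document counts, it ignores how often a term occurs inside each document and how evenly it is spread within a class, which is the kind of gap the factors described in the abstract are meant to address.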