A Text Feature Selection Method Based on Information Gain (cited by: 1)
Abstract: When the distributions of classes and features are highly skewed, the classification performance of algorithms based on traditional information gain declines sharply. To address this problem, this paper proposes an improved text feature selection method based on information gain. First, the method reduces the interference of low-frequency words with feature selection. Second, it uses dispersion (variance) to analyze the document frequency of each feature word across classes and increases the weight of feature words whose inter-class document frequency varies widely, because such words are more representative than other features when the distributions of classes and features are highly skewed. Comparison experiments on real data sets show that the features selected by the proposed method give better classification performance than those selected by the traditional information gain method, and that the method also performs well on imbalanced data sets.
Author: WANG Li-dong (王理冬) (Anhui Institute of Electronic Products Supervision and Inspection, Hefei 230061, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2017, No. 9, pp. 242-244, 254 (4 pages)
Keywords: text classification; information gain; feature selection; imbalanced data set; dispersion analysis
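
The abstract names two adjustments to classical information gain but does not give formulas, so the following is only a minimal sketch of the general idea, not the paper's method. It computes the standard information gain of a term from per-class document frequencies, then applies two illustrative adjustments that are assumptions made here: a hypothetical `min_df` cutoff that zeroes out low-frequency words, and a hypothetical `1 + dispersion` factor, where dispersion is taken to be the population variance of the per-class document-frequency rates.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(df_per_class, docs_per_class):
    """Classical information gain of one term.

    df_per_class[i]   -- documents in class i that contain the term
    docs_per_class[i] -- total documents in class i
    """
    n = sum(docs_per_class)
    n_t = sum(df_per_class)      # documents containing the term
    n_not = n - n_t              # documents not containing the term
    h_class = entropy([d / n for d in docs_per_class])
    h_given_t = entropy([df / n_t for df in df_per_class]) if n_t else 0.0
    h_given_not = (
        entropy([(d - df) / n_not for df, d in zip(df_per_class, docs_per_class)])
        if n_not
        else 0.0
    )
    return h_class - (n_t / n) * h_given_t - (n_not / n) * h_given_not


def dispersion(df_per_class, docs_per_class):
    """Assumed dispersion measure: variance of per-class document-frequency rates."""
    rates = [df / d for df, d in zip(df_per_class, docs_per_class)]
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)


def weighted_score(df_per_class, docs_per_class, min_df=2):
    """Illustrative score: drop rare terms, then scale IG by (1 + dispersion).

    Both min_df and the (1 + dispersion) factor are hypothetical choices,
    not formulas taken from the paper.
    """
    if sum(df_per_class) < min_df:
        return 0.0
    ig = information_gain(df_per_class, docs_per_class)
    return ig * (1.0 + dispersion(df_per_class, docs_per_class))
```

With two classes of 10 documents each, a term that appears in every document of one class and in none of the other has an information gain of 1 bit, while a term that appears in half of each class has an information gain of 0. Ranking terms by `weighted_score` and keeping the top k is one way such a score could be used for feature selection.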