Abstract
This paper analyzes why text classification performs poorly when mutual information is used for feature selection, and attributes this largely to the method's tendency to favor rare features. On this basis, it proposes a mutual information feature selection method based on dispersion and average frequency. Experimental results show that the improved mutual information method markedly improves text classification performance.
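The rare-feature bias described in the abstract can be illustrated with a minimal sketch. It assumes the standard pointwise mutual information criterion, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts. The corpus sizes and the function name `pointwise_mi` are hypothetical, chosen only for illustration. The abstract does not give the formula of the improved method based on dispersion and average frequency, so that method is not reproduced here.

```python
import math


def pointwise_mi(docs_with_term_in_class, docs_with_term, docs_in_class, total_docs):
    """Estimate MI(t, c) = log(P(t, c) / (P(t) * P(c))) from document counts."""
    p_tc = docs_with_term_in_class / total_docs  # P(t, c)
    p_t = docs_with_term / total_docs            # P(t)
    p_c = docs_in_class / total_docs             # P(c)
    return math.log(p_tc / (p_t * p_c))


# Hypothetical corpus: 1000 documents, 100 of which belong to class c.
TOTAL, IN_CLASS = 1000, 100

# Rare term: occurs in a single document, which happens to be in class c.
mi_rare = pointwise_mi(1, 1, IN_CLASS, TOTAL)

# Common term: occurs in 90 of the 100 class-c documents and in 10 others.
mi_common = pointwise_mi(90, 100, IN_CLASS, TOTAL)

print(f"rare term:   {mi_rare:.3f}")    # log(10), about 2.303
print(f"common term: {mi_common:.3f}")  # log(9), about 2.197
```

Under these assumed counts, the term seen in only one document scores higher than the term that covers 90% of the class, even though the latter is far more useful to a classifier. This is the kind of behavior that a correction based on how widely and how often a term occurs would be expected to counteract.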
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2011, No. 4, pp. 239-241 (3 pages)
Keywords
Feature selection
Mutual information
Text classification