Abstract
An improved mutual information feature selection method is proposed that introduces a difference degree among classes. A relative term frequency factor is also introduced to counter the tendency of traditional methods to select low-frequency words. Together these improve the accuracy of feature selection and raise both the precision and the efficiency of classification. Text classification experiments show that the proposed method increases average recall and average precision by 11.26% and 8.04%, respectively, and the average F1 score by 18.55%.
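The low-frequency bias mentioned in the abstract can be illustrated with the classic pointwise mutual information score, MI(t, c) = log(P(t, c) / (P(t) P(c))). The sketch below covers only this traditional baseline, not the improved method, whose formulas are not given in this record. The function name `mutual_information` and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def mutual_information(docs, labels):
    """Baseline MI per (term, class) from document-level presence counts.

    Hypothetical helper for illustration; not the paper's improved score.
    """
    n_docs = len(docs)
    class_count = Counter(labels)    # number of documents in each class
    term_count = Counter()           # number of documents containing each term
    joint_count = defaultdict(int)   # documents of class c that contain term t
    for tokens, cls in zip(docs, labels):
        for term in set(tokens):     # presence only; repeats in a doc are ignored
            term_count[term] += 1
            joint_count[(term, cls)] += 1
    scores = {}
    for (term, cls), n_tc in joint_count.items():
        # MI(t, c) = log( P(t, c) / (P(t) * P(c)) ), with probabilities
        # estimated as document counts divided by n_docs.
        scores[(term, cls)] = math.log(
            n_tc * n_docs / (term_count[term] * class_count[cls])
        )
    return scores


# Toy corpus (made up): "offside" occurs in one document, "team" in both
# sports documents, "match" in three of the four documents.
docs = [
    ["match", "goal", "team"],
    ["match", "team", "offside"],
    ["match", "market", "stock"],
    ["market", "stock", "bond"],
]
labels = ["sports", "sports", "finance", "finance"]
scores = mutual_information(docs, labels)
```

On this toy corpus, the one-off term "offside" receives the same score for the sports class as "team", which covers every sports document, and a higher score than the frequent term "match". The baseline score depends only on how exclusively a term occurs in a class, not on how often it occurs, which is the weakness a frequency-aware factor is meant to offset.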
Source
China Sciencepaper (《中国科技论文》)
CAS
PKU Core (北大核心)
2015, No. 20, pp. 2386-2389 (4 pages)
Keywords
computer application
feature selection
mutual information
relative term frequency factor
difference degree among classes