Abstract
By analyzing the relevance between feature terms and classes, this paper adds three adjustment parameters to the traditional chi-square (CHI) feature selection method, so that the selected feature terms are concentrated in a particular class, are distributed as evenly as possible within that class, and occur in that class as often as possible. Experiments comparing the chi-square feature selection method before and after the improvement show that the variance-based chi-square statistic (Var-CHI) method clearly improves both recall and precision.
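For orientation, the baseline that the paper modifies can be sketched as follows. This is a minimal illustration of the standard chi-square statistic for one term and one class, not the Var-CHI method itself: the abstract does not define the three adjustment parameters, so they are not reproduced here. The function names `contingency` and `chi_square`, and the representation of documents as token sets, are assumptions made for this example.

```python
def contingency(docs, labels, term, cls):
    """Count the 2x2 term/class table over a labeled corpus.

    docs   : list of token sets (assumed representation, one set per document)
    labels : list of class labels, aligned with docs
    Returns (a, b, c, d):
      a = documents in cls that contain term
      b = documents outside cls that contain term
      c = documents in cls that do not contain term
      d = documents outside cls that do not contain term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Standard chi-square statistic: N(ad - cb)^2 / ((a+c)(b+d)(a+b)(c+d))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A row or column of the table is empty; the statistic is undefined,
        # so this sketch treats the term as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Toy corpus (made up for illustration only).
docs = [{"ball", "game"}, {"ball", "team"}, {"vote", "law"}, {"vote", "game"}]
labels = ["sport", "sport", "politics", "politics"]

score_ball = chi_square(*contingency(docs, labels, "ball", "sport"))
score_game = chi_square(*contingency(docs, labels, "game", "sport"))
```

In this toy corpus "ball" appears only in the "sport" documents and scores higher than "game", which appears equally in both classes. The plain statistic uses only document-level presence counts, so it does not reflect how often or how evenly a term occurs within a class, which are the properties the abstract says the added parameters address.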
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2012, No. 4, pp. 1304-1306 (3 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (70971059)
Liaoning Province Innovation Team Project (2009T045)
Keywords
text classification
feature selection
chi-square statistic
variance