
Improving the Term Weight Calculation Method in Text Classification Algorithms (cited by: 8)

Modify the Method of Feature's Weight in Text Classification
Abstract: In automatic text classification, TFIDF is a commonly used formula for computing term weights. The method is simple to apply, but it considers only how frequently a feature term occurs and ignores the term's contribution to distinguishing each class. To address this shortcoming, the paper proposes TFIDF-CHI, which corrects the weight of each feature term and readjusts each term's discriminative power for each class. A KNN classifier is used to verify its effectiveness. Experiments show that the method outperforms the original TFIDF algorithm, indicating that the improved strategy is feasible.
Authors: ZHAO Xiao-hua (赵小华), MA Jian-fen (马建芬) (Dept. of Computer and Software College, Taiyuan University of Technology, Taiyuan 030024, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2009, No. 12X, pp. 10626-10628 (3 pages)
Keywords: text classification; feature weight; TFIDF; TFIDF-CHI
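
The abstract does not give the TFIDF-CHI formula, so the following is only a minimal sketch of one plausible reading: the usual TF-IDF weight of a term is multiplied by the chi-square statistic of that term with respect to a class, so that terms that do not help separate the class are down-weighted. The function names (`idf`, `chi_square`, `tfidf_chi`), the plain product used to combine the two factors, and the unsmoothed `log(N / df)` form of IDF are all assumptions made for illustration, not the authors' definitions.

```python
import math
from collections import Counter


def idf(term, docs):
    """Inverse document frequency log(N / df); each doc is a list of tokens."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def chi_square(term, cls, docs, labels):
    """Chi-square statistic of a (term, class) pair from a 2x2 contingency table."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        if has_term and label == cls:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif label == cls:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def tfidf_chi(doc, cls, docs, labels):
    """Hypothetical combined weight: tf * idf * chi-square (an assumed form)."""
    counts = Counter(doc)
    return {
        term: (count / len(doc)) * idf(term, docs) * chi_square(term, cls, docs, labels)
        for term, count in counts.items()
    }


# Toy corpus, invented purely for illustration.
docs = [["ball", "goal"], ["ball", "vote"], ["vote", "law"], ["goal", "team"]]
labels = ["sport", "politics", "politics", "sport"]
weights = tfidf_chi(docs[0], "sport", docs, labels)
```

In this toy corpus "ball" occurs equally often in both classes, so its chi-square factor is zero and its combined weight vanishes, while "goal" occurs only in "sport" documents and keeps a positive weight. Plain TF-IDF would give both terms the same weight in the first document, which is the shortcoming the abstract describes.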