Improved Feature Weighting Algorithm for Text Categorization (cited by: 26)
Abstract: TF-IDF (Term Frequency/Inverse Document Frequency) is a feature-term weighting algorithm for the Vector Space Model (VSM) that is widely used in text categorization. It focuses on term frequency and inverse document frequency, but it cannot capture how a feature term is distributed among classes and within a class. To raise the weight of representative feature terms, meaning those that occur frequently in one class and are evenly distributed within that class, the authors introduce a distribution concentration coefficient of the feature term to improve the IDF function, apply a dispersion coefficient as an additional weighting factor, and propose the TF-IIDF-DIC weighting function. Experimental results show that K-NN text classification based on the TF-IIDF-DIC weighting algorithm improves the macro-averaged F1 value by 6.79% compared with the conventional TF-IDF algorithm.
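Authors: 台德艺 (Tai Deyi), 王俊 (Wang Jun)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2010, No. 9, pp. 197-199, 202 (4 pages)
Funding: Provincial Natural Science Foundation Project for Universities of Anhui Province (KJ2008B120)
Keywords: Vector Space Model (VSM); text categorization; feature weighting; feature distribution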
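
The limitation named in the abstract can be illustrated with a minimal sketch of baseline TF-IDF. It assumes the common variant tf × log(N / df); this record does not state which variant the paper uses, and it does not give the TF-IIDF-DIC formulas, so those are not implemented here. The toy corpus, the class labels, and the helper names `tf_idf` and `class_doc_freq` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term count times log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def class_doc_freq(corpus, term):
    """Count, per class label, the documents that contain the term."""
    return dict(Counter(label for label, tokens in corpus if term in tokens))


# Hypothetical toy corpus of (class label, tokens) pairs.
corpus = [
    ("sports", ["goal", "match"]),
    ("sports", ["goal", "report"]),
    ("finance", ["stock", "report"]),
    ("finance", ["stock", "match"]),
]

weights = tf_idf([tokens for _, tokens in corpus])

# "goal" occurs only in sports documents; "report" is split across classes.
print(class_doc_freq(corpus, "goal"))
print(class_doc_freq(corpus, "report"))

# Both terms have the same document frequency, so plain TF-IDF weights
# them identically in the second document.
print(weights[1]["goal"], weights[1]["report"])
```

In this sketch "goal" is concentrated in a single class while "report" is spread over both, yet the two receive the same weight because IDF only looks at the overall document frequency. That is the kind of class-distribution information the abstract says the proposed concentration and dispersion coefficients are meant to bring in.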