
Text Feature Selection Approach Based on Venture Decision
Abstract: In Chinese text categorization, the choice of feature words strongly affects classification accuracy. To address this problem, this paper proposes a text feature selection approach based on dynamic venture (risk) decision. The approach constructs a utility function to evaluate the utility of each feature word with respect to the classification result, then applies a venture decision method to compute the expected loss of each feature word, and finally keeps the feature words with the smallest expected losses, thereby reducing dimensionality. The approach is applied to Chinese spam filtering and Web page categorization. Experimental results on several benchmark datasets show that it selects the feature words that have a greater influence on the classification results, and that all text classification metrics improve noticeably.
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》, CSCD-indexed), 2013, No. 10, pp. 933-941 (9 pages).
Funding: National Natural Science Foundation of China (Nos. 60975035, 61273291); Shanxi Province Research Fund for Returned Overseas Scholars (No. 2012008).
Keywords: text categorization; feature selection; venture decision
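The abstract outlines the pipeline (a utility function per feature word, an expected loss computed by venture decision, then keeping the lowest-loss words) but does not give the actual utility or loss functions. The Python sketch below is a minimal illustration under hypothetical assumptions and is not the authors' method. It treats the classes as the states of nature, with priors P(c) estimated from label frequencies. It uses a stand-in utility u(w, c) = |P(w | c) - P(w | not c)| estimated from document frequencies, and takes the loss of word w under class c as the regret max_v u(v, c) - u(w, c). The expected loss is then R(w) = sum over c of P(c) times that regret, and the k words with the smallest R(w) are kept. The function names `utility_table`, `expected_losses` and `select_features` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

def utility_table(docs: Sequence[Set[str]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """ASSUMED utility: u(w, c) = |P(w | c) - P(w | not c)|, from document frequencies."""
    classes = sorted(set(labels))
    vocab = sorted(set().union(*docs))
    n, n_c = len(labels), Counter(labels)
    df = {c: Counter() for c in classes}  # df[c][w] = number of class-c docs containing w
    for words, c in zip(docs, labels):
        df[c].update(words)
    table = {}
    for w in vocab:
        total_w = sum(df[c][w] for c in classes)
        row = {}
        for c in classes:
            p_in = df[c][w] / n_c[c]
            n_out = n - n_c[c]
            p_out = (total_w - df[c][w]) / n_out if n_out else 0.0
            row[c] = abs(p_in - p_out)
        table[w] = row
    return table

def expected_losses(table: Dict[str, Dict[str, float]], labels: Sequence[str]) -> Dict[str, float]:
    """Expected loss R(w) = sum_c P(c) * regret(w, c); the regret loss is an ASSUMPTION."""
    n, n_c = len(labels), Counter(labels)
    prior = {c: k / n for c, k in n_c.items()}  # state-of-nature probabilities
    best = {c: max(row[c] for row in table.values()) for c in prior}
    return {w: sum(prior[c] * (best[c] - row[c]) for c in prior) for w, row in table.items()}

def select_features(docs: Sequence[Set[str]], labels: Sequence[str], k: int) -> List[str]:
    """Keep the k words with the smallest expected loss (ties broken alphabetically)."""
    losses = expected_losses(utility_table(docs, labels), labels)
    return sorted(losses, key=lambda w: (losses[w], w))[:k]

if __name__ == "__main__":
    docs = [{"free", "prize", "meeting"}, {"prize", "free"},
            {"meeting", "report"}, {"report", "meeting", "free"}]
    labels = ["spam", "spam", "ham", "ham"]
    print(select_features(docs, labels, 2))  # ['prize', 'report']
```

On this toy data, `prize` and `report` each occur in only one class, so their regret is zero and they are kept; `free` and `meeting` occur in both classes and receive an expected loss of 0.5. Swapping in a different utility or loss function only requires changing `utility_table` or `expected_losses`.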
