Feature Selection Method Based on Bayes Reasoning in Two-class Text Classification (cited by: 10)
Abstract: Feature selection is an important step in text classification. For two-class (binary) text classification, this paper proposes an evaluation function based on Bayes reasoning as a replacement for the commonly used evaluation functions, Information Gain (IG) and Mutual Information (MI). It also proposes a quantifiable method for deciding the feature selection dimension: the cumulative contribution rate of the evaluation function values is taken as a confidence level, and the number of selected features is set from it. Comparative experiments show that the new method performs much better than the widely used MI algorithm and that it is simple to apply, efficient, and practical. The authors suggest the approach is also a useful reference for feature selection in other two-class classification problems, not only text classification.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 7, pp. 173-176 (4 pages)
Keywords: feature selection; data mining; Bayes reasoning; text classification
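
The abstract describes two steps: scoring each term with a Bayes-based evaluation function, then choosing how many terms to keep from the cumulative contribution rate of the scores. The record does not give the paper's actual formulas, so the following is only a minimal sketch under stated assumptions. The score `|P(c=1|t) - P(c=0|t)|` is a hypothetical stand-in, not the authors' evaluation function, and the function names, the smoothing parameter `alpha`, and the `confidence` threshold are all illustrative.

```python
from collections import Counter


def posterior_gap_scores(docs, labels, alpha=1.0):
    """Score each term by |P(c=1 | term) - P(c=0 | term)|.

    ASSUMPTION: this is a stand-in evaluation function derived from
    Bayes' rule with Laplace smoothing; the paper's own function is
    not given in this record.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    prior_pos = n_pos / len(labels)
    prior_neg = n_neg / len(labels)

    # Document frequency of each term within each class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == 1 else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        like_pos = (df_pos[term] + alpha) / (n_pos + 2 * alpha)
        like_neg = (df_neg[term] + alpha) / (n_neg + 2 * alpha)
        evidence = like_pos * prior_pos + like_neg * prior_neg
        post_pos = like_pos * prior_pos / evidence
        # |P(1|t) - P(0|t)| = |2 * P(1|t) - 1| for two classes.
        scores[term] = abs(2 * post_pos - 1)
    return scores


def select_by_cumulative_contribution(scores, confidence=0.9):
    """Keep the top-ranked terms until their share of the total score
    reaches `confidence` (ASSUMPTION: one reading of the abstract's
    'cumulative contribution rate')."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(s for _, s in ranked)
    if total <= 0:
        return []
    selected, running = [], 0.0
    for term, s in ranked:
        selected.append(term)
        running += s
        if running / total >= confidence:
            break
    return selected
```

With this reading, the feature dimension is not fixed in advance: it is whatever number of top-ranked terms is needed to reach the chosen confidence level, so a few dominant terms yield a small feature set and a flat score distribution yields a larger one.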

References (5)

  • 1. Tan Pang-Ning, Steinbach M, Kumar V. Introduction to Data Mining (数据挖掘导论) [M]. Fan Ming, Fan Hongjian, trans. Beijing: Posts & Telecom Press, 2006(5): 30-33.
  • 2. He Yali, Chen Lichao. Research on feature selection methods in Web text mining [J]. Computer Engineering (计算机工程), 2005, 31(5): 181-182. (cited by: 14)
  • 3. Androutsopoulos I, Koutsias J, Chandrinos K V, et al. An Evaluation of Naive Bayesian Anti-Spam Filtering [C] // Proceedings of the Workshop on Machine Learning in the New Information Age, 11th European Conference on Machine Learning. Barcelona, Spain, 2000: 9-17.
  • 4. Lai C C. An empirical study of three machine learning methods for spam filtering [J]. Knowledge-Based Systems, 2006. doi: 10.1016/j.knosys.2006.05.016.
  • 5. Zorkadis V, Karras D A, Panayotou M. Efficient information theoretic strategies for classifier combination, feature extraction and performance evaluation in improving false positives and false negatives for spam e-mail filtering [J]. Neural Networks, 2005, 18: 799-807.

Second-level references (4)

  • 1. Yang Y, Wilbur W J. Using Corpus Statistics to Remove Redundant Words in Text Categorization. J. Amer. Soc. Inf. Sci., 1996.
  • 2. Yang Y, Pedersen J O. A Comparative Study on Feature Selection in Text Categorization. KDD-2000 Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 2000.
  • 3. Galavotti L, Sebastiani F, Simi M. Feature Selection and Negative Evidence in Automated Text Categorization. KDD-2000 Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 2000.
  • 4. Mena J. Data Mining Your Website. America, 2000: 368.

Co-citing documents: 13

Co-cited documents: 117

Citing documents: 10

Second-level citing documents: 89
