A Feature Selection Method for NB-based Classifier (cited by: 24)

Original title: 一种基于朴素贝叶斯分类的特征选择方法
Abstract: Because Naive Bayes (NB) text classification relies on the conditional independence assumption, it matters a great deal whether the feature selection step accurately and effectively picks features that represent the text. MI and TFIDF are two common feature selection criteria in text categorization, and their strengths and weaknesses are complementary. Building on WEBCAT, a web page classification system implemented with the multinomial model of Naive Bayes text classification, the paper proposes combining the MI and TFIDF criteria into a two-stage feature selection method. Experiments show that the combined method selects representative features more accurately, and that its classification results are more precise than those obtained with the MI criterion alone or the TFIDF criterion alone.
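Authors: YU Fang (余芳), JIANG Yunfei (姜云飞)
Source: Acta Scientiarum Naturalium Universitatis Sunyatseni (《中山大学学报(自然科学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2004, No. 5, pp. 118-120 (3 pages)
Funding: National Natural Science Foundation of China (60173039); Natural Science Foundation of Jinan University
Keywords: Naive Bayes classifier; feature selection; MI; TFIDF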
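
The abstract describes a two-stage selection that combines MI and TFIDF but gives no formulas or stage order. The following is a minimal sketch of one way such a combination could look, not the authors' WEBCAT implementation. Everything in it is an assumption made for illustration: the stage order (MI first, then TFIDF), the pointwise MI estimate maximized over classes, the corpus-level TFIDF score, the cutoffs `k_mi` and `k_final`, the function names, and the toy data.

```python
import math
from collections import Counter


def mi_scores(docs, labels):
    """Per term, the maximum over classes of log(P(t, c) / (P(t) * P(c))).

    Probabilities are estimated from document counts (assumed estimator).
    """
    n = len(docs)
    class_df = Counter(labels)   # documents per class
    term_df = Counter()          # documents containing each term
    joint_df = Counter()         # documents of class c containing term t
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_df[term] += 1
            joint_df[(term, label)] += 1
    scores = {}
    for (term, label), a in joint_df.items():
        mi = math.log(a * n / (term_df[term] * class_df[label]))
        scores[term] = max(scores.get(term, float("-inf")), mi)
    return scores


def tfidf_scores(docs):
    """Corpus-level TFIDF: total term frequency times log(N / document frequency)."""
    n = len(docs)
    tf, df = Counter(), Counter()
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    return {term: tf[term] * math.log(n / df[term]) for term in tf}


def two_stage_select(docs, labels, k_mi, k_final):
    """Stage 1 keeps the k_mi best terms by MI; stage 2 re-ranks them by TFIDF."""
    mi = mi_scores(docs, labels)
    stage1 = sorted(mi, key=lambda t: (-mi[t], t))[:k_mi]
    tfidf = tfidf_scores(docs)
    return sorted(stage1, key=lambda t: (-tfidf[t], t))[:k_final]


# Toy corpus (hypothetical): tokenized documents and their class labels.
docs = [
    ["goal", "match", "team", "the"],
    ["team", "goal", "goal", "the"],
    ["stock", "market", "the"],
    ["market", "stock", "stock", "price", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

print(two_stage_select(docs, labels, k_mi=6, k_final=2))  # ['goal', 'stock']
```

In this toy run the MI stage drops "the", which occurs equally in both classes, and the TFIDF stage then prefers the class-specific terms that occur most often. Ties are broken alphabetically so the output is deterministic.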

References (7)

  • 1 LEWIS D D. Representation and learning in information retrieval [D]. Massachusetts: Graduate School of the University of Massachusetts, 1992.
  • 2 LEWIS D D, RINGUETTE M. A comparison of two learning algorithms for text categorization [M]. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994: 81-93.
  • 3 MITCHELL T M. Machine Learning (机器学习) [M]. Translated by ZENG Huajun and ZHANG Yinkui. Beijing: China Machine Press, 2003.
  • 4 YANG Yi-ming, PEDERSEN J O. A comparative study on feature selection in text categorization [M]. Proceedings of ICML-97, 14th International Conference on Machine Learning, 1997.
  • 5 SALTON G, BUCKLEY C. Weighting approaches in automatic text retrieval [J]. Information Processing and Management, 1988, 24(5): 513-523.
  • 6 McCALLUM A, NIGAM K. A comparison of event models for Naive Bayes text classification [M]. Proceedings of AAAI-98 Workshop on Learning for Text Categorization, 1998.
  • 7 CRAVEN M, DiPASQUO D, FREITAG D, et al. Learning to extract symbolic knowledge from the World Wide Web [M]. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), 1998: 509-516.

Documents sharing references with this article: 45

Co-cited documents: 128

Citing documents: 24

Second-level citing documents: 67
