Structure-based bi-layer nBayes filtering model (cited by: 4)
Abstract: Naive Bayes has been widely used in spam filtering because the algorithm is simple and performs well. Theoretical and experimental analysis shows that email corpora with very different structures also have very different feature distributions, and that this difference degrades the performance of Naive Bayes. Building on this analysis, the paper proposes a structure-based bi-layer filtering model that trains and applies a separate Naive Bayes classifier for each email structure. Experiments show that Naive Bayes performs markedly better with this model, coming very close to SVM.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2006, No. 1, pp. 191-194 (4 pages)
Funding: Supported by the National 973 Program of China (2004CB318109)
Keywords: machine learning; naive Bayes; text categorization; spam; content-based filtering
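
The bi-layer idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which structural features the first layer uses, so `structure_of` (HTML versus plain text) is a hypothetical stand-in, and the tokenizer, the class names `NaiveBayes` and `BiLayerFilter`, and the toy messages are likewise assumptions made for illustration. The first layer routes each message to a structure bucket; the second layer is an ordinary multinomial Naive Bayes classifier trained only on mail from that bucket.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # messages seen per label
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()

    def train(self, tokens, label):
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        """Return the most probable label, or None if nothing was trained."""
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            # The extra 1 reserves probability mass for unseen tokens.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def structure_of(email):
    """Hypothetical first-layer rule: bucket mail by a coarse structural trait."""
    return "html" if "<html" in email.lower() else "plain"


class BiLayerFilter:
    """Layer 1 picks a structure bucket; layer 2 is one classifier per bucket."""

    def __init__(self, structure_fn=structure_of):
        self.structure_fn = structure_fn
        self.classifiers = defaultdict(NaiveBayes)

    def train(self, email, label):
        self.classifiers[self.structure_fn(email)].train(tokenize(email), label)

    def predict(self, email):
        return self.classifiers[self.structure_fn(email)].predict(tokenize(email))


# Toy usage with made-up messages.
flt = BiLayerFilter()
flt.train("<html><body>buy cheap pills now</body></html>", "spam")
flt.train("<html><body>quarterly newsletter from the library</body></html>", "ham")
flt.train("win cash prize now", "spam")
flt.train("meeting notes attached for tomorrow", "ham")
print(flt.predict("claim your cash prize"))
```

Because each second-layer classifier estimates its token probabilities only from mail of one structure, markup tokens that dominate HTML mail do not distort the estimates used for plain-text mail, which is the effect the abstract attributes to the model.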
