Abstract
Naïve Bayes is widely used in spam filtering because the algorithm is simple and performs well. Theoretical and experimental analysis shows, however, that email corpora with very different structures also have very different feature distributions, and that this difference in feature distribution degrades the performance of Naïve Bayes. Building on this analysis, the paper proposes a two-layer filtering model based on structural features, in which emails of different structures are trained and classified by separate Naïve Bayes classifiers. Experiments show that Naïve Bayes performs markedly better with this model, coming very close to SVM.
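The two-layer idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the abstract does not say which structural features the first layer uses, so `structure_of` (plain text versus HTML) is a hypothetical stand-in, and the class names `NaiveBayes` and `TwoLayerFilter`, the whitespace tokenizer, and the add-one smoothing are likewise assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()

    def train(self, tokens, label):
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, tokens):
        """Return the most probable label, or None if nothing was trained."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def structure_of(message):
    # Hypothetical first-layer rule (an assumption, not from the paper):
    # split the corpus into HTML mail and plain-text mail.
    return "html" if "<html" in message.lower() else "plain"


class TwoLayerFilter:
    """Layer 1 routes a message by structure; layer 2 is one Naive Bayes
    classifier per structure, trained only on mail of that structure."""

    def __init__(self, structure_fn=structure_of):
        self.structure_fn = structure_fn
        self.filters = defaultdict(NaiveBayes)

    def train(self, message, label):
        tokens = message.lower().split()
        self.filters[self.structure_fn(message)].train(tokens, label)

    def classify(self, message):
        tokens = message.lower().split()
        return self.filters[self.structure_fn(message)].classify(tokens)
```

Because each second-layer classifier only ever sees mail of one structure, its word statistics are not mixed with those of structurally different mail, which is the effect the abstract attributes to the model.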
Source
Journal of Computer Applications (《计算机应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2006, Issue 1, pp. 191-194 (4 pages)
Funding
Supported by the National 973 Program of China (2004CB318109)