Abstract
To reduce the misclassification of legitimate email, this paper proposes a two-stage spam filtering method based on naive Bayes and hierarchical clustering. The method sorts email into three classes: legitimate, suspicious, and spam. In the first stage, a naive Bayes classifier, chosen for its speed and good classification performance, makes a preliminary classification into legitimate and suspicious email. In the second stage, exploiting the sending characteristics of spam, a hierarchical clustering algorithm searches a pre-collected spam set for similar email. Experiments show that the method significantly improves the precision of spam detection and lowers the misclassification of legitimate email, making it better suited to practical use.
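The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify features, thresholds, or the distance measure, so whitespace tokens, a 0.5 probability threshold, Jaccard distance, single-linkage merging with a 0.5 cutoff, the toy training data, and all function names are assumptions made here for illustration only.

```python
import math
from collections import Counter


def tokens(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


class NaiveBayes:
    """Two-class multinomial naive Bayes with Laplace smoothing."""

    def __init__(self, ham_docs, spam_docs):
        self.counts = {"ham": Counter(), "spam": Counter()}
        for doc in ham_docs:
            self.counts["ham"].update(tokens(doc))
        for doc in spam_docs:
            self.counts["spam"].update(tokens(doc))
        n = len(ham_docs) + len(spam_docs)
        self.prior = {"ham": len(ham_docs) / n, "spam": len(spam_docs) / n}
        self.vocab = set(self.counts["ham"]) | set(self.counts["spam"])

    def spam_probability(self, text):
        log_p = {}
        for label in ("ham", "spam"):
            total = sum(self.counts[label].values()) + len(self.vocab)
            log_p[label] = math.log(self.prior[label]) + sum(
                math.log((self.counts[label][t] + 1) / total)
                for t in tokens(text) if t in self.vocab)
        # Normalize in log space to avoid underflow.
        top = max(log_p.values())
        exp = {label: math.exp(v - top) for label, v in log_p.items()}
        return exp["spam"] / (exp["ham"] + exp["spam"])


def jaccard_distance(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def single_linkage_clusters(docs, cutoff):
    """Agglomerative clustering: merge the two closest clusters
    until the smallest inter-cluster distance exceeds the cutoff."""
    clusters = [[i] for i in range(len(docs))]

    def dist(c1, c2):
        return min(jaccard_distance(docs[i], docs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        d, a, b = min((dist(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        if d > cutoff:
            break
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def classify(text, nb, known_spam, p_threshold=0.5, cutoff=0.5):
    # Stage 1: naive Bayes separates legitimate from suspicious email.
    if nb.spam_probability(text) < p_threshold:
        return "legitimate"
    # Stage 2: cluster the suspicious email together with known spam.
    docs = known_spam + [text]
    target = len(docs) - 1
    for cluster in single_linkage_clusters(docs, cutoff):
        if target in cluster:
            # Sharing a cluster with known spam promotes it to spam.
            return "spam" if len(cluster) > 1 else "suspicious"


# Toy data, invented for this sketch.
ham = ["meeting agenda for monday",
       "project report attached please review",
       "lunch on friday with the team"]
spam = ["win free money now", "free prize claim now", "cheap pills buy now"]
nb = NaiveBayes(ham, spam)

for message in ["please review the meeting agenda",
                "claim free money now",
                "free win pills today offer"]:
    print(message, "->", classify(message, nb, spam))
```

Under these assumptions, only email that both scores as likely spam and clusters with previously collected spam is labelled spam. Everything else that fails the first stage stays "suspicious", which is one way to keep the misclassification of legitimate email low.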
Source
《微电子学与计算机》
CSCD
PKU Core Journal (北大核心)
2007, Issue 8, pp. 1-3, 7 (4 pages)
Microelectronics & Computer
Funding
National "863" Program project (2003AA148010)
National Torch Program project (2005EB011484)
Keywords
naive Bayes
hierarchical clustering
spam email filtering