Abstract
This paper examines the limitations of Bayesian email filtering models based on a probability threshold: because the suitability and practicality of the chosen threshold are seldom considered, some recall is lost. To address this, the paper improves the Bayesian decision and proposes a lower-error classification decision method based on a random variable; considering the particular nature of email processing, it further proposes a lower-risk classification decision method based on a random variable. Experimental results show that the former gives better classification decisions on ordinary text classification problems, while the latter performs better on email, raising the recall and F value of the Bayesian email filter while keeping the risk of misjudgment low.
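The threshold limitation described in the abstract can be illustrated with a minimal sketch. It shows only the baseline rule (flag a message as spam when its posterior spam probability reaches a fixed threshold) and how a strict threshold trades recall for precision. It is not the paper's random-variable method, which the abstract does not specify. The function names, the posterior scores, and the labels are all hypothetical values made up for illustration.

```python
def classify(p_spam, threshold):
    """Baseline rule: flag as spam when the posterior reaches the threshold."""
    return p_spam >= threshold


def recall_precision_f(scores, labels, threshold):
    """Return (recall, precision, F value) for the spam class at a threshold."""
    tp = fp = fn = 0
    for p_spam, is_spam in zip(scores, labels):
        predicted_spam = classify(p_spam, threshold)
        if predicted_spam and is_spam:
            tp += 1
        elif predicted_spam and not is_spam:
            fp += 1  # legitimate mail misjudged as spam
        elif not predicted_spam and is_spam:
            fn += 1  # spam that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return recall, precision, f_value


# Hypothetical posterior spam probabilities and true labels (True = spam).
scores = [0.99, 0.95, 0.80, 0.60, 0.70, 0.10, 0.05, 0.30]
labels = [True, True, True, True, False, False, False, False]

strict = recall_precision_f(scores, labels, threshold=0.9)
loose = recall_precision_f(scores, labels, threshold=0.5)
print("threshold 0.9:", strict)
print("threshold 0.5:", loose)
```

On this toy data the strict threshold misjudges no legitimate mail but misses half of the spam, while the looser threshold catches all spam at the cost of one misjudged legitimate message. This is the tension between recall and misjudgment risk that the abstract refers to.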
Source
《计算机工程与应用》
CSCD
2013, No. 7, pp. 98-101, 125 (5 pages)
Computer Engineering and Applications
Keywords
spam email
email filtering
probability
threshold
classification decision