Abstract
Spam e-mail has become a major problem in the Internet age. As the disguise techniques used in spam keep evolving, the main existing spam filtering technologies face new challenges. This paper proposes a new spam filtering method based on a discriminative model. The mail classifier updates the weights of feature terms through continual learning. When a new message arrives, the weights of all feature terms are summed and the sum is converted into a probability; if that probability exceeds a given threshold, the message is classified as spam. The method is also applied to a real-time (online) mail processing environment. Experimental results show that the method clearly improves filtering accuracy and effectively reduces the false positive rate.
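The scoring and update loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions rather than the paper's method: the weight sum is mapped to a probability with the logistic function, the weights are updated by a single stochastic gradient descent step on the log loss, only features present in the message contribute to the sum, and features are plain tokens. The class name `OnlineSpamFilter`, the learning rate of 0.1, and the threshold of 0.5 are hypothetical. Mutual-information feature selection, listed in the keywords, is not shown.

```python
import math


class OnlineSpamFilter:
    """Illustrative online linear classifier (assumed logistic model)."""

    def __init__(self, learning_rate=0.1, threshold=0.5):
        self.weights = {}                   # feature term -> weight
        self.learning_rate = learning_rate  # hypothetical step size
        self.threshold = threshold          # hypothetical decision threshold

    def probability(self, features):
        # Sum the weights of the distinct features present in the message.
        total = sum(self.weights.get(f, 0.0) for f in set(features))
        # Clamp the sum so math.exp cannot overflow on extreme scores.
        total = max(-30.0, min(30.0, total))
        # Logistic function: converts the weight sum into a probability.
        return 1.0 / (1.0 + math.exp(-total))

    def is_spam(self, features):
        # Classify as spam when the probability exceeds the threshold.
        return self.probability(features) > self.threshold

    def learn(self, features, label):
        # label is 1 for spam and 0 for legitimate mail.
        # One gradient descent step on the log loss for this message.
        error = label - self.probability(features)
        for f in set(features):
            self.weights[f] = self.weights.get(f, 0.0) + self.learning_rate * error
```

In an online setting, `learn` would be called each time a message's true label becomes known (for example, from user feedback), so the weights keep adapting as spam content changes.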
Source
Computer Technology and Development (《计算机技术与发展》)
2010, No. 1, pp. 181-184 (4 pages)
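Funding
Natural Science Foundation of Shandong Province (Q2006G03)
Shandong Province Key Science and Technology Research Project (2009GG10001008)
Shandong Province Soft Science Research Program (2009RKA285)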
Keywords
mutual information
discriminative model
spam filtering
gradient descent