Abstract
To reduce the losses caused by false positives (legitimate mail misclassified as spam) and false negatives (spam misclassified as legitimate mail), this paper first proposes a new evaluation function built on two existing evaluation functions for text feature extraction: expected cross entropy and mutual information. This function extracts more representative feature vectors from e-mail. On top of it, the paper proposes a loss-reducing spam filtering method based on Bayes' formula. Simulation tests show that the new method, using the new evaluation function, effectively lowers both the false positive rate and the false negative rate.
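The abstract does not state the new evaluation function or the exact decision rule, so the following is only a minimal sketch of the standard building blocks it names: the textbook definitions of expected cross entropy and mutual information for a term, and a generic minimum-risk Bayes decision. All function names, the class labels, and the cost values (`cost_fp`, `cost_fn`) are assumptions made for illustration and are not taken from the paper.

```python
import math


def expected_cross_entropy(p_t, p_c, p_c_given_t):
    """Textbook form: ECE(t) = P(t) * sum_c P(c|t) * log(P(c|t) / P(c)).

    p_t          -- probability that term t occurs in a message
    p_c          -- dict mapping class label -> prior P(c)
    p_c_given_t  -- dict mapping class label -> P(c | t)
    """
    return p_t * sum(
        p_c_given_t[c] * math.log(p_c_given_t[c] / p_c[c])
        for c in p_c
        if p_c_given_t[c] > 0  # a zero term contributes nothing to the sum
    )


def mutual_information(p_t_given_c, p_t):
    """Textbook form: MI(t, c) = log(P(t|c) / P(t))."""
    return math.log(p_t_given_c / p_t)


def min_risk_label(p_spam, cost_fp=9.0, cost_fn=1.0):
    """Generic minimum-risk Bayes decision for a two-class filter.

    p_spam  -- posterior P(spam | message), e.g. from a naive Bayes model
    cost_fp -- assumed cost of blocking a legitimate message
    cost_fn -- assumed cost of letting a spam message through
    """
    risk_if_spam = cost_fp * (1.0 - p_spam)  # expected loss when labelling spam
    risk_if_legit = cost_fn * p_spam         # expected loss when labelling legitimate
    return "spam" if risk_if_spam < risk_if_legit else "legitimate"
```

With these assumed costs, a message is labelled spam only when its posterior spam probability exceeds cost_fp / (cost_fp + cost_fn) = 0.9, which illustrates how weighting false positives more heavily trades a higher miss rate for fewer blocked legitimate messages.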
Source
《现代电子技术》
2006, Issue 24, pp. 55-57 (3 pages)
Modern Electronics Technique
Funding
Supported by the Natural Science Foundation of Hubei Province (2005ABA238)