Abstract
Taking the characteristics of anti-spam filtering into account, existing text categorization techniques are applied to spam blocking. Because misclassifying a legitimate message as spam costs the mail user clearly more than the reverse error, a loss function is defined and combined with the Naive Bayes algorithm, yielding a spam filtering algorithm based on loss minimization. Experimental results on a widely used spam data set confirm the effectiveness of introducing the loss function.
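The abstract does not state the paper's actual loss function, so the following is only a minimal sketch of how a loss-minimizing decision rule can sit on top of Naive Bayes. It assumes a multinomial model with Laplace smoothing and a single hypothetical cost ratio `lam` (blocking a legitimate message is taken to cost `lam` times as much as letting a spam message through). The class name, parameter names, and toy training data are illustrative and do not come from the paper.

```python
import math
from collections import Counter


class CostSensitiveNB:
    """Multinomial Naive Bayes with a minimum-expected-loss decision (illustrative)."""

    def __init__(self, lam=9.0, alpha=1.0):
        self.lam = lam      # assumed cost ratio: false positive costs lam x false negative
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        # docs: list of token lists; labels: "spam" or "legit" for each document
        self.counts = {"spam": Counter(), "legit": Counter()}
        self.n_docs = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set(self.counts["spam"]) | set(self.counts["legit"])
        return self

    def _log_joint(self, tokens, label):
        # log P(label) + sum of log P(token | label), skipping unseen tokens
        total = sum(self.counts[label].values())
        denom = total + self.alpha * len(self.vocab)
        score = math.log(self.n_docs[label] / sum(self.n_docs.values()))
        for t in tokens:
            if t in self.vocab:
                score += math.log((self.counts[label][t] + self.alpha) / denom)
        return score

    def spam_posterior(self, tokens):
        ls = self._log_joint(tokens, "spam")
        ll = self._log_joint(tokens, "legit")
        m = max(ls, ll)  # subtract the max before exponentiating for stability
        ps, pl = math.exp(ls - m), math.exp(ll - m)
        return ps / (ps + pl)

    def predict(self, tokens):
        p = self.spam_posterior(tokens)
        # Expected loss of blocking is lam * (1 - p); of passing it is 1 * p.
        # Block only when blocking has the smaller expected loss.
        return "spam" if p > self.lam * (1.0 - p) else "legit"


# Toy data, invented purely for illustration.
train_docs = [
    ["cheap", "pills", "offer"],
    ["cheap", "offer", "now"],
    ["meeting", "agenda", "notes"],
    ["project", "meeting", "now"],
]
train_labels = ["spam", "spam", "legit", "legit"]

borderline = ["cheap", "meeting", "offer"]  # spam posterior is about 0.75 here
plain_nb = CostSensitiveNB(lam=1.0).fit(train_docs, train_labels)
cautious = CostSensitiveNB(lam=9.0).fit(train_docs, train_labels)
```

With `lam=1.0` the rule reduces to ordinary Naive Bayes and the borderline message is blocked. With `lam=9.0` the spam posterior must exceed 0.9, so the same message is delivered, while a clear-cut spam message is still blocked.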
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition)
EI
CAS
CSCD
PKU Core
2005, Issue z1, pp. 352-355 (4 pages)
Keywords
anti-spam filtering
cost minimization
Bayes categorization
text categorization