Abstract
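Feature selection is a fundamental and critical step in every machine-learning-based spam filtering system, and it directly affects the performance and efficiency of the whole system. Based on an analysis of the characteristics of spam email, this paper proposes a feature-selection evaluation function based on Bayesian reasoning. The new method has low computational overhead and can distinguish how strongly different feature words indicate spam. Feature selection is therefore more targeted than with other commonly used methods, which helps improve both the accuracy and the running efficiency of the filtering system.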
Feature selection (FS) is a basic but crucial step in anti-spam classifiers built on machine learning (ML) algorithms. FS based on mutual information (MI) is currently in wide use. In this paper, by analyzing the characteristics of spam email, a new FS approach based on Bayesian reasoning is presented. Experiments show that it achieves substantially higher performance and efficiency than the MI approach.
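The abstract does not give the formula of the proposed Bayesian evaluation function, so the sketch below is only illustrative. It implements the mutual-information (MI) baseline the abstract compares against, and, for contrast, a hypothetical Bayes-style score, `spam_posterior_score`, that ranks a word by how far its smoothed posterior P(spam | word) lies from 0.5. That second score is an assumption made here, not the authors' function. All function names, the Laplace smoothing parameter `alpha`, and the equal-class-prior assumption are likewise hypothetical.

```python
import math
from collections import Counter

def doc_freqs(docs, labels):
    """Count, per term, how many spam (label 1) and ham (label 0) documents contain it."""
    spam_df, ham_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            (spam_df if label == 1 else ham_df)[term] += 1
    return spam_df, ham_df

def mutual_information(term, spam_df, ham_df, n_spam, n_ham):
    """MI I(T; C) between term presence T and class C, estimated from document frequencies."""
    n = n_spam + n_ham
    # 2x2 contingency table keyed by (term present?, class): 1 = spam, 0 = ham
    counts = {
        (1, 1): spam_df[term],
        (1, 0): ham_df[term],
        (0, 1): n_spam - spam_df[term],
        (0, 0): n_ham - ham_df[term],
    }
    mi = 0.0
    for (t, c), n_tc in counts.items():
        if n_tc == 0:
            continue  # the 0 * log 0 term is taken as 0
        n_t = counts[(t, 1)] + counts[(t, 0)]
        n_c = n_spam if c == 1 else n_ham
        mi += (n_tc / n) * math.log((n * n_tc) / (n_t * n_c))
    return mi

def spam_posterior_score(term, spam_df, ham_df, n_spam, n_ham, alpha=1.0):
    """HYPOTHETICAL Bayes-style score (not the paper's function): |P(spam | term) - 0.5|,
    with Laplace-smoothed P(term | class) and equal class priors assumed."""
    p_t_spam = (spam_df[term] + alpha) / (n_spam + 2 * alpha)
    p_t_ham = (ham_df[term] + alpha) / (n_ham + 2 * alpha)
    posterior = p_t_spam / (p_t_spam + p_t_ham)
    return abs(posterior - 0.5)

def select_features(docs, labels, k, score="mi"):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    spam_df, ham_df = doc_freqs(docs, labels)
    n_spam = sum(1 for y in labels if y == 1)
    n_ham = len(labels) - n_spam
    vocab = set(spam_df) | set(ham_df)
    fn = mutual_information if score == "mi" else spam_posterior_score
    ranked = sorted(vocab, key=lambda w: (-fn(w, spam_df, ham_df, n_spam, n_ham), w))
    return ranked[:k]
```

On a toy corpus such as `[["free", "money", "now"], ["free", "offer"], ["meeting", "tomorrow"], ["project", "meeting", "notes"]]` with labels `[1, 1, 0, 0]`, both scores rank "free" and "meeting" highest, because each appears in every document of one class and in none of the other. The scores diverge on rarer words. MI also rewards how much a term's presence or absence splits the whole corpus. The posterior score looks only at which class the term leans toward.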
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2008, No. 33, pp. 105-107, 137 (4 pages)