Abstract
This paper addresses the accuracy of spam filtering. Email is a high-dimensional, complex kind of text, and a single traditional model such as a support vector machine (SVM) or a K-nearest neighbor (KNN) classifier has difficulty identifying spam, which leads to low filtering accuracy. To improve spam filtering accuracy, a filtering model that combines K-nearest neighbor with a support vector machine (SVM-KNN) is proposed. First, the email feature vectors are fed to the support vector machine for training, which yields the set of support vectors. Then the distance between the email to be classified and the optimal hyperplane is computed. If the distance is greater than a threshold, the support vector machine determines the email type; otherwise, K-nearest neighbor determines it. Simulation results show that SVM-KNN overcomes the difficulties of the single models and improves spam filtering accuracy, making it an effective means of email management.
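The decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a linear SVM that has already been trained, so the weights `W`, bias `B`, the support vector set, the threshold, and `k` are all hypothetical toy values, and it assumes that the KNN vote is taken over the support vectors using Euclidean distance. Labels are assumed to be +1 for spam and -1 for legitimate mail.

```python
import math
from collections import Counter

# Hypothetical, hand-picked values standing in for a trained linear SVM.
W = (1.0, 0.0)          # hyperplane normal (assumed)
B = 0.0                 # bias term (assumed)
THRESHOLD = 0.5         # distance threshold (assumed)
SUPPORT_VECTORS = [     # (feature vector, label) pairs (assumed)
    ((1.0, 0.0), 1),
    ((1.0, 3.0), 1),
    ((-1.0, 0.0), -1),
    ((-1.0, 1.0), -1),
    ((-1.0, -1.0), -1),
]


def hyperplane_distance(x, w, b):
    """Signed distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def knn_label(x, support_vectors, k):
    """Majority label among the k support vectors nearest to x."""
    nearest = sorted(support_vectors, key=lambda sv: math.dist(x, sv[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def svm_knn_predict(x, w, b, support_vectors, threshold, k=3):
    """Use the SVM far from the hyperplane, KNN close to it."""
    d = hyperplane_distance(x, w, b)
    if abs(d) > threshold:
        # Far from the boundary: trust the sign of the SVM decision value.
        return 1 if d > 0 else -1
    # Near the boundary: fall back to a KNN vote over the support vectors.
    return knn_label(x, support_vectors, k)


if __name__ == "__main__":
    for sample in [(3.0, 0.5), (-2.0, 1.0), (0.1, 0.0)]:
        print(sample, svm_knn_predict(sample, W, B, SUPPORT_VECTORS, THRESHOLD))
```

In this toy setup the sample `(0.1, 0.0)` lies within the threshold of the hyperplane, so the KNN vote decides its label rather than the sign of the SVM output, which is the case the combined model is meant to handle.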
Source
Computer Simulation (《计算机仿真》)
CSCD
PKU Core Journal (北大核心)
2013, Issue 5, pp. 370-373, 407 (5 pages)
Keywords
Email
Spam
Support vector machine (SVM)
Nearest neighbor algorithm
Filtering