Abstract
In recent years, with the rapid growth and spread of WeChat, it has become one of the essential applications on smart mobile devices. At the same time, large volumes of WeChat fraud messages and spam advertisements have appeared, causing considerable trouble for users. This paper draws samples of WeChat articles from Sogou WeChat Search, treats the identification of WeChat spam articles as a text classification problem, trains a classification model on the samples with a support vector machine (SVM), and applies an improved genetic algorithm to optimize the SVM parameters. The application of the improved genetic algorithm to the SVM is described in detail. Compared with a conventional SVM, optimizing the SVM parameters with the improved genetic algorithm raises both model accuracy and optimization efficiency. Finally, the classification model is evaluated on a test set of 15,000 WeChat articles; the results show that the method reaches 94.7% accuracy in identifying WeChat spam articles.
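The parameter search summarized in the abstract can be illustrated with a minimal sketch. It is a plain genetic algorithm (tournament selection, arithmetic crossover, Gaussian mutation, elitism), not the paper's improved variant, whose details the abstract does not give. The names `genetic_search`, `toy_accuracy`, and `BOUNDS` are hypothetical, and the choice of an RBF-kernel SVM with genes (log2 C, log2 gamma) is an assumption. The fitness here is a toy stand-in; in practice it would be the cross-validated accuracy of an SVM trained with the decoded parameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   crossover_rate=0.8, mutation_rate=0.2, seed=0):
    """Maximise fitness(genes) over a box given by bounds = [(lo, hi), ...]."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(scored):
        # Binary tournament: keep the fitter of two distinct random individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        history.append(scored[0][0])
        next_gen = [list(scored[0][1])]  # elitism: best individual survives unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover: convex blend of the parents
                child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i]))  # clamp back into the box
            next_gen.append(child)
        population = next_gen
    best_fit, best = max(((fitness(ind), ind) for ind in population),
                         key=lambda pair: pair[0])
    return best, best_fit, history


def toy_accuracy(genes):
    # Stand-in for cross-validated SVM accuracy (assumption): a smooth surface
    # that peaks at log2(C) = 3, log2(gamma) = -5.
    log2_c, log2_gamma = genes
    return 1.0 / (1.0 + (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2)


# Assumed search ranges for log2(C) and log2(gamma).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]

best, best_fit, history = genetic_search(toy_accuracy, BOUNDS)
print("best log2(C), log2(gamma):", best)
print("best fitness:", best_fit)
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. Swapping `toy_accuracy` for a function that trains and validates an SVM would make each fitness call far more expensive, which is why the population size and generation count are kept small here.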
Source
Computing Technology and Automation (《计算技术与自动化》)
2015, No. 4, pp. 137-141 (5 pages)
Keywords
support vector machine
genetic algorithm
feature selection
parameter optimization
spam articles