Recognition of Spam in Wechat Based on the Support Vector Machine with Improving Genetic Algorithm (基于改进遗传算法的支持向量机微信垃圾文章识别)

Cited by: 2
Abstract: In recent years, with its rapid growth and spread, WeChat has become one of the essential applications on smart mobile devices. At the same time, large volumes of fraudulent messages and spam advertisements have appeared on WeChat, causing considerable trouble for users. This paper draws WeChat article samples from Sogou WeChat Search, treats the recognition of WeChat spam articles as a text classification problem, trains a classification model on the samples with a support vector machine (SVM), and applies an improved genetic algorithm to optimize the SVM parameters. The application of the improved genetic algorithm to the SVM is described in detail. Compared with a conventional SVM, optimizing the SVM parameters with the improved genetic algorithm raises both model accuracy and optimization efficiency. Finally, the classification model is evaluated on a test set of 15,000 WeChat articles; the results show that the method reaches an accuracy of 94.7% in recognizing WeChat spam articles.
Author: 梁阔洋 (Liang Kuoyang)
Source: Computing Technology and Automation (《计算技术与自动化》), 2015, No. 4, pp. 137-141 (5 pages)
Keywords: support vector machine; genetic algorithm; feature selection; parameter optimization; spam articles
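
The abstract describes tuning SVM parameters with a genetic algorithm but does not say how the algorithm was improved, so the following is only a minimal sketch of a plain genetic search over two hypothetical parameters (log10 of C and log10 of gamma). Every name here (`genetic_search`, `toy_fitness`, the bounds, the population settings) is an assumption made for illustration, and `toy_fitness` is a stand-in for what would in practice be the cross-validated accuracy of an SVM trained with those parameters.

```python
import math
import random


def toy_fitness(log_c, log_gamma):
    """Hypothetical stand-in for cross-validated SVM accuracy.

    Peaks at log_c = 1.0, log_gamma = -2.0; these values are arbitrary.
    """
    return math.exp(-((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2) / 8.0)


def genetic_search(fitness, bounds, pop_size=30, generations=40,
                   mutation_rate=0.2, seed=0):
    """Plain genetic algorithm: elitism, tournament selection,
    arithmetic crossover, and clipped Gaussian mutation."""
    rng = random.Random(seed)

    def score(individual):
        return fitness(*individual)

    # Initial population: uniform random points inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_population = [ranked[0][:]]  # elitism: keep the best as-is
        while len(next_population) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = max(rng.sample(ranked, 3), key=score)
            parent_b = max(rng.sample(ranked, 3), key=score)
            # Arithmetic crossover: random convex combination of parents.
            w = rng.random()
            child = [w * a + (1.0 - w) * b
                     for a, b in zip(parent_a, parent_b)]
            # Gaussian mutation, clipped back into the search bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            next_population.append(child)
        population = next_population

    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]  # log10(C), log10(gamma)
    best_params, best_score = genetic_search(toy_fitness, search_bounds)
    print("best log10(C), log10(gamma):", best_params)
    print("fitness:", best_score)
```

Searching in log space is a common choice for SVM parameters because useful values span several orders of magnitude. To use the sketch on real data, `toy_fitness` would be replaced by a function that trains and validates a classifier for the given parameters.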
