
Analysis and Application of the Bayes Classification Algorithm in Social Networking Site Information Filtering (贝叶斯分类算法在社交网站信息过滤中的应用分析)

Cited by: 5
Abstract: Classifying documents and identifying spam is a research area of considerable practical value, and a growing number of websites are paying attention to this technology. This paper uses intelligent algorithms to analyze spam and to locate its producers: drawing on web logs and published content, it determines which users are advertisers and spam publishers so that they can be removed. Spam detection is treated as a process of dividing information into useful and useless information, and the Bayes classification algorithm is used to assign information to different classes. To address the shortcomings of rule-based classification and of methods that remove spam by analyzing advertising link URLs, the paper presents a Bayes classification algorithm and a machine training method. The experimental results show that this method outperforms rule-based classification.
Source: Library and Information Service (《图书情报工作》), CSSCI, Peking University Core Journal, 2014, No. 13, pp. 100-106 (7 pages)
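Funding: One of the research outputs of the Guangdong Provincial Philosophy and Social Science Foundation project "User Behavior Analysis and Website Information Organization Optimization Based on Web Logs" (project No. GD11CTS02)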
Keywords: Bayes classification; social networking sites; information filtering
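
The abstract frames spam detection as sorting messages into useful and useless classes with a Bayes classifier trained on labeled examples. A minimal sketch of that general idea follows, assuming a multinomial naive Bayes model with add-one smoothing and whitespace tokenization. It is not the authors' implementation; the class `NaiveBayesFilter`, the helper `build_demo_filter`, the labels, and the sample messages are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # all words seen in training

    def train(self, text, label):
        # Whitespace tokenization is an assumption made for brevity.
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            # Log prior: log P(label)
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab)
            for word in text.lower().split():
                # Smoothed log likelihood: log P(word | label)
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def build_demo_filter():
    # Hypothetical training messages, not data from the paper.
    nb = NaiveBayesFilter()
    samples = [
        ("cheap pills buy now click link", "spam"),
        ("win free prize click now", "spam"),
        ("meeting notes from the library seminar", "useful"),
        ("photos from our weekend trip", "useful"),
    ]
    for text, label in samples:
        nb.train(text, label)
    return nb
```

Calling `build_demo_filter().classify(...)` on a new message returns the label with the highest posterior score. Working in log space avoids numeric underflow on long messages, and the smoothing keeps words unseen in one class from forcing that class's probability to zero.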













