
A Spammer Detection Method Based on Logistic Regression
Cited by: 10
Abstract: With the growth of new media such as Twitter and Weibo, paid spammers (the "water army") have emerged and multiplied rapidly, driven by online public relations and marketing. They consume large amounts of network resources and ordinary users' time, and they significantly distort the authenticity of online public opinion. This paper builds a spammer detection model based on logistic regression. Behavioral and account attributes of Sina Weibo users are collected and preprocessed to form the experimental data set. The cumulative distribution function (CDF) of each attribute is then analyzed to select suitable features, including the number of friends, the number of followers, text similarity, and URL rate. These features serve as input parameters for training a logistic regression classifier, and the fitted coefficients complete the spammer detection model. Experimental results demonstrate the accuracy and effectiveness of the model.
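The pipeline described in the abstract (compare per-class CDFs of an attribute, then fit a logistic regression on the selected attributes) can be illustrated with a minimal sketch. This is not the authors' implementation: the data below is synthetic, the two features (text similarity and URL rate) and their value ranges are assumptions chosen for illustration, and the training loop is plain batch gradient descent, which the paper does not specify.

```python
import math
import random


def ecdf(values, x):
    """Empirical CDF: fraction of observations that are <= x."""
    return sum(v <= x for v in values) / len(values)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (spammer) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Synthetic users (assumption): each row is [text_similarity, url_rate].
random.seed(0)


def make_user(is_spammer):
    if is_spammer:
        return [random.uniform(0.5, 1.0), random.uniform(0.6, 1.0)]
    return [random.uniform(0.0, 0.5), random.uniform(0.0, 0.4)]


labels = [i % 2 for i in range(200)]  # 1 = spammer, 0 = normal user
users = [make_user(lbl) for lbl in labels]

# Feature selection step: a large gap between the two classes' CDFs at some
# threshold suggests the attribute separates spammers from normal users.
spam_url = [u[1] for u, lbl in zip(users, labels) if lbl == 1]
norm_url = [u[1] for u, lbl in zip(users, labels) if lbl == 0]
gap = ecdf(norm_url, 0.5) - ecdf(spam_url, 0.5)

# Model construction step: the fitted coefficients define the detector.
w, b = train_logistic(users, labels)
accuracy = sum(predict(w, b, u) == lbl for u, lbl in zip(users, labels)) / len(users)
print("CDF gap at 0.5:", gap, "training accuracy:", accuracy)
```

On real Weibo data the attributes would not separate this cleanly, and count features such as friends and followers would typically need scaling before training.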
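Source: 《信息安全与技术》 (Information Security and Technology), 2015, No. 4, pp. 57-62 (6 pages)
Funding: National 973 Program of China (No. 2013CB329604); National Natural Science Foundation of China (No. 61472433)
Keywords: Twitter; Sina Weibo; CDF; logistic regression; spammer detection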


