
Feature importance analysis for spammer detection in Sina Weibo

Times cited: 8
Abstract: Microblogs attract spammers as well as legitimate users. Spam accounts are widespread, and their abnormal behavior and the junk content they post significantly degrade the user experience. To improve detection accuracy, existing studies have either defined as many classification features as possible or kept proposing new classifiers. This raises several questions. Should microblog anti-spam research give priority to finding better classification features or to improving classification methods? Does detection always improve as more features are added? Do novel classifiers significantly improve detection? Taking Sina Weibo as a case study, the paper addresses these questions through experiments that combine different feature selection methods with different classifiers on a real Sina Weibo dataset. The results show three things. The choice of feature set matters more for spammer detection than improvements to the classifier. Features should be derived from several aspects, namely text content, user behavior, and social relationships. More features do not necessarily yield better detection, so the feature dimension should not be too high. These findings should help identify where future microblog anti-spam work can achieve a breakthrough.
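The abstract describes an experimental design in which feature selection methods are paired with classifiers and detection accuracy is measured as the number of features grows. This record does not say which selection methods, classifiers, or Sina Weibo features the paper actually used. The code below is therefore only a minimal Python sketch of that kind of experiment, built on stated assumptions:

- Information gain is a hypothetical stand-in for the feature selection criterion.
- Bernoulli naive Bayes is a hypothetical stand-in for the classifier.
- The dataset is synthetic, and its feature meanings are invented for illustration.

The sketch ranks features on training data and then evaluates the classifier on the top-k features for several values of k.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Drop in label entropy after splitting on one discrete feature column."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_features(X, y):
    """Filter-style selection: feature indices sorted by information gain, best first."""
    scores = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def train_bernoulli_nb(X, y, features):
    """Fit Bernoulli naive Bayes on the chosen binary features, with Laplace smoothing."""
    model = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        probs = {j: (sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in features}
        model[label] = (prior, probs)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior for one user."""
    def log_posterior(label):
        prior, probs = model[label]
        return math.log(prior) + sum(
            math.log(p if row[j] else 1 - p) for j, p in probs.items()
        )
    return max(model, key=log_posterior)


def accuracy(model, X, y):
    """Fraction of users whose predicted label matches the true label."""
    return sum(predict(model, row) == lab for row, lab in zip(X, y)) / len(y)


def make_synthetic_users(n, seed=0):
    """Hypothetical data, label 1 = spammer. Feature 0 is strongly informative,
    feature 1 weakly informative, features 2-9 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        spam = rng.random() < 0.5
        row = [
            int(rng.random() < (0.9 if spam else 0.1)),  # e.g. "posts many URLs" (invented)
            int(rng.random() < (0.6 if spam else 0.4)),  # e.g. "few followers" (invented)
        ] + [int(rng.random() < 0.5) for _ in range(8)]  # uninformative features
        X.append(row)
        y.append(int(spam))
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_synthetic_users(2000, seed=1)
    X_test, y_test = make_synthetic_users(1000, seed=2)
    order = rank_features(X_train, y_train)  # rank on training data only
    for k in (1, 2, 5, 10):
        model = train_bernoulli_nb(X_train, y_train, order[:k])
        print(f"top-{k} features: test accuracy = {accuracy(model, X_test, y_test):.3f}")
```

The loop over k probes one axis of the comparison: whether adding lower-ranked features keeps helping or stops paying off. Swapping `train_bernoulli_nb` for a different classifier while keeping the same ranked list would probe the other axis. No accuracy figures from the paper's Sina Weibo data are implied by this sketch.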
Source: Journal on Communications (《通信学报》), 2016, No. 8, pp. 24-33 (10 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
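Funding: National Basic Research Program of China ("973" Program) (No. 2009CB320505); National Key Technology R&D Program of China (No. 2008BAH37B05); National Natural Science Foundation of China (Nos. 61170211, U1533104, 61301245); Research Fund for the Doctoral Program of Higher Education of China, Ministry of Education (No. 20110002110056).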
Keywords: Sina Weibo; feature definition; feature selection; spammer detection