
Detecting Approach for Advertising Microblog (广告型微博的识别方法)

Cited by: 3
Abstract: Microblog platforms are flooded with advertising content, which severely degrades web public opinion analysis, so detecting and filtering advertising microblogs has become an urgent problem. Based on an analysis of the characteristics of advertising microblogs over massive data, this paper proposes a detection method in two steps. First, three new feature types are added to traditional text features: the number of microblogs posted during inactive periods ("post frequency"), microblog repetition degree ("multiplicity"), and feature word-pair weight. These features are combined with a support vector machine (SVM) model to classify microblog text and identify advertisers. Second, the differences between advertisers and legitimate users are analyzed, the "topic" feature of each advertiser is extracted, and microblog text is filtered on a per-user basis to detect advertising microblogs. In experiments the method achieves 86.7% precision, 97.2% recall, and a 91.6% F-score, indicating that it detects advertising microblogs efficiently and accurately.
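The F-score in the abstract is the harmonic mean of precision and recall. As a rough consistency check, the sketch below recomputes it from the 86.7% precision and 97.2% recall figures stated in the abstract. The function name `f_score` and the `beta` parameter are illustrative choices, not taken from the paper, and the paper's own unrounded values may differ slightly.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Hypothetical helper for illustration, not code from the paper.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


# Figures as stated in the abstract.
reported_precision = 0.867
reported_recall = 0.972

f1 = f_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.4f}")
```

The recomputed value lands within about a tenth of a percentage point of the reported 91.6%, which is what one would expect if the published figures were rounded independently.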
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 12, pp. 2702-2707 (6 pages)
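Funding: Supported by the National Natural Science Foundation of China (61171159, 61271304) and by the Key Project of the Beijing Municipal Education Commission Science and Technology Development Plan, jointly a Class B Key Project of the Beijing Natural Science Foundation (KZ201311232037)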
Keywords: advertising microblog; advertiser; support vector machine (SVM); text filtering; topic
