
Identification of micro-blog advertising publisher based on clustering analysis
(基于聚类分析的微博广告发布者识别)
Cited by: 2
Abstract: Micro-blog platforms carry a large amount of advertising content, which seriously degrades the experience of ordinary users and interferes with related research. Most existing work handles advertising micro-blogs with classification algorithms such as Support Vector Machine (SVM) or random forest, but manually labeling a large training set for a classifier is difficult. This paper therefore proposes a method for identifying micro-blog advertisement publishers based on clustering analysis. At the user level, advertisement publishers often post many ordinary micro-blogs to dilute their advertising content; to address this, the concept of a core micro-blog is introduced. The topic of each user's core micro-blogs and the corresponding micro-blog sequence are extracted, user features and the text features of those micro-blogs are computed, and a clustering algorithm is applied to the features to identify advertisement publishers. Experimental results show a precision of 93% (the Chinese-language abstract gives 92%), a recall of 97%, and an F value of 95%, indicating that the method can accurately identify micro-blog advertisement publishers even when the advertising content has been deliberately diluted. The method offers theoretical support and a practical approach for identifying and cleaning up micro-blog spam.
Authors: ZHAO Xingyu (赵星宇); ZHAO Zhihong (赵志宏); WANG Yepei (王业沛); CHEN Songyu (陈松宇) (Software Institute, Nanjing University, Nanjing, Jiangsu 210093, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 5, pp. 1267-1271 (5 pages)
Funding: Jiangsu Province Industry-University-Research Prospective Joint Research Project (BY2015069-03)
Keywords: micro-blog advertising; Density-Based Spatial Clustering of Applications with Noise (DBSCAN); text filtering; feature extraction
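
As a rough illustration of the clustering step named in the keywords, the following is a minimal plain-Python sketch of DBSCAN applied to per-user feature vectors. It is not the authors' implementation: the two features (share of core micro-blogs containing a link, mean text similarity among core micro-blogs), the sample users, and the values of eps and min_pts are all hypothetical and chosen only for demonstration. The small f_measure helper shows how the F value follows from the precision of 93% and recall of 97% reported in the English abstract.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep growing
                queue.extend(j_neighbors)
    return labels


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F value)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical features per user:
# (share of core micro-blogs with a link, mean similarity of core micro-blogs)
users = {
    "ad_1": (0.90, 0.85),
    "ad_2": (0.88, 0.80),
    "ad_3": (0.92, 0.83),
    "normal_1": (0.10, 0.20),
    "normal_2": (0.12, 0.25),
    "normal_3": (0.08, 0.22),
    "odd_1": (0.50, 0.95),
}

labels = dbscan(list(users.values()), eps=0.1, min_pts=3)
for name, label in zip(users, labels):
    print(name, "noise" if label == NOISE else f"cluster {label}")

print(round(f_measure(0.93, 0.97), 2))
```

With these toy values the three "ad" users and the three "normal" users form two separate clusters and "odd_1" is left as noise. Which cluster corresponds to advertisement publishers still has to be decided from the cluster's features; clustering alone only groups similar users, which is why it avoids the manual labeling that classifiers need.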
