Microblog events detection and tracking with incremental hierarchical DBSCAN based on representative posts using cloud framework

Cited by: 3
Abstract: To extract news events from the large volume of real-time information produced by microblogging services, a complete event detection and tracking algorithm for a cloud computing environment is proposed. First, a new term-weighting method based on each post's repost and comment counts is used to represent microblog text in the Vector Space Model (VSM). Keywords are then extracted with RIHDBSCAN (Incremental Hierarchical DBSCAN based on Representative posts), which yields the detection and tracking of news events. Because a single node cannot process massive microblog data quickly and efficiently, the algorithm is deployed on Hadoop, a cloud computing platform. Experiments on real data collected from the Sina microblogging platform show that the proposed weighting method performs better than TF-IDF (Term Frequency-Inverse Document Frequency) and UF-ITUF (User Frequency-Inverse Thread User Frequency), and that the cloud framework improves processing speed, making the approach suitable for analysis and mining of very large datasets.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 12, pp. 3559-3562, 3595 (5 pages)
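Funding: National Natural Science Foundation of China (61103114); National Key Technology R&D Program of China (2012BAH19F00); Fundamental Research Funds for the Central Universities (106112013CDJZR185502); Key Project of Chongqing Higher Education Teaching Reform Research (112023)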
Keywords: microblog; event detection; Density-Based Spatial Clustering of Applications with Noise (DBSCAN); cloud computing; Hadoop platform; representative post
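
The abstract states that term weights are derived from each post's repost and comment counts, but this record does not give the formula. The following is only an illustrative sketch of the general idea, not the paper's method: the log-damped `engagement_weight`, the function names, and the sample posts are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def engagement_weight(reposts: int, comments: int) -> float:
    """Hypothetical post weight (an assumption, not the paper's formula):
    every post counts at least 1, with log-damped growth as reposts and
    comments increase."""
    return 1.0 + math.log1p(reposts + comments)


def term_weights(posts):
    """Accumulate engagement-scaled term frequencies over all posts.

    `posts` is an iterable of (tokens, reposts, comments) tuples.
    Returns a dict mapping each term to a weight normalized to [0, 1].
    """
    totals = defaultdict(float)
    for tokens, reposts, comments in posts:
        w = engagement_weight(reposts, comments)
        for term, tf in Counter(tokens).items():
            totals[term] += tf * w
    top = max(totals.values(), default=0.0)
    if top == 0.0:
        return {}
    return {term: value / top for term, value in totals.items()}


# Made-up sample data: (tokens, repost count, comment count).
sample_posts = [
    (["earthquake", "sichuan"], 120, 45),
    (["lunch", "noodles"], 0, 1),
    (["earthquake", "rescue"], 30, 10),
]
weights = term_weights(sample_posts)
for term in sorted(weights, key=weights.get, reverse=True):
    print(f"{term}: {weights[term]:.3f}")
```

In this sketch, terms that appear in widely reposted or commented posts rank above terms from posts with little engagement, which is the intuition behind replacing a purely frequency-based weight such as TF-IDF. The resulting term vectors would then be the input to a density-based clustering step.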
