
EDM: An Efficient Algorithm for Event Detection in Microblogs (cited by: 19)
Abstract: Microblog data are real-time and dynamic, so analyzing them makes it possible to detect real-world events. At the same time, the massive volume, short texts, and rich social relationships of microblog data pose new challenges for event detection. This paper proposes an effective event detection algorithm for microblog data, EDM (event detection in microblogs), which jointly considers textual features (retweets, comments, embedded links, user hashtags, named entities, and so on), semantic features, temporal characteristics, and social relationships. It also proposes a method for building event summaries from key event elements: keywords, named entities, posting time, and user sentiment orientation. In experiments comparing EDM with an event detection algorithm based on the LDA (latent Dirichlet allocation) model, EDM achieves better event detection results and provides more intuitive, readable event summaries.
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2012, Issue 12, pp. 1076-1086 (11 pages)
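Funding: National Natural Science Foundation of China, Nos. 91024032 and 61070055; National Science and Technology Major Project ("Core Electronic Devices, High-end Generic Chips and Basic Software Products"), No. 2010ZX01042-002-003; Research Funds of Renmin University of China, No. 10XNI018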
Keywords: event detection; event summarization; feature selection; microblog
  • Related literature


  • [1] Zhao Qiankun, Mitra P, Chen Bi. Temporal and information flow based event detection from social text streams[C]//Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI '07), Vancouver, Canada, Jul 22-26, 2007: 1501-1506.
  • [2] Sayyadi H, Hurst M, Maykov A. Event detection and tracking in social streams[C]//Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media (ICWSM '09), San Jose, California, USA, May 17-20, 2009: 311-314.
  • [3] Li Juanzi, Li Jun, Tang Jie. A flexible topic-driven framework for news exploration[C]//Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '09), Paris, France, Jun 28-Jul 1, 2009.
  • [4] Allan J, Carbonell J, Doddington G, et al. Topic detection and tracking pilot study final report[C]//Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Feb 1998: 194-218.
  • [5] Dai Xiangying, Chen Qingcai, Wang Xiaolong, et al. Online topic detection and tracking of financial news based on hierarchical clustering[C]//Proceedings of the 2010 International Conference on Machine Learning and Cybernetics (ICMLC '10), Qingdao, China, Jul 11-14, 2010: 3341-3346.
  • [6] Deerwester S, Dumais S T, Furnas G W, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
  • [7] Hofmann T. Probabilistic latent semantic analysis[C]//Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), Stockholm, Sweden, Jul 30-Aug 1, 1999. New York, NY, USA: ACM, 1999: 50-57.
  • [8] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
  • [9] Phuvipadawat S, Murata T. Breaking news detection and tracking in Twitter[C]//Proceedings of the 2010 International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT '10), Toronto, Canada, Aug 31-Sep 3, 2010: 120-130.
  • [10] Lee C-H, Wu C-H, Chien T-F. BursT: a dynamic term weighting scheme for mining microblogging messages[C]//Proceedings of the 8th International Symposium on Neural Networks, Guilin, China, May 29-Jun 1, 2011. Berlin: Springer, 2011: 548-557.










