Abstract
Microblog data are real-time and dynamic, so real-world events can be detected by analyzing them. At the same time, the sheer volume of microblog data, the shortness of the posts, and their rich social relationships pose new challenges for event detection. This paper proposes an event detection algorithm for microblog data, EDM (event detection in microblogs), that jointly considers textual features (retweets, comments, embedded links, hashtags, named entities, and so on), semantic features, temporal features, and social relationships. It also proposes a method that builds an event summary from an event's key elements: keywords, named entities, posting time, and users' sentiment orientation. In experiments against an event detection algorithm based on the LDA (latent Dirichlet allocation) model, EDM achieves better event detection results and produces more intuitive, readable event summaries.
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD-indexed
2012, No. 12, pp. 1076-1086 (11 pages)
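Funding
National Natural Science Foundation of China, Nos. 91024032 and 61070055
National Science and Technology Major Project of China ("核高基": core electronic devices, high-end generic chips, and basic software), No. 2010ZX01042-002-003
Research Funds of Renmin University of China, No. 10XNI018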
Keywords
event detection
event summarization
feature selection
microblog