Abstract
Existing short-text approaches to burst-event analysis cannot describe how an event evolves in a way that is both simple and accurate. To address this problem, a new short-text-based method for representing the evolution of a burst event is proposed. First, an event status value is introduced to describe the state of the event at each point in time, so that users can analyze how the event develops. Second, based on the structured information carried by short texts, the event status value is considered from two aspects: text information and user information. Third, the factors that influence text information are taken into account, and formulas are constructed to calculate the text information weight. Fourth, the factors that influence user information are taken into account: a modified PageRank algorithm and a user-layering scheme are proposed, and formulas are constructed to calculate the user information weight. Finally, the event status value is calculated from the text information weight and the user information weight. Experimental results show that successively adding user information, the modified PageRank algorithm, and user layering each corrects 1 to 2 description points, improving the accuracy with which the event's evolution is represented.
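The pipeline outlined in the abstract can be illustrated with a rough sketch. The abstract does not give the formulas, the PageRank modification, or the layering scheme, so everything below is an assumption made for illustration: the code uses standard (unmodified) PageRank as the user information weight, treats the text information weight as a given number per post, and combines the two by summing their product per time slot. The names `pagerank`, `event_status`, and the `(time_slot, user, text_weight)` tuple layout are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (NOT the paper's modified variant).

    edges: iterable of (src, dst) pairs, e.g. "src follows or reposts dst".
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users with no out-links is spread evenly over all users.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def event_status(posts, user_weight):
    """Aggregate an event status value per time slot.

    posts: iterable of (time_slot, user, text_weight) tuples (assumed layout).
    Assumption: status = sum of text_weight * user_weight over the slot's posts;
    the paper's actual combination formula is not stated in the abstract.
    """
    status = {}
    for time_slot, user, text_weight in posts:
        contribution = text_weight * user_weight.get(user, 0.0)
        status[time_slot] = status.get(time_slot, 0.0) + contribution
    return status


if __name__ == "__main__":
    # Toy data: three users and three posts spread over two time slots.
    ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
    posts = [(0, "a", 1.0), (0, "b", 2.0), (1, "c", 1.0)]
    for slot, value in sorted(event_status(posts, ranks).items()):
        print(slot, round(value, 4))
```

Plotting the resulting status value against the time slot gives a curve of the event's development; the user layering described in the abstract would adjust `user_weight` before this aggregation step, but is not modeled here.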
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2016, No. 6, pp. 1605-1612 (8 pages)
Journal of Computer Applications
Funding
Scientific Research Innovation Project of the Shanghai Municipal Education Commission (B.10-0108-14-202)