Abstract
Most existing topic evolution tracking methods compute the semantic similarity or semantic distance between feature keywords drawn from the data sets of different time slices, and ignore the role that association relationships among feature keywords play in the different dynamic evolution stages of a topic. To address this, a topic evolution tracking method based on parallel association rules is proposed. The concept of a time window is introduced and the data set is divided in chronological order, with a large-scale frequent keyword set obtained in each time window. A parallel association rule algorithm is then applied to the frequent keyword set of each time window to obtain an association rule set. All association rule sets are filtered and combined to form the related keyword information of the topic, the association relationships between the data sets of adjacent time windows are discovered, and topic evolution is tracked. Experimental results show that, compared with the OLDA algorithm, the proposed method analyzes the details of the dynamic evolution of a topic more completely and effectively.
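The pipeline described in the abstract (time windows, frequent keyword sets, per-window association rules, comparison of adjacent windows) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the parallel association rule algorithm, so the sketch assumes a thread pool that mines each window independently and restricts itself to pairwise rules. The function names (`split_into_windows`, `mine_rules`, `track_evolution`), the integer timestamps, and the threshold values are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def split_into_windows(docs, window_size):
    """Group (timestamp, keywords) pairs into consecutive time windows."""
    windows = {}
    for timestamp, keywords in docs:
        windows.setdefault(timestamp // window_size, []).append(set(keywords))
    return [windows[key] for key in sorted(windows)]


def mine_rules(window, min_support=0.5, min_confidence=0.6):
    """Return pairwise rules {(antecedent, consequent): confidence} for one window."""
    n_docs = len(window)
    item_counts = Counter(item for doc in window for item in doc)
    pair_counts = Counter(
        pair for doc in window for pair in combinations(sorted(doc), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n_docs < min_support:
            continue  # keyword pair is not frequent in this window
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules[(antecedent, consequent)] = confidence
    return rules


def track_evolution(docs, window_size, **thresholds):
    """Mine every window concurrently, then diff the rule sets of adjacent windows."""
    windows = split_into_windows(docs, window_size)
    with ThreadPoolExecutor() as pool:
        rule_sets = list(pool.map(lambda w: mine_rules(w, **thresholds), windows))
    transitions = []
    for prev, curr in zip(rule_sets, rule_sets[1:]):
        transitions.append({
            "kept": set(prev) & set(curr),      # associations that persist
            "new": set(curr) - set(prev),       # associations that emerge
            "dropped": set(prev) - set(curr),   # associations that fade
        })
    return rule_sets, transitions
```

Calling `track_evolution(docs, window_size=10)` on a list of `(timestamp, keywords)` pairs returns one rule set per window plus, for each pair of adjacent windows, the rules that were kept, newly formed, or dropped. Under the assumptions above, these differences serve as a rough proxy for how the keyword associations of a topic shift over time.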
Authors
王奕文
张如玉
刘昕
张琼声
田红磊
曹帅
WANG Yi-wen; ZHANG Ru-yu; LIU Xin+; ZHANG Qiong-sheng; TIAN Hong-lei; CAO Shuai (College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China)
Source
《计算机工程与设计》
Peking University Core Journal
2021, No. 12, pp. 3555-3561 (7 pages)
Computer Engineering and Design
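Funding
Shandong Provincial Natural Science Foundation (ZR2020MF04)
Fundamental Research Funds for the Central Universities (19CX05027B)
Open Project of the Shanghai Industrial Control System Security Innovation Functional Platform (TICPSH202003015-ZC)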
Keywords
topic evolution
topic tracking
parallel association rule
time window
association rule set