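Abstract
In topic evolution tracking, topic models that fix both the time-slice size and the number of topics K cannot uncover important temporal turning points. To address this, a dynamic temporal segmentation-infinite latent Dirichlet allocation (DTS-ILDA) model is proposed. To address the erroneous topic associations that evolution analysis tends to produce, an association filtering mechanism is also proposed. First, topics are extracted with the DTS-ILDA model, which fuses an improved dynamic time segmentation algorithm with the infinite latent Dirichlet allocation (ILDA) model. The dynamic time segmentation algorithm traverses the data set in chronological order and uses contingency table analysis of the topic distributions in adjacent time slices to measure segmentation quality, thereby finding suitable split points. The ILDA model can extract a different number of topics within each time slice, and evolutionary association analysis is performed on the extracted topics. Next, the filtering method removes weakly related associations. Finally, for the remaining associations, graphs of five kinds of subtopic evolution relationships are built in chronological order. Experiments show that the method effectively finds the time points at which topic content changes significantly, prevents the generation of meaningless topics, reduces interference from erroneous topic associations, and uncovers accurate deep relationships between topics.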
In topic evolution tracking, the time-slice size and the number of topics K in a topic model are usually fixed, which makes it hard to locate important temporal turning points, and evolutionary analysis is prone to erroneous topic associations. To solve these problems, we propose a dynamic temporal segmentation-infinite latent Dirichlet allocation (DTS-ILDA) model and an association filtering mechanism. The model combines an improved dynamic time segmentation algorithm with an infinite latent Dirichlet allocation (ILDA) model to extract topics. The dynamic time segmentation algorithm traverses the data set in chronological order and uses a contingency table to analyze the topic distributions of adjacent time slices, which measures the quality of each segmentation; the ILDA model then extracts a variable number of topics within each time slice. The association filtering mechanism removes weak associations between topics. Finally, five types of evolutionary relationships are established among the remaining subtopic associations according to their temporal order. Experiments show that the method effectively finds the time points at which the main content of a topic changes and prevents the generation of meaningless topics. It also reduces interference from erroneous topic associations and extracts accurate deep relationships between topics.
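The contingency-table step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each document already carries a dominant-topic index in a shared topic space, it uses the Pearson chi-square statistic as the contingency-table measure, and the names `chi_square`, `topic_counts`, `find_split_points`, `window`, and `threshold` are all hypothetical.

```python
from typing import List, Sequence


def chi_square(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip topics absent from both windows
                stat += (observed - expected) ** 2 / expected
    return stat


def topic_counts(labels: Sequence[int], num_topics: int) -> List[int]:
    """Count how many documents fall under each topic index."""
    counts = [0] * num_topics
    for topic in labels:
        counts[topic] += 1
    return counts


def find_split_points(doc_topics: Sequence[int], num_topics: int,
                      window: int, threshold: float) -> List[int]:
    """Scan documents in time order and return candidate split indices.

    Assumption: a split is declared wherever the topic distributions of
    the `window` documents before and after position i differ by more
    than `threshold` under the chi-square statistic.
    """
    splits: List[int] = []
    last = 0
    for i in range(window, len(doc_topics) - window + 1):
        if i - last < window:  # keep every slice at least `window` long
            continue
        before = topic_counts(doc_topics[i - window:i], num_topics)
        after = topic_counts(doc_topics[i:i + window], num_topics)
        if chi_square([before, after]) > threshold:
            splits.append(i)
            last = i
    return splits


# Toy stream: six documents on topic 0, then six on topic 1.
stream = [0] * 6 + [1] * 6
print(find_split_points(stream, num_topics=2, window=3, threshold=5.0))
```

A fixed threshold keeps the sketch short; a significance level derived from the chi-square distribution, or a search for the local maximum of the statistic, would be natural refinements.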
Authors
郭晓利
周自岚
刘耀伟
独健鸿
黄岩
GUO Xiao-li, ZHOU Zi-lan, LIU Yao-wei, DU Jian-hong, HUANG Yan (School of Information Engineering, Northeast Dianli University, Jilin 132012, Jilin Province, China; Jilin Power Supply Company, State Grid Jilin Province Electric Power Supply Company, Jilin 132000, Jilin Province, China; Jilin Fengman Power Plant, Jilin 132012, Jilin Province, China; Jilin Branch, China Mobile Communications Group Jilin Co., Ltd., Jilin 132012, Jilin Province, China)
Source
《应用科学学报》
CSCD
Peking University Core Journal
2017, Issue 5, pp. 634-646 (13 pages)
Journal of Applied Sciences
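Funding
National Natural Science Foundation of China (No. 51277023)
Supported by the Jilin Provincial Science and Technology Department Project Fund (No. 20150307020GX)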
Keywords
topic model
topic evolution
temporal segmentation
infinite latent Dirichlet allocation model
filtering
topic model, topic evolution, temporal segmentation, infinite latent Dirichlet allocation (ILDA) model, filtering