Research on Text Time Window Partition Based on LDA Model
Abstract [Objective/Significance] Static topic models struggle to meet users' needs for dynamic analysis, and existing dynamic topic models suffer from high computational cost or heavy dependence on subjective choices. To address these problems, this study proposes a text time window partitioning algorithm that is built on the LDA model and driven by time window similarity. [Method/Process] The study constructs a time window similarity index that combines the difference between time windows with the consistency within each time window, builds a text time window partitioning algorithm on that index, and carries out an empirical study on the innovation research field. [Results/Conclusions] Two indicators are used for evaluation: the average JS divergence between topics within each time window at its optimal number of topics, and the average JS divergence between different topics in adjacent time windows. Judged on both indicators together, the partition produced by the proposed algorithm is clearly better than partitions produced with several fixed time window lengths, which verifies the effectiveness of the proposed text time window partitioning algorithm. To some extent, the algorithm mitigates the high computational cost and strong subjectivity of existing dynamic topic models, improves the objectivity and accuracy of text time window partitioning, and can provide technical support for related research such as topic evolution.
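The two evaluation indicators named in the abstract can be illustrated with a minimal Python sketch. It assumes each topic is a probability distribution over a shared vocabulary, as LDA produces. The function names (`js_divergence`, `mean_js_within`, `mean_js_between`), the base-2 logarithm, and the smoothing constant are illustrative assumptions; the abstract does not give the authors' exact formulas, and the sketch does not reproduce their combined time window similarity index or the partitioning algorithm itself.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    Uses log base 2, so the value lies in [0, 1]. `eps` is an assumed
    smoothing constant that avoids log(0).
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = float(np.sum(p * np.log2(p / m)))
    kl_qm = float(np.sum(q * np.log2(q / m)))
    return 0.5 * kl_pm + 0.5 * kl_qm


def mean_js_within(topics):
    """Average JS divergence over all unordered topic pairs in one window."""
    k = len(topics)
    if k < 2:
        raise ValueError("need at least two topics to form a pair")
    values = [
        js_divergence(topics[i], topics[j])
        for i in range(k)
        for j in range(i + 1, k)
    ]
    return float(np.mean(values))


def mean_js_between(topics_a, topics_b):
    """Average JS divergence over all topic pairs across two adjacent windows."""
    if len(topics_a) == 0 or len(topics_b) == 0:
        raise ValueError("both windows need at least one topic")
    values = [js_divergence(a, b) for a in topics_a for b in topics_b]
    return float(np.mean(values))


if __name__ == "__main__":
    # Toy topic-word distributions over a 4-word vocabulary (made-up numbers).
    window_1 = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]
    window_2 = [[0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]
    print("within window 1:", mean_js_within(window_1))
    print("between windows:", mean_js_between(window_1, window_2))
```

In practice the topic-word matrices would come from an LDA model fitted separately on the documents of each candidate window, with both models sharing one vocabulary so that the distributions are comparable.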
Authors Long Yixuan; Wang Xiaogang; Zhou Ziwei; Wang Rongsheng; Yi Huifang (Scientific & Technical Information Research Institute, Chinese Academy of Railway Sciences, Beijing 100081, China; China Railway Xi'an Group Co., Ltd., Xi'an 710054, China; Qingdao University Library, Qingdao 266071, China)
Source Science Focus (《科学观察》), 2024, No. 2, pp. 34-45 (12 pages)
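Funding China State Railway Group Co., Ltd. Science and Technology Research and Development Project "Evaluation Index System and Empirical Study of Railway Enterprise Innovation Capability Oriented Toward Improving Innovation Efficiency" (J2022Z004); China Railway Xi'an Group Co., Ltd. Science and Technology Research and Development Plan "Research on Innovative Optimization, Practice, and Demonstration Application of the Science and Technology Management System" (N2023092).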
Keywords LDA model; time window; dynamic topic model; text similarity; innovation research