Abstract
[Purpose/significance] Existing research on the fully automatic identification of weak signals is still incomplete. To address this gap, this paper proposes a fully automatic weak-signal identification method based on a fused LDA-BERT model. [Method/process] An unsupervised LDA topic model is first used to classify the text dataset into topics. A two-layer filter, operating on topics and on terms, is then built to extract early-warning signals from the topic classification results. The weakness of each topic is assessed with three metrics (closeness centrality, topic weight, and topic autocorrelation), and weak signals are extracted according to the normalized frequency and probability of the terms within each topic. Finally, a BERT deep learning model is used to expand the weak signals at the semantic level by retrieving their contexts and similar words. [Result/conclusion] Taking the renewed outbreak of the epidemic in early January 2021 as a case study, the model was validated on a social media news dataset covering the three months before the outbreak. The experimental results show that the method effectively detects the relevant weak signals and reveals how these signals gradually strengthen over time. In addition, the fused model achieves fully automatic weak-signal identification while offering more interpretable results than a single model.
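The abstract names three topic-level weakness metrics but does not give their formulas. The Python sketch below shows one plausible way to compute them, under assumptions that are not from the paper: topic weight is taken as a topic's mean share of the documents in each time window, topic autocorrelation as the lag-1 autocorrelation of that weight across windows, and closeness centrality is computed on a hypothetical topic graph whose edges link topics that share top terms. The helper names, the thresholds in `is_weak_topic`, and the toy data are all hypothetical, not the authors' values.

```python
from collections import deque
from statistics import mean


def topic_weights(doc_topic):
    """Mean share of each topic over the documents of one time window.

    doc_topic: list of per-document topic distributions (e.g. from a fitted
    LDA model); every row holds one probability per topic.
    """
    k = len(doc_topic[0])
    return [mean(row[j] for row in doc_topic) for j in range(k)]


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a topic's weight across successive windows."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: no autocorrelation signal
    num = sum((series[t] - m) * (series[t + 1] - m)
              for t in range(len(series) - 1))
    return num / denom


def closeness_centrality(adj, node):
    """Closeness of `node` in an unweighted topic graph (adjacency dict).

    Breadth-first search gives shortest-path lengths; only reachable nodes
    are counted, and the score is scaled by the reachable fraction
    (Wasserman-Faust scaling), so isolated topics score 0.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reachable = len(dist) - 1
    n = len(adj)
    if reachable == 0 or n <= 1:
        return 0.0
    return (reachable / sum(dist.values())) * (reachable / (n - 1))


def is_weak_topic(closeness, weight, autocorr,
                  max_closeness=0.5, max_weight=0.2):
    """Hypothetical rule: a peripheral, low-weight topic with persistent growth."""
    return closeness < max_closeness and weight < max_weight and autocorr > 0


# Toy data (hypothetical): four time windows, two documents each, three topics.
windows = [
    [[0.70, 0.25, 0.05], [0.60, 0.35, 0.05]],
    [[0.65, 0.25, 0.10], [0.55, 0.35, 0.10]],
    [[0.60, 0.25, 0.15], [0.50, 0.35, 0.15]],
    [[0.50, 0.25, 0.25], [0.40, 0.35, 0.25]],
]
# One weight series per topic, ordered by window.
series = list(zip(*(topic_weights(w) for w in windows)))
# Hypothetical topic graph: topics 0 and 1 share top terms; topic 2 is isolated.
topic_graph = {0: [1], 1: [0], 2: []}

weak = [j for j, s in enumerate(series)
        if is_weak_topic(closeness_centrality(topic_graph, j),
                         mean(s), lag1_autocorrelation(s))]
print(weak)  # → [2]
```

In this toy run, topic 2 is flagged because it sits outside the main topic cluster, carries a small average share, and its weight keeps rising from window to window, which matches the abstract's description of weak signals that strengthen over time. A real pipeline would replace the toy matrices with LDA document-topic output and tune the thresholds on data.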
Authors
Yang Bo (杨波); Shao Wanting (邵婉婷)
School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013; Institute of Information Resources Management, Jiangxi University of Finance and Economics, Nanchang 330013
Source
Library and Information Service (图书情报工作)
CSSCI
Peking University Core Journal (北大核心)
2021, No. 16, pp. 98-107 (10 pages)
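Funding
This paper is one of the research outputs of the following projects:
National Natural Science Foundation of China project "Research on an Immune-Based Knowledge Service Model for Growth Risk Management of New Ventures" (Project No. 72064015)
Key Project of the Jiangxi Provincial Social Science Planning Program "Research on the Knowledge Service Mechanism for Growth Risk Management of New Ventures" (Project No. 19TQ01)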