Research on Topic Recognition and Analysis Based on LDA and Move Tagging
Abstract
[Objective] Working along two dimensions, extraction of topic-representative terms and functional classification of topic sentences, this paper designs a topic analysis method based on the Latent Dirichlet Allocation (LDA) model and move tagging, and examines its effectiveness and practicality.
[Methods] The LDA model is used to identify topics, and a Sentence Transformer model is used to extract topic phrases. In parallel, a sentence function classification model is built to perform move tagging, identifying the functional type of each sentence so that topic content can be analyzed at a fine granularity from the perspective of sentence function.
[Results] An empirical study on papers from the field of agricultural resources and environment shows that, compared with the traditional LDA model, the topic-representative terms obtained after topic phrase extraction are more readable and interpretable. Combining them with move tagging further deepens the analysis of topic sentence content.
[Limitations] Some of the expanded topic phrases carry the same meaning. The diversity of the representative terms needs further improvement so that synonymous topic phrases are consolidated.
[Conclusions] The proposed method performs well in extracting topic-representative terms and analyzing topic content, and can improve the efficiency and depth of text topic mining.
Authors ZHANG Hui; CHUAN Limin; ZHENG Huaiguo; ZHAO Jingjuan; QI Shijie (Institute of Data Science and Agricultural Economics, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China)
Source Frontiers of Data & Computing (《数据与计算发展前沿》), CSCD, 2023, No. 5, pp. 107-118 (12 pages)
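Funding Innovation Capacity Building Special Project of the Beijing Academy of Agriculture and Forestry Sciences: "Identification and Empirical Study of Agricultural Hot and Frontier Topics Based on Multi-Source Data Fusion" (KJCX20200403); "Improving Think-Tank-Oriented Agricultural Information Research and Service Capabilities" (KJCX20230208); "Improving Information Research and Service Capabilities for Research Management" (KJCX20230210); Open Fund of the Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration (2023KMKS01)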
Keywords LDA model; move tagging; topic phrase; topic analysis
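The abstract does not say how candidate phrases are scored against a topic or how synonymous phrases might be consolidated. The sketch below shows one plausible reading, under stated assumptions: phrases and topics are already embedded as vectors (in the paper's setting these would come from a Sentence Transformer model; here they are hand-made toy vectors), candidates are ranked by cosine similarity to the topic vector, and near-duplicates are dropped with a similarity threshold. The function names, the vectors, and the 0.95 threshold are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_phrases(topic_vec, phrase_vecs, top_k=3):
    """Return the top_k phrases most similar to the topic vector."""
    scored = sorted(
        ((cosine(topic_vec, vec), phrase) for phrase, vec in phrase_vecs.items()),
        reverse=True,
    )
    return [phrase for _, phrase in scored[:top_k]]


def merge_near_duplicates(phrases, phrase_vecs, threshold=0.95):
    """Keep a phrase only if it is not too similar to one already kept.

    Phrases are visited in ranked order, so the higher-ranked member of
    each near-duplicate group survives.
    """
    kept = []
    for phrase in phrases:
        if all(cosine(phrase_vecs[phrase], phrase_vecs[k]) < threshold for k in kept):
            kept.append(phrase)
    return kept


# Toy embeddings (assumption): stand-ins for real sentence embeddings.
topic_vec = [1.0, 0.0, 0.0]
phrase_vecs = {
    "soil organic carbon": [0.90, 0.10, 0.0],
    "organic carbon in soil": [0.88, 0.12, 0.0],
    "nitrogen fertilizer": [0.60, 0.40, 0.0],
    "crop yield": [0.10, 0.90, 0.1],
}

ranked = rank_phrases(topic_vec, phrase_vecs, top_k=3)
merged = merge_near_duplicates(ranked, phrase_vecs)
print(ranked)
print(merged)
```

In this toy run the two soil-carbon phrases are near-duplicates, so only the higher-ranked one is kept. A greedy threshold filter like this is only one way to address the issue raised under Limitations; clustering the phrase embeddings would be another.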