
Research on Railway Field Topic Discovery Based on Improved LDA Model (基于改进LDA模型的铁路领域主题发现研究) Cited by: 6
Abstract: The era of big data has made it difficult for researchers in the railway field to quickly identify the main research directions, follow international research trends, and understand international research hotspots. Efficiently mining the main content of the massive body of scientific and technical literature in the railway field has therefore become an urgent problem. Topic models represented by LDA are the mainstream method for topic discovery, but their usefulness is limited for railway literature, in which multi-word phrases predominate. This study proposes an improved, semantically enhanced LDA topic model. Before the topic model is built, preprocessing extracts noun phrases, verb phrases, nouns, and verbs from the corpus. After the topic model is built, the TextRank and PMI algorithms are combined to obtain keyword chunks, and the ranked keyword chunks replace the topic words in the LDA topic recognition results, further enriching the semantics of each topic. An empirical study is conducted on the "traction power supply system" as an example. The results show that the proposed model helps improve the interpretability and recognizability of topic discovery results in the railway field, and can provide effective methodological support for knowledge services in railway scientific research management.
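The abstract names PMI (pointwise mutual information) as one of the two signals used to form keyword chunks but gives no formulas or parameters. The following is only a minimal sketch of the PMI half, under assumptions that are not from the paper: PMI is computed over adjacent word pairs, and neighbouring tokens are greedily merged when their PMI reaches a threshold. The function names `adjacent_pmi` and `merge_chunks`, the threshold value, and the toy corpus are all hypothetical. The TextRank ranking and the way the authors fuse the two scores are not shown.

```python
import math
from collections import Counter


def adjacent_pmi(sentences):
    """Compute PMI (base 2) for every adjacent word pair in tokenized sentences.

    Assumption: probabilities are simple relative frequencies over all
    tokens (for words) and over all adjacent pairs (for pairs).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (left, right), count in pair_counts.items():
        p_pair = count / total_pairs
        p_left = word_counts[left] / total_words
        p_right = word_counts[right] / total_words
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


def merge_chunks(tokens, scores, threshold):
    """Greedily join neighbouring tokens whose PMI is at least `threshold`."""
    chunks = []
    for token in tokens:
        # Pairs never seen in the corpus get -inf and are never merged.
        if chunks and scores.get((chunks[-1][-1], token), float("-inf")) >= threshold:
            chunks[-1].append(token)
        else:
            chunks.append([token])
    return [" ".join(chunk) for chunk in chunks]


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["traction", "power", "supply", "system"],
    ["traction", "power", "fault"],
    ["system", "fault"],
]
pmi_scores = adjacent_pmi(corpus)
chunked = merge_chunks(["traction", "power", "fault"], pmi_scores, threshold=2.0)
```

In this toy corpus the pair ("traction", "power") scores higher than ("power", "fault"), so the first two tokens are merged into one chunk while "fault" stays separate. A real pipeline would need a larger corpus and a tuned threshold.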
Authors: 龙艺璇 (LONG YiXuan), 安源 (AN Yuan), 王东晋 (WANG DongJin), 翟夏普 (ZHAI XiaPu), 伊惠芳 (YI HuiFang). Affiliations: Scientific & Technical Information Research Institute, Chinese Academy of Railway Sciences, Beijing 100081, P.R. China; National Science Library, Chinese Academy of Sciences, Beijing 100190, P.R. China
Source: 《数字图书馆论坛》 (Digital Library Forum), CSSCI, 2022, Issue 2, pp. 26-32 (7 pages)
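Funding: Supported by the research and development project of China Academy of Railway Sciences Corporation Limited, "Research on a Railway Scientific Research Knowledge Graph and Intelligent Knowledge Service System" (铁路科研知识图谱及智能知识服务体系研究, 2020YJ147).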
Keywords: topic discovery; railway field; semantic enhancement; LDA topic model
