基于多元数据融合的科学文献主题识别研究 (cited by: 5)

Research on the Topic Identification of Scientific Literature Based on Multivariate Data Fusion
Abstract [Purpose/significance] Topic identification of scientific literature is an important part of research management. How to make full use of the multivariate data of the literature and improve the effectiveness of automatic topic identification is a problem worth studying. [Method/process] The keywords and abstract of a document are important evidence for judging its topic. This paper proposes a topic identification model based on the fusion of multivariate document data: the Word2vec model, AP clustering, and the Node2vec model are used to represent the topic vectors of the keyword layer, the LDA model is used to represent the topic vectors of the abstract layer, and the SGF method from multi-view clustering is used to fuse the data and identify document topics. [Result/conclusion] Topic identification experiments on document sets of different scales verify that the model outperforms the typical LDA method and the Doc-LDA model in both accuracy and interpretability.
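The two-layer fusion idea in the abstract can be illustrated with a minimal sketch. It is not the paper's SGF method or its pipeline: the per-document vectors are made-up toy values standing in for the keyword-layer and abstract-layer topic vectors, the fusion step is a plain average of per-view cosine-similarity graphs, and the clustering step is a simple threshold on the fused graph. All function names and the 0.5 threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(vectors):
    """Dense document-by-document cosine-similarity matrix for one view."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def fuse_graphs(graphs, weights=None):
    """Weighted element-wise average of same-sized similarity matrices.

    Assumption: a naive stand-in for a learned multi-view graph fusion.
    """
    if weights is None:
        weights = [1.0 / len(graphs)] * len(graphs)
    n = len(graphs[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, graphs)) for j in range(n)]
        for i in range(n)
    ]


def threshold_clusters(graph, threshold):
    """Connected components of the graph whose edges have similarity >= threshold."""
    n = len(graph)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and graph[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


# Toy per-document vectors (hypothetical values, four documents, two views).
keyword_view = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
abstract_view = [[0.8, 0.2, 0.0], [0.7, 0.3, 0.0], [0.0, 0.2, 0.8], [0.1, 0.1, 0.8]]

fused = fuse_graphs([similarity_graph(keyword_view), similarity_graph(abstract_view)])
clusters = threshold_clusters(fused, threshold=0.5)
print(clusters)
```

On this toy input, documents 0 and 1 fall into one group and documents 2 and 3 into another, because both views agree on those pairs. A real system would replace the toy vectors with learned embeddings and topic distributions, and the averaging step with the fusion method the paper actually uses.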
Authors: Qiu Junping (邱均平); Sun Yuerui (孙月瑞); Zhou Zhenyun (周贞云) (Chinese Academy of Science and Education Evaluation, Hangzhou Dianzi University, Zhejiang, 310018; School of Management, Hangzhou Dianzi University, Zhejiang, 310018; Academy of Data Science and Informatics, Hangzhou Dianzi University, Zhejiang, 310018)
Source: Information and Documentation Services (《情报资料工作》), CSSCI, Peking University Core Journal, 2022, No. 6, pp. 14-20 (7 pages)
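Funding: This paper is one of the research outputs of the 2019 Major Project of the National Social Science Fund of China, "Research on the Construction and Intelligent Services of a Big-Data-Based Cloud Platform for Science and Education Evaluation Information" (Grant No. 19ZDA348), and the 2020 Key Project of the Zhejiang Provincial Soft Science Research Program, "Research on the Evaluation and Improvement of the Scientific and Technological Innovation Competitiveness of Zhejiang Universities in the Context of Building a Strong Innovative Province" (Grant No. 2020C25027).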
Keywords: scientific literature; topic identification; data fusion; multi-view clustering; multivariate data
References: 10 (secondary references: 137)

  • 1 ZHANG Qin, MA Feicheng. Research paradigms of knowledge management abroad: taking co-word analysis as the method[J]. Journal of Management Sciences in China (管理科学学报), 2007, 10(6): 65-75. Cited by: 482
  • 2 BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. Journal of machine learning research, 2003, 3: 993-1022.
  • 3 SCOTT J. Social network analysis[M]. London: Sage, 2012.
  • 4 BLEI D M, LAFFERTY J D. A correlated topic model of science[J]. The annals of applied statistics, 2007, 1(1): 17-35.
  • 5 GRIFFITHS T L, STEYVERS M. Finding scientific topics[J]. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(suppl 1): 5228-5235.
  • 6 HE Q, CHEN B, PEI J, et al. Detecting topic evolution in scientific literature: how can citations help?[C]//Proceedings of the 18th ACM conference on information and knowledge management. New York: ACM, 2009: 957-966.
  • 7 ALSUMAIT L, BARBARÁ D, DOMENICONI C. On-line LDA: adaptive topic models for mining text streams with applications to topic detection and tracking[C]//Eighth IEEE international conference on data mining. Piscataway: IEEE, 2008: 3-12.
  • 8 HASSAN S U, HADDAWY P. Analyzing knowledge flows of scientific literature through semantic links: a case study in the field of energy[J]. Scientometrics, 2015, 103(1): 33-46.
  • 9 DIETZ L, BICKEL S, SCHEFFER T. Unsupervised prediction of citation influences[C]//Proceedings of the 24th international conference on machine learning. New York: ACM, 2007: 233-240.
  • 10 STEYVERS M, SMYTH P, ROSEN-ZVI M, et al. Probabilistic author-topic models for information discovery[C]//Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. New York: ACM, 2004: 306-315.

Co-citing literature: 232

Co-cited literature: 81

Citing literature: 5

Secondary citing literature: 6
