期刊文献+

基于主题模型的科技报告文档聚类方法研究 被引量:16

Research on the Text Clustering Method of Science and Technology Reports Based on the Topic Model
原文传递
导出
摘要 [目的/意义]探索实践以科技报告为文献载体形式的融合主题模型的文本聚类方法,拓展基于科技文献进行技术监测服务的新领域,提出基于科技报告进行语义分析的新方法。[方法/过程]以国家科技报告服务系统中的科技报告为数据源,首先基于LDA主题模型对经过文本预处理的科技报告进行主题挖掘,再基于Ward与K-means相结合的聚类算法对包含主题分布信息的文本向量进行聚类分析,尝试提出一种适合科技报告文档聚类的文本挖掘新方法。[结果/结论]实验结果表明,LDA主题模型能有效准确挖掘科技报告中的主题信息,所提出的Ward与K-means相结合的聚类算法对科技报告的聚类效果也优于其它传统聚类算法。 [ Purpose/significance] This paper explores the method of text clustering in the science and technology reports based on the topic model, develops new scientific literature technology monitoring areas, and puts forward a new semantic analysis method based on science and technology reports. [ Method/process] Based on the national science and technology report service system, firstly, it conducted topic mining based on the LDA model after the text preprocessing; secondly, a clustering analysis based on the combination of K-means and Ward was carried out based on the text vector of the abstract containing theme distribution information. A proper text clustering method for the text mining suitable for the science and technical report was proposed. [ Result/conclusion] The experimental results show that the LDA model can be effectively and accurately used in the topic mining of science and technology reports, and the clustering effect of the combination of Ward and K-means proposed in this paper is better than that of other traditional clustering algorithms in sci- ence and technology reports.
出处 《图书情报工作》 CSSCI 北大核心 2018年第4期113-120,共8页 Library and Information Service
基金 吉林省教育科学“十三五”规划项目“项目教学法在高校基础计算机教学中的应用研究”(项目编号:GH170061)研究成果之一
关键词 科技报告 主题模型 LDA 文本聚类 science and technology report topic model LDA text clustering
  • 相关文献

参考文献11

二级参考文献248

共引文献398

同被引文献285

引证文献16

二级引证文献111

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部