Abstract
[Research purpose] From the perspective of filtering topic words, this paper improves the results of LDA probabilistic topic extraction so that topics are divided more accurately and the keywords within each topic cluster more tightly, and it tracks technological development through evolution analysis. [Research method] Patent data on hydrogen production in China were retrieved from the Patent Star search system (CPRS), and topics and keywords were extracted with the LDA probabilistic topic model. By combining the definition of topic identifiers with a negative sampling model, the FW-LDA method is proposed. The method is then applied to analyze the evolution of topic keywords across adjacent time slices in this field. [Research conclusion] Comparisons on pointwise mutual information and the Pearson correlation coefficient verify the effectiveness of the FW-LDA method. Applying the method to evolution analysis of China's hydrogen production patent data helps to grasp development patterns and provides a reference for technical research.
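The abstract names pointwise mutual information (PMI) as one of the evaluation indexes but does not give the exact formulation used. As a minimal sketch only, assuming a document-level co-occurrence definition, PMI(w1, w2) = log(P(w1, w2) / (P(w1) P(w2))), averaged over the pairs of a topic's top keywords, the score could be computed as follows. The function name `pmi_coherence`, the choice to skip pairs that never co-occur, and the toy documents are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, documents):
    """Average pairwise PMI of a topic's top words.

    Probabilities are estimated from document-level co-occurrence:
    P(w) is the share of documents containing w, and P(w1, w2) is the
    share containing both. Pairs that never co-occur are skipped
    (an assumption of this sketch; smoothing is a common alternative).
    """
    n_docs = len(documents)
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1 = doc_freq(w1) / n_docs
        p2 = doc_freq(w2) / n_docs
        p12 = doc_freq(w1, w2) / n_docs
        if p12 == 0:
            continue  # undefined log; skipped in this sketch
        scores.append(math.log(p12 / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


# Toy tokenized "patent" documents (hypothetical data for illustration).
docs = [
    ["hydrogen", "electrolysis", "water"],
    ["hydrogen", "electrolysis", "catalyst"],
    ["methane", "reforming", "catalyst"],
    ["methane", "reforming", "hydrogen"],
]

tight_topic = pmi_coherence(["methane", "reforming"], docs)
loose_topic = pmi_coherence(["hydrogen", "catalyst"], docs)
```

Under this definition, a higher average PMI indicates that a topic's keywords tend to appear in the same documents, which is one way to read the abstract's claim that keywords within a topic become "more clustered".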
Authors
Liu Jinxia (刘晋霞)
Zhang Zhiyu (张志宇)
Wang Fang (王芳)
School of Economics and Management, Taiyuan University of Science and Technology, Taiyuan 030024
Source
Journal of Intelligence (《情报杂志》)
CSSCI
Peking University Core Journals (北大核心)
2022, Issue 7, pp. 57-64 (8 pages)
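Funding
Shanxi Provincial Philosophy and Social Science Planning Project "Research on Innovation Paths for Integrating the Big Data Industry with Shanxi's Equipment Manufacturing Industry" (No. 2020YY151)
Shanxi Academy of Social Sciences (Development Research Center of Shanxi Provincial People's Government) Planning Project "Research on the Co-construction, Sharing, Co-governance, and Shared Use of Digital Government in Shanxi Province" (No. YWYB202158)
Shanxi Academy of Social Sciences (Development Research Center of Shanxi Provincial People's Government) 2021 Youth Project "Research on the Construction Path of Digital Government in Shanxi" (No. YWQN202147)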
Keywords
FW-LDA method
patent topic words
LDA probabilistic
patent classification number
topic identifiers
negative sampling model
combination improvement