Abstract
[Purpose/significance] From the perspective of "paper-patent" linkage, this paper uses emerging-technology term extraction and quantitative analysis to identify emerging terms that "lead future technology trends along the innovation chain from papers to patents". [Method/process] The study proposes a new term extraction method that combines the Termolator algorithm with GPT prompt learning. Comparative experiments examine how well GPT prompt learning performs in term extraction, and the authors report that the combined method significantly improves the accuracy and recall of term extraction. The extracted terms are then clustered with the Minibatch K-means++ algorithm to form technology topics, and these emerging technology topics are identified and classified using multidimensional quantitative indicators. [Result/conclusion] Emerging technical terms are divided into four types (hot, frontier, applied, and potential), which enables effective identification and classification of technical-term topics. The results indicate that the method can effectively reveal emerging technologies that lead future technology trends in the field of large-model research, and that it offers a new approach to identifying emerging technology terms. [Limitations] The vector representation of technical terms and the determination of indicator thresholds for emerging-technology topic identification have certain limitations and require further study.
Authors
Zhang Kai
Lü Lucheng
Han Tao
Zhao Yajuan
(National Science Library, Chinese Academy of Sciences, Beijing 100190; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190)
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journals (北大核心)
2024, Issue 9, pp. 183-191 (9 pages)
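Funding
This article is an outcome of the following projects:
National Natural Science Foundation of China, Young Scientists Fund project "Research on Technology Convergence Patterns, Characteristics, and Prediction from the Perspective of Technological Distance" (Grant No. 72304268)
National Social Science Fund of China project "Research on Knowledge Service Content of Science and Technology Libraries Supporting AI4Science" (Grant No. 22BTQ019)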
Keywords
GPT prompt learning
large language models
emerging technology identification
text mining