Abstract
[Objective] This paper aims to improve the accuracy of automatically extracting technical words and function (effect) words from patents. [Methods] ChatGPT serves as the teacher model and ChatGLM3 as the student model. Through knowledge distillation, training data generated by ChatGPT are used to fine-tune ChatGLM3, yielding several technical word extraction models and one function word extraction model. The technical word extraction models extract technical words from the patent abstract, the first claim, and the technical effect segments, respectively; the function word extraction model extracts function words from the technical effect segments. [Results] Compared with ChatGPT, the fine-tuned technical word and function word extraction models show higher precision and lower recall. The ChatGLM3 model fine-tuned on the first claim achieves the highest precision and F1 score, 0.734 and 0.724, respectively. The function word extraction model reaches a precision of 0.649, exceeding the 0.530 precision of function words labeled by a commercial tool. [Limitations] The study covers a single technical field and a single patent language, the validation set is small, and the data cleaning rules are not comprehensive. [Conclusions] Through knowledge distillation, the proposed approach improves the accuracy of large language models in automatically extracting technical and function words. It also supports mining cutting-edge and hotspot technologies from patent text, enabling higher-quality intelligent patent analysis.
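To make the reported scores concrete, the following is a minimal sketch of set-based precision, recall, and F1 for extracted terms. It assumes exact string matching against a gold-standard set, which is a simplification: the keywords mention a semantic similarity matrix, so the authors' actual matching is probably softer. The function name `prf1` and the sample terms are hypothetical and not taken from the paper.

```python
def prf1(predicted, gold):
    """Return (precision, recall, f1) for two term collections, using exact match.

    Assumption: a predicted term counts as correct only if the identical
    string appears in the gold set. The paper may use semantic matching.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical terms for illustration only.
predicted = ["heat sink", "graphene coating", "copper substrate"]
gold = ["heat sink", "graphene coating", "thermal interface layer", "fan assembly"]

precision, recall, f1 = prf1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the extractor returns few terms, most of them correct, so precision exceeds recall. That is the same qualitative pattern the abstract reports for the fine-tuned models relative to ChatGPT.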
Authors
王奎芳
吕璐成
孙文君
王翼虎
赵亚娟
Wang Kuifang; Lyu Lucheng; Sun Wenjun; Wang Yihu; Zhao Yajuan (National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China; Institute of Scientific and Technical Information of China, Beijing 100038, China)
Source
《数据分析与知识发现》
EI
CSSCI
CSCD
PKU Core Journals (北大核心)
2024, Issue 8, pp. 144-156 (13 pages)
Data Analysis and Knowledge Discovery
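Funding
This article is one of the research outputs of the following projects:
Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 72304268)
2023 National Funded Postdoctoral Researcher Program, Grade C (Grant No. GZC20232931)
Research on Intellectual Property Intelligence Navigation and Analysis in Support of Self-Reliance in Science and Technology (Project No. E329110602)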
Keywords
Technical Function Word Extraction
Knowledge Distillation
Fine-Tuning Model
Semantic Similarity Matrix