
Patent Classification Method Based on Pre-trained Language Model and TRIZ Inventive Principle
Abstract: To fully exploit the existing solutions and technical knowledge contained in patent texts, a pre-trained language model based method is proposed for classifying Chinese patents by the inventive principles of TRIZ (theory of inventive problem solving). Using whole word masking (WWM), the Chinese RoBERTa model was further pre-trained on patent datasets of different sizes (each sample consisting of a patent title and abstract), producing two patent-domain models, RoBERTa_patent1.0 and RoBERTa_patent2.0. A fully connected layer was added on top of each encoder, yielding three patent classification models based on RoBERTa, RoBERTa_patent1.0, and RoBERTa_patent2.0. The three models were then trained and tested on a patent dataset constructed and labeled according to the TRIZ inventive principles. Experimental results show that RoBERTa_patent2.0_IP achieves the best accuracy, macro precision, macro recall, and macro F1, reaching 96%, 95.69%, 94%, and 94.84%, respectively. The method realizes automatic classification of Chinese patent texts by TRIZ inventive principle and can help designers understand and apply the inventive principles to achieve innovative product design.
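As a concrete illustration of the further pre-training step described in the abstract, below is a minimal sketch using the Hugging Face transformers Trainer with whole word masking. The checkpoint name (hfl/chinese-roberta-wwm-ext), the data file patents.txt, and all hyperparameters are assumptions for illustration, not the paper's actual settings.

```python
# Hypothetical sketch: continued pre-training of a Chinese RoBERTa checkpoint
# on patent titles+abstracts with whole word masking (WWM).
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForWholeWordMask,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = AutoModelForMaskedLM.from_pretrained("hfl/chinese-roberta-wwm-ext")

# Assumed data format: each line of patents.txt is "<title>。<abstract>".
dataset = load_dataset("text", data_files={"train": "patents.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Whole word masking: all sub-tokens of a word are masked together. Note that
# true Chinese WWM additionally needs word boundaries from a segmenter (e.g.
# LTP, as in the Chinese-BERT-wwm work); this sketch omits that detail.
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta_patent", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
trainer.save_model("roberta_patent")  # domain-adapted encoder, e.g. RoBERTa_patent1.0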
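The classification stage can likewise be sketched as the further pre-trained encoder topped with a fully connected layer over the inventive principle classes. The class count of 40 (the canonical number of TRIZ inventive principles), the checkpoint path, and the dropout rate are assumptions; the record does not give the paper's exact head configuration.

```python
# Hypothetical sketch: patent classifier = domain-adapted encoder + FC layer.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

NUM_PRINCIPLES = 40  # assumed: TRIZ defines 40 inventive principles

class RobertaPatentIP(nn.Module):
    def __init__(self, checkpoint="roberta_patent"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.dropout = nn.Dropout(0.1)
        # Fully connected layer mapping the [CLS] representation to class logits.
        self.classifier = nn.Linear(self.encoder.config.hidden_size, NUM_PRINCIPLES)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.classifier(self.dropout(cls))

tokenizer = AutoTokenizer.from_pretrained("roberta_patent")
model = RobertaPatentIP()
batch = tokenizer(["专利标题。专利摘要"], padding=True,
                  truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
pred = logits.argmax(dim=-1)  # index of the predicted inventive principle
```

Training this head with cross-entropy loss on the labeled patent dataset would yield models analogous to the paper's RoBERTa_IP, RoBERTa_patent1.0_IP, and RoBERTa_patent2.0_IP variants.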
Authors: JIA Li-zhen (贾丽臻); BAI Xiao-lei (白晓磊) (College of Transportation Science and Engineering, Civil Aviation University of China, Tianjin 300300, China; College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China)
Source: Science Technology and Engineering (《科学技术与工程》, Peking University core journal), 2024, Issue 30, pp. 13055-13063 (9 pages)
Funding: Fundamental Research Funds for the Central Universities (3122022052)
Keywords: pre-trained language model; RoBERTa (robustly optimized BERT pre-training approach); inventive principle (IP); whole word masking (WWM); text classification
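For reference, the macro-averaged metrics named in the abstract follow the standard definitions below; the harmonic-mean form of macro F1 is consistent with the reported figures (this formulation is inferred from the numbers, not quoted from the paper):

```latex
% Macro-averaged metrics over C classes, with per-class precision P_i and recall R_i:
P_{\mathrm{macro}} = \frac{1}{C}\sum_{i=1}^{C} P_i,\qquad
R_{\mathrm{macro}} = \frac{1}{C}\sum_{i=1}^{C} R_i,\qquad
F1_{\mathrm{macro}} = \frac{2\,P_{\mathrm{macro}}\,R_{\mathrm{macro}}}{P_{\mathrm{macro}} + R_{\mathrm{macro}}}
% Check against the reported results:
% 2 x 0.9569 x 0.94 / (0.9569 + 0.94) = 0.9484, matching the reported 94.84%.
```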