Patent Entity Extraction for Technology Recognition: A Case Study of Brain-Inspired Intelligence

Abstract: [Research purpose] Patent entity extraction is the basis of technology recognition from patent texts. At present, patent entity extraction suffers from low automation and low accuracy. This study addresses the problem from two directions: building a high-quality patent corpus for a specific field, and applying an advanced algorithmic model to patent entity extraction. [Research method] A fine-grained information scheme containing 13 entity types was defined, and the titles and abstracts of 921 patents in the field of brain-inspired intelligence were manually annotated according to it. A Bert-BiLSTM-CRF model, which combines deep learning and machine learning, was then used to identify brain-inspired intelligence patent entities. [Research conclusion] Overall, the model achieved a precision, recall, and F1 score of 0.8, and recognition performance differed across entity types. Several comparative experiments were designed to verify the model's performance. The results showed that fine-tuning the data and increasing the training scale could improve model performance, and that the model outperformed some classical models of the same period.
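As an illustration of how the precision, recall, and F1 figures reported in the abstract are commonly computed for entity extraction, the following is a minimal sketch of entity-level scoring over BIO-tagged sequences. It is not the authors' evaluation code: the BIO tagging scheme, the exact-match criterion, the function names `bio_to_spans` and `prf`, and the entity type names `METHOD` and `DEVICE` are all assumptions made for the example (the abstract does not list the 13 entity types).

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); a hypothetical span format
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    An I- tag that does not continue a span of the same type is ignored.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact span and type match."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: "METHOD" and "DEVICE" are placeholder entity types.
gold = [["B-METHOD", "I-METHOD", "O", "B-DEVICE"]]
pred = [["B-METHOD", "I-METHOD", "O", "O"]]
precision, recall, f1 = prf(gold, pred)
print(precision, recall, round(f1, 3))
```

In this toy case the prediction recovers one of the two gold entities and proposes no spurious ones, so precision is 1.0 and recall is 0.5. Scoring whole spans rather than individual tokens is one common convention for this task; whether the paper used it is not stated in the abstract.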

Authors: Xing Xiaozhao; Yuan Pengbin; Chen Liang; Ren Liang; Yu Chi (Institute of Scientific and Technological Information of China, Beijing 100038)

Affiliation: Institute of Scientific and Technological Information of China

Source: Journal of Intelligence (情报杂志; Peking University Core Journal), 2024, No. 6, pp. 126-133, 144 (9 pages)

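Funding: Research output of the National Social Science Fund of China Youth Project "Research on Classification and Identification Methods for Disruptive Technologies Based on Multi-Source Knowledge Networks" (No. 21CTQ039).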

Keywords: patent entity; patent text; patent mining; technology recognition; deep learning; machine learning; Bert-BiLSTM-CRF model

Classification code: G350 [Culture and Science: Information Science]

