Abstract
[Research purpose] Patent entity extraction is the foundation of technology identification from patent texts. It currently suffers from low automation and low accuracy. This study addresses the problem in two ways: by building a high-quality patent corpus for a specific field, and by applying an advanced algorithmic model to patent entity extraction. [Research method] A fine-grained information scheme comprising 13 entity types was defined, and the titles and abstracts of 921 patents in the field of brain-inspired intelligence were manually annotated according to it. A Bert-BiLSTM-CRF model, which integrates deep learning and machine learning, was then used to recognize entities in the brain-inspired intelligence patents. [Research conclusion] Overall, the model achieved a precision, recall and F1 score of 0.8, and recognition performance differed across entity types. Several comparative experiments were designed to verify the model's performance. The results show that fine-tuning on the data and increasing the training scale can improve model performance, and that the model outperforms some classical models of the same period.
Authors
Xing Xiaozhao (邢晓昭)
Yuan Pengbin (苑朋彬)
Chen Liang (陈亮)
Ren Liang (任亮)
Yu Chi (余池)
Institute of Scientific and Technological Information of China, Beijing 100038
Source
Journal of Intelligence (《情报杂志》)
Peking University Core Journal (北大核心)
2024, No. 6, pp. 126-133, 144 (9 pages)
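Funding
Research output of the National Social Science Fund of China Youth Project "Research on Classification and Identification Methods for Disruptive Technologies Based on Multi-source Knowledge Networks" (No. 21CTQ039).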