Abstract
Named entity recognition (NER) is a fundamental task in natural language processing (NLP) and information extraction (IE). Traditional methods for recognizing named entities of experts rely heavily on manual feature annotation and on the quality of word segmentation, and they fail to recognize the many new technical terms that appear in expert profiles. This paper proposes a domain-expert entity extraction method based on a multi-feature bidirectional gated neural network combined with a conditional random field (CRF) model. First, a domain-expert corpus is constructed to train the entity extraction model. Second, BERT is used to produce character embeddings, and the constituent elements of the domain-specific vocabulary in the corpus are analyzed to extract boundary features. Third, a bidirectional gated neural network with an attention mechanism captures the long-distance dependencies of specific words. Finally, a CRF model is applied to perform named entity recognition. Experiments comparing five methods on the same dataset show that the proposed model improves the F1 score by at least 9.98% over BiLSTM-CRF and IDCNN-CRF.
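The final step of the pipeline described above, CRF decoding over per-character scores, can be illustrated with a small Viterbi sketch. This is not the authors' implementation: the BIO tag set, the `PER` label, and all emission and transition scores below are hypothetical values chosen for illustration, and in the described model the emission scores would come from the bidirectional gated network with attention rather than being written by hand.

```python
# Illustrative sketch only: tags and scores are assumptions, not from the paper.
TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e4  # large penalty used to forbid invalid BIO transitions


def viterbi_decode(emissions, transitions, start):
    """Return the index path of the highest-scoring tag sequence.

    emissions:   one list of per-tag scores for each character
    transitions: transitions[i][j] is the score of moving from tag i to tag j
    start:       start[j] is the score of beginning the sequence with tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j on this character.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical transition scores: "O -> I-PER" and starting with "I-PER" are forbidden.
transitions = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.5],  # from B-PER
    [0.0, 0.0, 0.5],  # from I-PER
]
start = [0.0, 0.0, NEG]

# Hypothetical emission scores for three characters.
emissions = [
    [1.0, 0.2, 0.1],
    [0.1, 0.8, 1.0],
    [0.1, 0.2, 1.5],
]

# Picking the best tag per character independently yields an invalid sequence.
greedy = [TAGS[max(range(3), key=lambda j: e[j])] for e in emissions]
# Viterbi decoding with transition scores repairs the entity boundary.
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions, start)]
print(greedy)   # ['O', 'I-PER', 'I-PER']
print(decoded)  # ['O', 'B-PER', 'I-PER']
```

In this toy case the per-character choice starts an entity with `I-PER`, which BIO tagging does not allow, while decoding with transition scores recovers a well-formed `B-PER`, `I-PER` span. This is the general motivation for placing a CRF layer on top of a neural encoder.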
Authors
Zhang Kewen (张柯文)
Li Xiang (李翔)
Yan Yunyang (严云洋)
Zhu Quanyin (朱全银)
Ma Jialin (马甲林)
Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an 223005, China
Source
Journal of Nanjing Normal University (Natural Science Edition) (《南京师大学报(自然科学版)》)
CAS
CSCD
PKU Core Journals (北大核心)
2021, Issue 1, pp. 128-135 (8 pages)
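Funding
National Natural Science Foundation of China (71874067, 61602202)
National Key R&D Program of China (2018YFB1004904)
Jiangsu Province Industry-University-Research Cooperation Project (BY2020067, BY2020309)
Jiangsu Agricultural Science and Technology Independent Innovation Fund (CX203074)
Huaiyin Institute of Technology Graduate Science and Technology Innovation Program (HGYK202024)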
Keywords
named entity recognition
natural language processing
information extraction
multi-feature
boundary feature