
Domain Expert Entity Extraction Method Based on Multi-Feature Bidirectional Gated Neural Network
Times cited: 4
Abstract: Named entity recognition (NER) is a fundamental task in natural language processing (NLP) and information extraction (IE). Traditional methods for recognizing expert named entities have several problems. They rely too heavily on manually labeled features and on the quality of word segmentation, and they cannot recognize the many new domain-specific terms that appear in expert profiles. This paper proposes a domain expert entity extraction method that combines a multi-feature bidirectional gated neural network with a conditional random field (CRF) model. First, a domain expert corpus is constructed to train the entity extraction model. Second, BERT is used to produce character embeddings, and the word-formation elements of the domain-specific vocabulary in the corpus are analyzed to extract boundary features. Third, a bidirectional gated neural network and an attention mechanism capture the long-distance dependencies of specific words. Finally, a CRF layer performs named entity recognition. Five methods were compared experimentally on the same dataset. The results show that the proposed model improves the F1 score by more than 9.98% over BiLSTM-CRF and IDCNN-CRF.
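The final CRF step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the following:

* The BERT + boundary-feature + bidirectional gated network + attention encoder has already produced one emission score per tag for each character.
* The BIO tag set is hypothetical (PER, ORG).
* The transition scores are hand-set and only forbid invalid `I-` tags.
* The emission scores are made-up toy numbers.

It also includes an exact-match, entity-level F1, which is a common NER convention; the abstract does not say how the reported F1 values were computed. The helper names `viterbi_decode`, `bio_transitions`, `bio_spans`, and `entity_f1` are illustrative only.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical BIO tag set: PER (expert name) and ORG (affiliation) are illustrative only.
TAGS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring tag path of a linear-chain CRF (expects at least one position).

    emissions[t][j]  : score of tag j at character t (assumed to come from the encoder)
    transitions[i][j]: score of tag i being followed by tag j
    """
    n_tags = len(transitions)
    best = list(emissions[0])  # best[j]: best score of a path ending in tag j so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # previous tag i maximising (path score so far + transition score i -> j)
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers to position 0
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def bio_transitions(tags: Sequence[str], penalty: float = -10.0) -> List[List[float]]:
    """Hand-set (not learned) transitions: penalise entering I-X except from B-X or I-X."""
    return [[penalty if b.startswith("I-") and a[2:] != b[2:] else 0.0 for b in tags]
            for a in tags]


def bio_spans(tags: Sequence[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) entity spans from a BIO tag sequence."""
    spans: Set[Tuple[int, int, str]] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for k, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None and etype is not None:
                spans.add((start, k, etype))
            # an orphan I-X is leniently treated as the start of a new X entity
            start, etype = (None, None) if tag == "O" else (k, tag[2:])
    return spans


def entity_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Entity-level F1: a predicted span counts only if start, end and type all match."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy emission scores for five characters (made up, not from the paper).
    emissions = [
        [0.1, 2.0, 0.5, 0.2, 0.1],
        [0.3, 0.2, 1.8, 0.1, 0.4],
        [1.5, 0.1, 0.2, 0.3, 0.2],
        [0.2, 0.1, 0.1, 1.9, 0.3],
        [0.2, 0.1, 1.2, 0.1, 1.0],
    ]
    gold = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
    crf = [TAGS[i] for i in viterbi_decode(emissions, bio_transitions(TAGS))]
    greedy = [TAGS[row.index(max(row))] for row in emissions]  # per-character argmax, no CRF
    print(crf)                                # ['B-PER', 'I-PER', 'O', 'B-ORG', 'I-ORG']
    print(greedy)                             # ['B-PER', 'I-PER', 'O', 'B-ORG', 'I-PER']
    print(entity_f1(gold, crf))               # 1.0
    print(round(entity_f1(gold, greedy), 2))  # 0.4
```

With these toy scores, the per-character argmax labels the last character `I-PER` directly after `B-ORG`, which the BIO scheme forbids. Viterbi decoding over the same scores returns a consistent ORG span instead. In a trained CRF the transition matrix is learned jointly with the encoder rather than hand-set as in `bio_transitions`.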
Authors: Zhang Kewen; Li Xiang; Yan Yunyang; Zhu Quanyin; Ma Jialin (Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an 223005, China)
Source: Journal of Nanjing Normal University (Natural Science Edition), 2021, No. 1, pp. 128-135 (8 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
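Funding: National Natural Science Foundation of China (71874067, 61602202); National Key Research and Development Program of China (2018YFB1004904); Jiangsu Province Industry-University-Research Cooperation Projects (BY2020067, BY2020309); Jiangsu Agricultural Science and Technology Independent Innovation Fund (CX203074); Huaiyin Institute of Technology Graduate Science and Technology Innovation Program (HGYK202024).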
Keywords: named entity recognition; natural language processing; information extraction; multi-feature; boundary feature