Military Named Entity Recognition Based on Domain-Adaptive Embedding (times cited: 2)

Name Entity Recognition for Military Based on Domain Adaptive Embedding
Abstract: The corpus available for the military domain alone is too small, so the embedding space learned from it is of poor quality, and deep neural network models that rely on it recognize military named entities with low accuracy. To address this problem, this paper works at the level of distributed representations of characters and words and uses a domain adaptation method to bring useful information from an additional domain into the learning of military-domain embeddings. First, a domain dictionary is built and combined with the CRF algorithm to perform domain-adaptive word segmentation on the collected general-domain and military-domain corpora, which then serve as the training corpora for the embeddings. The resulting word vectors are concatenated with character vectors as features, both to enrich the embedding information and to validate the segmentation. Next, a domain-adaptive transformation is applied to the heterogeneous general-domain and military-domain embedding spaces obtained from training, producing domain-adaptive embeddings that serve as the input to the BiLSTM-CRF layer of the base model. Finally, recognition performance is evaluated with CoNLL-2000. Experimental results show that, under the same model, feeding in the domain-adaptive embeddings instead of military-domain embeddings trained on a conventionally segmented corpus improves precision (P), recall (R), and F1 score (F1) by 2.17%, 1.04%, and 1.59%, respectively.
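The abstract states that word vectors are concatenated (spliced) with character vectors as input features. A minimal sketch of that step follows, under assumptions the source does not spell out: the sentence has already been segmented into a word list, `char_vecs` and `word_vecs` are hypothetical lookup tables (plain dictionaries) standing in for the trained embeddings, each character is paired with the vector of the word that contains it, and out-of-vocabulary items fall back to a zero vector. It illustrates the idea and is not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]


def char_word_features(
    words: List[str],
    char_vecs: Dict[str, Vector],
    word_vecs: Dict[str, Vector],
    char_dim: int,
    word_dim: int,
) -> List[Vector]:
    """Return one feature vector per character: [char vector ; enclosing word vector].

    Hypothetical helper: the lookup tables and zero-vector fallback are
    assumptions made for illustration, not details from the paper.
    """
    features: List[Vector] = []
    for word in words:
        # Vector of the segmented word; zero vector if the word is out of vocabulary.
        w_vec = word_vecs.get(word, [0.0] * word_dim)
        for ch in word:
            # Vector of the character; zero vector if the character is unknown.
            c_vec = char_vecs.get(ch, [0.0] * char_dim)
            # Splice (concatenate) the character vector with its word's vector.
            features.append(c_vec + w_vec)
    return features


if __name__ == "__main__":
    # Toy 2-d character and 3-d word embeddings with made-up values.
    char_vecs = {"步": [0.1, 0.2], "枪": [0.3, 0.4]}
    word_vecs = {"步枪": [1.0, 0.0, 0.5]}
    for row in char_word_features(["步枪"], char_vecs, word_vecs, char_dim=2, word_dim=3):
        print(row)
```

Pairing every character with its enclosing word's vector keeps exactly one feature vector per character, so the sequence still lines up with a character-level tag sequence while carrying the segmenter's word boundaries.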
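The reported P, R, and F1 values are entity-level scores in the style of the CoNLL-2000 evaluation script (conlleval), where a predicted entity counts as correct only when both its span and its type exactly match the gold annotation. The sketch below is a simplified re-implementation of that counting, not conlleval itself. It assumes a BIO tag scheme (the abstract does not say which scheme was used), and the entity types `WEA` and `ORG` are hypothetical. Like conlleval, it treats an `I-` tag that does not continue an entity of the same type as the start of a new entity.

```python
from typing import List, Optional, Set, Tuple

Chunk = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect (type, start, end) entity spans from a BIO tag sequence."""
    chunks: Set[Chunk] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        # An I- tag continues the open entity only if the type matches.
        continues = prefix == "I" and etype == label
        if etype is not None and not continues:
            chunks.add((etype, start, i))
            start, etype = None, None
        # B- always opens an entity; a non-continuing I- also opens one.
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return chunks


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    correct = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_chunks(g_tags), extract_chunks(p_tags)
        correct += len(g & p)  # exact match on span and type
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical entity types: WEA (weapon), ORG (organization).
    gold = [["B-WEA", "I-WEA", "O", "B-ORG"]]
    pred = [["B-WEA", "I-WEA", "O", "O"]]
    p, r, f = prf1(gold, pred)
    print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")  # P=100.00% R=50.00% F1=66.67%
```

Because scores are counted per entity rather than per token, a prediction that recovers only part of an entity's span earns no credit, which is why entity-level F1 is usually lower than token accuracy.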
Authors: 刘凯 (LIU Kai), 张宏军 (ZHANG Hong-jun), 陈飞琼 (CHEN Fei-qiong) (School of Graduate, Army Engineering University of PLA, Nanjing 210000, China; College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210000, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list; 2022, No. 1, pp. 292-297 (6 pages)
Keywords: Character embedding; Word embedding; Chinese word segmentation; Domain adaptation; Named entity recognition