Named Entity Recognition in Field of Ancient Chinese Based on Lattice LSTM
Cited by: 15
Abstract: This paper studies named entity recognition (NER) for ancient Chinese using the Complete Collection of Four Treasuries (《四库全书》) dataset. It proposes an NER algorithm for ancient Chinese based on the Lattice LSTM model, which takes both character-sequence information and word-sequence information as model input. The text is segmented with the jiayan (甲言) word segmentation tool, and word2vec is used to train character-level and word-level embeddings for ancient Chinese, which serve as input to the Lattice LSTM model. With the Lattice LSTM model and the pre-trained ancient Chinese character and word embeddings, NER performance on ancient Chinese improves: compared with the traditional BiLSTM-CRF model, the F1 score increases by about 3.95%.
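To make the idea of combining character and word information concrete, the following is a minimal sketch of the word-matching step that a lattice model relies on: every substring of the character sequence that appears in a word lexicon becomes a candidate word span alongside the characters. This is an illustration under stated assumptions, not the authors' implementation. The function name `match_lattice_words`, the `max_len` limit, the example sentence, and the hand-written lexicon are all hypothetical; in the setting the abstract describes, the lexicon would presumably come from the vocabulary of the word embeddings trained on jiayan-segmented text.

```python
from typing import List, Set, Tuple


def match_lattice_words(
    chars: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Find every lexicon word of length >= 2 inside the character sequence.

    Returns (start, end, word) triples with inclusive character indices.
    Overlapping and nested matches are all kept, because a lattice model
    weighs the competing word candidates instead of committing to a
    single segmentation.
    """
    spans = []
    n = len(chars)
    for start in range(n):
        for length in range(2, max_len + 1):
            end = start + length  # exclusive end of the candidate substring
            if end > n:
                break
            word = chars[start:end]
            if word in lexicon:
                spans.append((start, end - 1, word))
    return spans


# Hypothetical example: a short sentence and a tiny hand-written lexicon.
sentence = "司马迁著史记"
lexicon = {"司马", "司马迁", "史记"}
spans = match_lattice_words(sentence, lexicon)
for start, end, word in spans:
    print(start, end, word)
```

Here both 司马 and 司马迁 are kept as candidates starting at the same character, which is the kind of ambiguity a character-only or a segmentation-first pipeline would have to resolve before tagging.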
Authors: CUI Dan-dan (崔丹丹); LIU Xiu-lei (刘秀磊); CHEN Ruo-yu (陈若愚); LIU Xu-hong (刘旭红); LI Zhen (李臻); QI Lin (齐林) (Computer School, Beijing Information Science and Technology University, Beijing 100192, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2020, Issue S02, pp. 18-22 (5 pages)
Funding: National Key Research and Development Program of China (2017YFB1400402).
Keywords: Ancient Chinese literature; Named entity recognition; BiLSTM-CRF; Lattice LSTM; Deep learning

