Combined self-attention mechanism for named entity recognition in social media
Cited by: 43
Abstract: Named entity recognition (NER) performs worse on Chinese social media than on standard news text, mainly because social media text is informal and the available annotated corpora are small. Recent work on NER for Chinese social media has focused on the small size of the annotated corpora and has tended to rely on external knowledge or joint training to improve recognition performance. Far less attention has been paid to a second problem: because the text is informal, the features contained in the text itself are not fully exploited. This paper focuses on the text itself and proposes an NER method that combines bi-directional long short-term memory (BiLSTM) with a self-attention mechanism. The method captures context-dependent information in multiple different subspaces to better understand and represent sentence structure, mines the features contained in the text itself, and thereby improves entity recognition on informal text. Several groups of comparative experiments on the public Weibo NER corpus confirm the effectiveness of the method: without external resources or joint training, it reaches an F1 score of 58.76%.
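The idea of capturing context in several subspaces can be illustrated with a minimal sketch of multi-head scaled dot-product self-attention, written in plain Python. This is not the authors' implementation: the abstract does not give the architecture in detail, so the dimensions, the random weights, the helper names (`matmul`, `attention_head`, `multi_head_self_attention`, `random_matrix`), and the use of random vectors as a stand-in for BiLSTM hidden states are all assumptions made for illustration. The final output projection used in standard multi-head attention is omitted for brevity.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head, i.e. one subspace."""
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)               # scores[i][j] = q_i . k_j
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(h, heads):
    """Run every head on the same inputs and concatenate the outputs per token."""
    outputs = [attention_head(h, *params)[0] for params in heads]
    return [sum((out[t] for out in outputs), []) for t in range(len(h))]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a matrix of small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, num_heads, d_head = 5, 8, 2, 4  # assumed toy sizes

# Stand-in for the BiLSTM hidden states of a 5-character sentence.
hidden = random_matrix(seq_len, d_model, rng)

# One (W_q, W_k, W_v) triple per head; a trained model would learn these.
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
    for _ in range(num_heads)
]

context = multi_head_self_attention(hidden, heads)
_, first_head_weights = attention_head(hidden, *heads[0])
```

Each row of `context` is a context-aware representation of one character, built by concatenating the outputs of all heads. In a tagger, these vectors would feed the layer that predicts an entity label per character.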
Authors: LI Mingyang; KONG Fang (School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
Source: Journal of Tsinghua University (Science and Technology), 2019, No. 6, pp. 461-467 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61472264, 61876118); Artificial Intelligence Emergency Project (61751206); sub-project of the National Key Research and Development Program of China (2017YFB1002101)
Keywords: named entity recognition (NER); Chinese social media; self-attention mechanism
