
EMSS: A Chinese Entity Extraction Method Based on Span Matching
Abstract Span-based entity extraction models currently achieve excellent results on English datasets, and span-based extraction has been shown to outperform traditional sequence-labeling entity extraction. This paper proposes EMSS, a Chinese named entity extraction model based on spans and splicing. EMSS uses an end-to-end span extraction model: the text is encoded into character vectors by the BERT pre-trained model, then passes to a span extraction layer that enumerates all possible spans, with two feature vectors, span boundary and span length, added for computing the span vectors. Finally, a span prediction layer predicts the entity label of each span. The paper also proposes a new labeling method based on the BIO format, which is not restricted by the model or the dataset domain and can improve recognition accuracy without affecting downstream tasks. Comparative experiments against current mainstream Chinese entity extraction models are conducted on the Weibo, Resume, MSRA, and OntoNotes 4.0 public datasets. The results show that EMSS outperforms the existing mainstream models, with an F1 improvement of about 7% on each dataset. The method is also applied to entity recognition in the coal mine electromechanical equipment domain, where experiments on a self-built dataset show that the labeling method is effective not only for Chinese entities but also for mixed-type entities combining Chinese characters, English letters, and digits.
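The span extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the boundary and length features are combined, so plain concatenation is assumed here, and the names `enumerate_spans`, `span_vector`, `length_table`, and the `max_len` cap are hypothetical. Toy 2-dimensional vectors stand in for the BERT character encodings.

```python
from typing import List, Tuple


def enumerate_spans(n_chars: int, max_len: int) -> List[Tuple[int, int]]:
    """Return every (start, end) pair, end inclusive, of length <= max_len."""
    return [
        (start, end)
        for start in range(n_chars)
        for end in range(start, min(start + max_len, n_chars))
    ]


def span_vector(
    char_vecs: List[List[float]],
    start: int,
    end: int,
    length_table: List[List[float]],
) -> List[float]:
    """Concatenate the two boundary vectors with a length-feature vector.

    Concatenation is an assumption; the abstract only says that boundary
    and length features are used when computing the span vector.
    """
    length = end - start + 1
    return char_vecs[start] + char_vecs[end] + length_table[length - 1]


# Toy stand-ins for encoder output: one 2-d vector per character.
text = "北京大学"
char_vecs = [[float(i), float(i) + 0.5] for i in range(len(text))]
# One (hypothetical) feature vector per span length 1..3.
length_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

spans = enumerate_spans(len(text), max_len=3)
vectors = [span_vector(char_vecs, s, e, length_table) for s, e in spans]
```

In a full model each vector in `vectors` would be passed to a classifier that assigns an entity label or a "not an entity" label to the corresponding span. Capping the span length keeps the number of candidates linear in the text length rather than quadratic.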
Authors YOU Xindong; LIU Mocun; HAN Junmei; LÜ Xueqiang (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; General Key Laboratory of Complex System Simulation, Systems Engineering Research Institute, Academy of Military Sciences, Beijing 100101, China)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core, 2024, No. 9, pp. 2087-2093 (7 pages)
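Funding Supported by the National Natural Science Foundation of China (62171043), the Beijing Natural Science Foundation (4212020), projects of the State Language Commission (ZDI145-10, YB145-3), the National Defense Science and Technology Key Laboratory Fund (6412006200404), the Scientific Research Program of the Beijing Municipal Education Commission (KM202111232001), and the Huaneng Group Headquarters Science and Technology Project (HNKJ21-HF43).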
Keywords entity extraction; span; neural network