Abstract
Using image information to improve the accuracy of named entity recognition in unstructured text can alleviate two problems in social media scenarios: ambiguity caused by the incomplete semantics of short texts, and the failure to exploit the many images that accompany them. Existing studies usually merge the semantic representations of text and images with cross-modal attention mechanisms, but most of them cannot establish a consistent representation that fuses the semantic information of the two modalities, and redundant information in images often degrades the performance of multimodal named entity recognition (MNER). To address these problems, this paper proposes an MNER method based on a heterogeneous graph model that makes effective use of the interaction between text and images. First, an entity recognition model based on BERT-BiLSTM-CRF is built to identify candidate entities in the text. Second, with these candidate entities serving as a bridge between the two modalities, a heterogeneous graph network composed of tokens, entities, and visual objects is designed, and two types of edges are defined to represent the semantic relationships among them. Finally, a multimodal fusion model (MHGT) is designed on top of the heterogeneous graph built from text and images, which reduces the negative impact of image noise. Experiments on two widely used MNER datasets show that the proposed method achieves F1 scores of 75.26% on Twitter2015 and 86.51% on Twitter2017, outperforming the baseline models.
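The graph construction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say what the two edge types are, so the sketch assumes one edge type linking each token to the candidate entity that contains it and one linking each candidate entity to each detected visual object. The names `HeteroGraph`, `build_graph`, `token-entity`, and `entity-object` are hypothetical, and the entity spans and object labels are hand-written stand-ins for the outputs of a text tagger and an object detector.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Tiny typed graph: every node and every edge carries a type label."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, target id, edge type)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, src, dst, edge_type):
        self.edges.append((src, dst, edge_type))


def build_graph(tokens, entity_spans, visual_objects):
    """Build a token/entity/object graph.

    tokens:         list of token strings
    entity_spans:   list of half-open (start, end) token index ranges,
                    assumed to come from a first-stage entity tagger
    visual_objects: list of object labels, assumed to come from a detector
    """
    graph = HeteroGraph()
    for i in range(len(tokens)):
        graph.add_node(f"tok{i}", "token")
    for j in range(len(visual_objects)):
        graph.add_node(f"obj{j}", "object")
    for k, (start, end) in enumerate(entity_spans):
        graph.add_node(f"ent{k}", "entity")
        # Assumed edge type 1: a token belongs to a candidate entity.
        for i in range(start, end):
            graph.add_edge(f"tok{i}", f"ent{k}", "token-entity")
        # Assumed edge type 2: a candidate entity may refer to a visual object.
        for j in range(len(visual_objects)):
            graph.add_edge(f"ent{k}", f"obj{j}", "entity-object")
    return graph


# Hand-written example inputs (illustrative only).
tokens = ["Kevin", "Durant", "joins", "the", "Warriors"]
entity_spans = [(0, 2), (4, 5)]
visual_objects = ["person", "basketball"]

graph = build_graph(tokens, entity_spans, visual_objects)
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

In this sketch tokens never connect to visual objects directly; all cross-modal links pass through entity nodes, which mirrors the abstract's description of candidate entities as the bridge between the two modalities.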
Authors
LI Daiyi (李代祎)
ZHANG Xiaowen (张笑文)
YAN Li (严丽)
Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; College of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450000, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2024, No. 9, pp. 2063-2070 (8 pages)
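Funding
Supported by the National Natural Science Foundation of China (62176121, 61370075).
Supported by the Basic Research Program of Jiangsu Province (BK20191274).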
Keywords
multimodal entity recognition
attention mechanism
heterogeneous graph network
BERT
conditional random field