Implicit Modality Mining: An End-to-End Method for Multimodal Information Extraction

Abstract: Multimodal named entity recognition (MNER) and relation extraction (MRE) are key tasks in social media analysis, but existing approaches face two challenges. (1) Heavy visual embedding: explicit visual cues must be extracted from the original image before it is fed into the multimodal model, which is costly in both time and computation. Consequently, these approaches cannot achieve efficient online inference. (2) Suboptimal interaction handling: interaction between modalities is typically handled by alternating self-attention and cross-attention, or by relying heavily on gating mechanisms. Such explicit modeling may fail to capture nuanced relations between image and text, ultimately undermining the model's ability to extract optimal information. To address these challenges, we introduce Implicit Modality Mining (IMM), a novel end-to-end framework that models fine-grained image-text correlation without heavy visual embedders. IMM uses an Implicit Semantic Alignment module, built on a Transformer, to mine cross-modal clues, and an Insert-Activation module to make effective use of these clues. Our approach achieves state-of-the-art performance on three datasets.
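The abstract does not say how IMM avoids a heavy visual embedder. The keyword "Patch projection" suggests that raw image patches are linearly projected into token embeddings, as in ViT-style models. The sketch below illustrates that general idea only and is not the authors' implementation. The function names, the patch size, the embedding width, and the choice to simply concatenate text and visual tokens into one sequence are all assumptions made for illustration.

```python
import random

def extract_patches(image, patch):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(image[y][x])  # append every channel value of this pixel
            patches.append(flat)
    return patches

def linear_project(vectors, weight, bias):
    """Apply y = W x + b to each vector; weight has shape (d_out, d_in)."""
    return [
        [sum(w_i * x_i for w_i, x_i in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in vectors
    ]

def build_joint_sequence(text_embeddings, image, patch, weight, bias):
    """Hypothetical helper: project patches and append them to the text tokens."""
    visual_tokens = linear_project(extract_patches(image, patch), weight, bias)
    return text_embeddings + visual_tokens

# Toy inputs (assumed sizes): a 4x4 RGB image, 2x2 patches, embedding width 8.
rng = random.Random(0)
image = [[[rng.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]
d_in, d_model = 2 * 2 * 3, 8
weight = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_model)]
bias = [0.0] * d_model
text_embeddings = [[rng.random() for _ in range(d_model)] for _ in range(5)]

sequence = build_joint_sequence(text_embeddings, image, 2, weight, bias)
print(len(sequence), len(sequence[0]))  # 5 text tokens + 4 patch tokens, each of width 8
```

In a design of this kind the only image-side cost before the shared Transformer is one matrix multiplication per patch, which is consistent with the abstract's stated goal of efficient online inference. How IMM actually aligns and injects the visual clues is not described in this record.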

Authors: Jinle Lu, Qinglang Guo

Affiliations: School of Cyber Science and Technology; National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP)

Source: Journal of Electronic Research and Application (电子研究与应用), 2024, Issue 2, pp. 124-139 (16 pages)

Keywords: Multimodal; Named entity recognition; Relation extraction; Patch projection

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]
