期刊文献+

Implicit Modality Mining: An End-to-End Method for Multimodal Information Extraction

下载PDF
导出
摘要 Multimodal named entity recognition(MNER)and relation extraction(MRE)are key in social media analysis but face challenges like inefficient visual processing and non-optimal modality interaction.(1)Heavy visual embedding:the process of visual embedding is both time and computationally expensive due to the prerequisite extraction of explicit visual cues from the original image before input into the multimodal model.Consequently,these approaches cannot achieve efficient online reasoning;(2)suboptimal interaction handling:the prevalent method of managing interaction between different modalities typically relies on the alternation of self-attention and cross-attention mechanisms or excessive dependence on the gating mechanism.This explicit modeling method may fail to capture some nuanced relations between image and text,ultimately undermining the model’s capability to extract optimal information.To address these challenges,we introduce Implicit Modality Mining(IMM),a novel end-to-end framework for fine-grained image-text correlation without heavy visual embedders.IMM uses an Implicit Semantic Alignment module with a Transformer for cross-modal clues and an Insert-Activation module to effectively utilize these clues.Our approach achieves state-of-the-art performance on three datasets.
出处 《Journal of Electronic Research and Application》 2024年第2期124-139,共16页 电子研究与应用
  • 相关文献

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部