Implicit Modality Mining: An End-to-End Method for Multimodal Information Extraction
Authors: Jinle Lu, Qinglang Guo. Journal of Electronic Research and Application, 2024, Issue 2, pp. 124-139 (16 pages)
Multimodal named entity recognition (MNER) and relation extraction (MRE) are key tasks in social media analysis, but existing approaches face two challenges: inefficient visual processing and suboptimal modality interaction. (1) Heavy visual embedding: visual embedding is costly in both time and computation, because explicit visual cues must be extracted from the original image before it is fed into the multimodal model. Consequently, these approaches cannot achieve efficient online inference. (2) Suboptimal interaction handling: the prevalent way of managing interaction between modalities relies on alternating self-attention and cross-attention mechanisms, or depends excessively on gating mechanisms. Such explicit modeling may fail to capture some nuanced relations between image and text, ultimately undermining the model's ability to extract optimal information. To address these challenges, we introduce Implicit Modality Mining (IMM), a novel end-to-end framework for fine-grained image-text correlation without heavy visual embedders. IMM uses an Implicit Semantic Alignment module with a Transformer to mine cross-modal clues and an Insert-Activation module to make effective use of these clues. Our approach achieves state-of-the-art performance on three datasets.
Keywords: Multimodal; Named entity recognition; Relation extraction; Patch projection
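The abstract does not spell out how IMM embeds images; the keyword "Patch projection" suggests a lightweight alternative to a heavy visual embedder. The following is a minimal sketch of generic patch projection (splitting an image into non-overlapping patches and linearly projecting each one into a token), not the authors' implementation. The function names `patchify` and `project`, the patch size, and the embedding width are all illustrative assumptions.

```python
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened,
    non-overlapping patch vectors of length patch * patch * C."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    flat.extend(image[r][c])  # append all channels of this pixel
            patches.append(flat)
    return patches


def project(patches, weight, bias):
    """Linear projection of each flattened patch to a D-dimensional token.
    `weight` holds D rows, each as long as one flattened patch."""
    return [
        [sum(x * w for x, w in zip(p, row)) + b for row, b in zip(weight, bias)]
        for p in patches
    ]


# Toy example (assumed sizes): a 4x4 RGB image, 2x2 patches, 8-dim tokens.
rng = random.Random(0)
image = [[[rng.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]
weight = [[rng.gauss(0.0, 0.02) for _ in range(2 * 2 * 3)] for _ in range(8)]
bias = [0.0] * 8

tokens = project(patchify(image, 2), weight, bias)
print(len(tokens), len(tokens[0]))  # 4 8
```

In a setup like this, the resulting visual tokens could be passed to a Transformer together with the text tokens, so no separate object detector or pretrained visual backbone has to run before the multimodal model.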