Referring Image Segmentation Based on Language and Visual Fusion Transformer
Abstract: Referring image segmentation suffers from ambiguous language expressions, insufficient alignment of multimodal features, and incomplete understanding of the image as a whole. To address these problems, a multimodal deep learning model based on Transformer feature fusion and alignment is proposed. The model uses an optimized Darknet53 backbone for image feature extraction, which strengthens its understanding of global features. For language feature extraction, it combines a convolutional neural network, a bidirectional gated recurrent unit (Bi-GRU), and a self-attention mechanism to mine deep semantic features and resolve ambiguity in the language expression. A Transformer-based feature alignment structure is built to improve segmentation detail and segmentation accuracy. Finally, the mean intersection over union (mIoU) and the recognition accuracy at different thresholds are used as evaluation metrics. Experiments show that the proposed model fully fuses the multimodal features, captures their deep semantic information, and produces more accurate recognition results.
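The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: masks are binary nested lists, mIoU is taken as the mean of per-sample IoU, the "accuracy at different thresholds" is read as the fraction of samples whose IoU exceeds each threshold, and the threshold values 0.5 to 0.9 are a common convention rather than values stated by the authors. The function names `iou`, `mean_iou`, and `precision_at` are hypothetical.

```python
from typing import Dict, List, Sequence

# Binary mask as rows of 0/1 values: 1 = referred object, 0 = background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Mean of per-sample IoU over a non-empty set of predictions."""
    scores: List[float] = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


def precision_at(
    preds: Sequence[Mask],
    targets: Sequence[Mask],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[float, float]:
    """Fraction of samples whose IoU is strictly above each threshold."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return {th: sum(s > th for s in scores) / len(scores) for th in thresholds}


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
    targets = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
    print(mean_iou(preds, targets))
    print(precision_at(preds, targets))
```

Some referring segmentation work instead reports an overall IoU (total intersection divided by total union across the dataset); which variant the paper uses cannot be determined from the abstract alone.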
Authors: DUAN Yong (段勇); LIU Tie (刘铁) (School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110870, China)
Source: Chinese Journal of Sensors and Actuators (《传感技术学报》), 2024, No. 7, pp. 1193-1201 (9 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
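Funding: Liaoning Province Support Program for Outstanding Science and Technology Talents in Higher Education Institutions (LR15045); General Project of the Scientific Research Fund of the Education Department of Liaoning Province (LJKZ0139).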
Keywords: deep learning; referring image segmentation; natural language processing; attention mechanism; Transformer model