Abstract
Multi-modal neural machine translation aims to improve the quality of text translation by exploiting visual information. Traditional multi-modal machine translation models incorporate the global semantic information of an image into the translation model, ignoring the impact of fine-grained image information on translation quality. To address this issue, this paper proposes a multi-modal neural machine translation method guided by fine-grained image-text alignment semantics. The method first performs cross-modal interaction between image and text to extract fine-grained image-text alignment semantics; then, using this alignment semantics as a pivot, a gating mechanism aligns the fine-grained multi-modal information with the textual information, achieving image-text multi-modal feature fusion. Experimental results on the English-to-German, English-to-French, and English-to-Czech translation tasks of the Multi30K multi-modal machine translation benchmark show that the proposed method is effective and outperforms most state-of-the-art multi-modal machine translation methods.
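A minimal sketch of the gated image-text fusion step described in the abstract, assuming a Transformer-style encoder. The module name, tensor shapes, head count, and the use of a single cross-attention layer are illustrative assumptions, not the authors' released implementation: text-side encoder states attend to image region features to obtain fine-grained alignment semantics, and a sigmoid gate controls how much of the aligned visual signal is injected into each text position.

```python
# Hypothetical sketch of gated image-text fusion (not the paper's actual code).
import torch
import torch.nn as nn


class GatedImageTextFusion(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        # Cross-modal interaction: text queries attend to image region features
        # to extract fine-grained image-text alignment semantics.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        # Gate deciding, per text position, how much aligned visual semantics to inject.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, text_len, d_model)    -- source-sentence encoder states
        # image_feats: (batch, num_regions, d_model) -- projected image region features
        aligned, _ = self.cross_attn(text_feats, image_feats, image_feats)
        # Sigmoid gate in (0, 1) modulates the contribution of the aligned visual semantics.
        lam = torch.sigmoid(self.gate(torch.cat([text_feats, aligned], dim=-1)))
        return text_feats + lam * aligned


if __name__ == "__main__":
    fusion = GatedImageTextFusion(d_model=512)
    text = torch.randn(2, 20, 512)   # toy source-sentence encodings
    image = torch.randn(2, 36, 512)  # toy image region features (e.g., 36 regions)
    print(fusion(text, image).shape)  # torch.Size([2, 20, 512])
```

In a full model, this fused representation would presumably replace the text-only encoder output fed to the translation decoder; the exact placement and parameterization in the paper may differ.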
Authors
YE Junjie, GUO Junjun, TAN Kaiwen, XIANG Yan, YU Zhengtao
(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China)
Source
《中文信息学报》 (Journal of Chinese Information Processing)
CSCD
Peking University Core Journals (北大核心)
2024, No. 10, pp. 24-34 (11 pages)
Funding
National Key Research and Development Program of China (2020AAA0107904)
National Natural Science Foundation of China (62366025)
Natural Science Foundation of the Yunnan Provincial Department of Science and Technology (202301AT070444)
Keywords
multi-modal neural machine translation
image-text fine-grained
semantic interaction
alignment semantics