Abstract
With the rapid development of artificial intelligence, cross-modal research has gradually attracted the attention of researchers. Deep-learning-based Visual Question Answering (VQA) models keep improving in accuracy on datasets, but they share a common weakness: they use the modalities unevenly. This paper surveys a number of papers on the language prior problem in VQA, compares the strengths and weaknesses of the various methods, and, building on existing methods, looks ahead to future directions for mitigating language priors in VQA.
Authors
QUAN Haibo (权海波)
YANG Ying (杨颖)
College of Computer Science and Information Engineering, Fuyang Normal University, Fuyang, Anhui 236037, China
Source
Information & Computer (《信息与电脑》)
2022, No. 1, pp. 55-58 (4 pages)
Funding
Key Natural Science Research Project of the Anhui Provincial Department of Education (Project No. KJ2019A0536).
Keywords
visual question answering
language prior
deep learning
computer vision