Abstract
Visual question answering and visual dialogue are important research tasks in artificial intelligence and representative problems at the intersection of computer vision and natural language processing. These tasks require a machine to answer single-round or multi-round natural language questions based on the content of a given image. They place high demands on the machine's perception, cognition, and reasoning abilities, and have practical prospects in cross-modal human-computer interaction. This paper reviews recent research progress in visual question answering and dialogue, categorizes the datasets and algorithms, summarizes the research challenges and open problems, and finally discusses future development trends.
Authors
牛玉磊
张含望
NIU Yu-lei; ZHANG Han-wang (School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798)
Source
《计算机科学》
CSCD
PKU Core (Peking University core journal list)
2021, Issue 3, pp. 87-96 (10 pages)
Computer Science
Funding
Alibaba-Nanyang Technological University Singapore Joint Research Institute.
Keywords
Visual question answering
Visual dialogue
Vision and language
Visual reasoning
Deep learning