Abstract
With the growth of the internet, the amount of information available to people has increased exponentially, and so has the knowledge that can be extracted from data; artificial intelligence, once shelved, has regained its vitality. As artificial intelligence has continued to develop, visual question answering (VQA) has emerged in recent years and become one of the field's hot topics. A VQA system takes an image and a question as input and combines the information in both to produce a natural-language answer as output. The key to VQA lies in how to fuse the visual and linguistic features extracted from the input image and question. This paper surveys recent research progress on VQA in terms of concepts and models, discusses the shortcomings of existing work, and finally outlines future research directions for VQA.
Authors
葛梦颖
孙宝山
GE Mengying; SUN Baoshan (School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin 300387, China)
Source
Modern Information Technology (《现代信息科技》)
2019, No. 11, pp. 11-13, 16 (4 pages in total)
Keywords
deep learning
artificial intelligence
visual question answering
natural language processing