Knowledge-based Visual Question Answering: A Survey (知识型视觉问答研究综述)
Abstract Visual question answering (VQA) is an important embodiment of AI-completeness and the visual Turing test and has potential application value, so it has received extensive attention from both the computer vision and natural language processing communities. Knowledge plays an important role in VQA: especially for complex, open-ended questions, reasoning knowledge and external knowledge are critical to obtaining correct answers. A question answering mechanism that incorporates knowledge is called knowledge-based visual question answering (Kb-VQA). No systematic survey of Kb-VQA has been conducted so far. Studying how knowledge participates in VQA, and in what forms it is expressed, can fill this gap in the literature. This paper surveys the constituent units of Kb-VQA, studies the forms in which knowledge exists, and proposes the concept of a knowledge hierarchy. It further summarizes the knowledge participation methods and expression forms used in visual feature extraction, language feature extraction, and multimodal fusion, and discusses future development trends and research directions.
Authors WANG Ruiping (王瑞平); WU Shihong (吴士泓); ZHANG Meihang (张美航); WANG Xiaoping (王小平) (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China; Research Institute of Yuanguang, YGSOFT INC., Zhuhai, Guangdong 519085, China; School of Mechanical Automation, Wuhan University of Science and Technology, Wuhan 430081, China)
Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 1, pp. 166-175 (10 pages)
Funding National Natural Science Foundation of China (51975432).
Keywords Visual question answering; Knowledge stratification; Internal logical reasoning; External knowledge base; Knowledge expression form; Knowledge participation method