Abstract
Visual question answering (VQA) over images and text can be roughly divided into two classes according to the type of question being answered: questions whose answers can be obtained directly from the image, and questions whose answers require external knowledge. Current VQA methods achieve high accuracy on only one of these classes, while techniques for answering the other remain immature. To broaden the range of questions that can be answered, a knowledge-graph-assisted VQA method, K-VQA, was designed. Building on deep-learning-based VQA, the method queries a knowledge graph to determine the question type, so that each type of question is answered with the most suitable method. For questions that require external knowledge, information in the image and the question is used to determine the entities and attributes needed to answer the question, and the corresponding triples are extracted from the knowledge graph to obtain the answer. The results show that different VQA techniques suit different types of questions, and that K-VQA can answer both simple questions and reasoning questions, reaching an accuracy of 56.67%. As a knowledge-graph-assisted VQA method, K-VQA can therefore answer more types of questions with relatively high accuracy, and it provides a useful reference for further research on VQA and VQA methods.
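The routing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no code, so the toy knowledge graph `KG`, the phrase table `ATTRIBUTE_CUES`, the helper names, and the stubbed visual model are all hypothetical and chosen only for illustration. The sketch assumes that entities have already been detected in the image, and it treats a question as "knowledge-based" when the knowledge graph holds a triple for a detected entity and the attribute the question asks about.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)

# Toy knowledge graph; the contents are invented for illustration.
KG: Set[Triple] = {
    ("cat", "class", "mammal"),
    ("cat", "eats", "fish"),
    ("banana", "color_when_ripe", "yellow"),
}

# Hypothetical mapping from question phrases to knowledge-graph attributes.
# Plain substring matching is used here only to keep the sketch short.
ATTRIBUTE_CUES = {
    "what kind of animal": "class",
    "eat": "eats",
}


def find_attribute(question: str) -> Optional[str]:
    """Return the attribute the question seems to ask about, if any."""
    q = question.lower()
    for cue, attribute in ATTRIBUTE_CUES.items():
        if cue in q:
            return attribute
    return None


def lookup(entity: str, attribute: str) -> Optional[str]:
    """Return the value of the first triple matching (entity, attribute, ?)."""
    for head, relation, tail in KG:
        if head == entity and relation == attribute:
            return tail
    return None


def answer(
    question: str,
    image_entities: List[str],
    visual_vqa: Callable[[str], str],
) -> str:
    """Answer from the knowledge graph when a matching triple exists;
    otherwise fall back to the purely visual model."""
    attribute = find_attribute(question)
    if attribute is not None:
        for entity in image_entities:
            value = lookup(entity, attribute)
            if value is not None:
                return value
    return visual_vqa(question)


def stub_visual_vqa(question: str) -> str:
    """Stand-in for a deep-learning VQA model (assumption, always 'black')."""
    return "black"
```

With these definitions, `answer("What does this animal eat?", ["cat"], stub_visual_vqa)` is resolved from the triple `("cat", "eats", "fish")`, while `answer("What color is the cat?", ["cat"], stub_visual_vqa)` finds no matching attribute and is passed to the visual model.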
Authors
GAO Hongbin (高鸿斌)
MAO Jinying (毛金莹)
WANG Huiyong (王会勇)
School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China
Source
Journal of Hebei University of Science and Technology (《河北科技大学学报》)
CAS
2020, No. 4, pp. 315-326 (12 pages)
Funding
Natural Science Foundation of Hebei Province (F2018208116).
Keywords
knowledge engineering
visual question answering
external knowledge
knowledge graph
triple