Deep Multi-Module Based Language Priors Mitigation Model for Visual Question Answering
Authors: 于守健, 金学勤, 吴国文, 石秀金, 张红. Journal of Donghua University (English Edition), CAS, 2023, No. 6, pp. 684-694 (11 pages)
Visual question answering (VQA) models are meant to infer an answer from the image content that is relevant to the question text, but many VQA models yield answers that are biased by prior knowledge, especially language priors. This paper proposes a mitigation model, language priors mitigation-VQA (LPM-VQA), for the language priors problem in VQA models. It divides language priors into positive and negative language priors and uses different network branches to capture and process each kind, thereby mitigating their effect. A dynamically changing language prior feedback objective function is designed using the intermediate results of some modules in the VQA model. The weight of the loss value for each answer is set dynamically according to the strength of its language priors, balancing its proportion in the total VQA loss to further mitigate the language priors. The model does not depend on the baseline VQA architecture and can be configured like a plug-in to improve the performance of most existing VQA models. The experimental results show that the proposed model is general and effective, achieving state-of-the-art accuracy on the VQA-CP v2 dataset.
Keywords: visual question answering (VQA); language priors; natural language processing; multimodal fusion; computer vision
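The abstract describes weighting each answer's loss by the strength of its language prior but does not give the formula. The following is only a minimal sketch of that general idea, not the paper's objective function. It assumes that the prior strength of an answer can be approximated by the confidence of a hypothetical question-only branch, and that the weight is `(1 - prior) ** gamma`. The function name, the `gamma` parameter, and the weighting rule are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def prior_weighted_bce(vqa_logits, question_only_logits, targets,
                       gamma=1.0, eps=1e-12):
    """Binary cross-entropy with per-answer weights (illustrative only).

    vqa_logits:           (batch, n_answers) scores from the full VQA model
    question_only_logits: (batch, n_answers) scores from a hypothetical
                          question-only branch, used here as an ASSUMED
                          proxy for language-prior strength
    targets:              (batch, n_answers) soft or hard labels in [0, 1]
    gamma:                assumed exponent; gamma = 0 gives plain BCE
    """
    probs = sigmoid(vqa_logits)
    prior = sigmoid(question_only_logits)

    # Assumed rule: the stronger the prior for an answer, the smaller
    # that answer's share of the total loss.
    weights = (1.0 - prior) ** gamma

    bce = -(targets * np.log(probs + eps)
            + (1.0 - targets) * np.log(1.0 - probs + eps))

    # Sum over answers, then average over the batch.
    return float(np.mean(np.sum(weights * bce, axis=1)))


# Toy usage: one question, three candidate answers.
logits = np.array([[2.0, -1.0, 0.5]])
q_only = np.array([[3.0, -3.0, 0.0]])   # branch strongly favors answer 0
labels = np.array([[1.0, 0.0, 0.0]])
loss = prior_weighted_bce(logits, q_only, labels, gamma=1.0)
```

With `gamma = 0` every weight is 1 and the function reduces to ordinary binary cross-entropy, which makes it easy to compare against an unweighted baseline.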