A distributed visual question answering model based on deep learning
Abstract: Visual question answering (VQA) requires a machine to answer natural-language questions about an image. Some existing VQA models are effective only on particular types of question samples. To address this, the paper proposes a distributed framework model based on deep neural networks. First, the training samples are divided into biased and unbiased samples according to the information entropy of the answer distribution. For biased samples, counterfactual training samples are generated, which forces the model to attend more closely to the key regions of the image and the question and reduces the influence of language priors. Second, for unbiased samples, large-scale image-text pre-training followed by fine-tuning is used to improve the model's performance. Finally, a multi-class cross-entropy loss measures the difference between the model's predictions and the ground-truth labels. Experiments on the VQA-cp-v2 and VQA-v2 datasets show that the proposed distributed VQA method clearly improves on handling the effects of biased and unbiased samples.
Authors: ZHOU Tong (周彤), WANG Feng (王峰), YU Zheng Tao (余正涛), GUO Chen Liang (郭晨靓), ZHAO Jia (赵佳) (School of Computer and Information Engineering, Fuyang Normal University, Fuyang, Anhui 236037, China)
Source: Journal of Fuyang Normal University: Natural Science (《阜阳师范大学学报(自然科学版)》), 2024, No. 1, pp. 8-14 (7 pages)
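Funding: National Natural Science Foundation of China (61906044); China Postdoctoral Science Foundation, General Program (2020M681984); Major Project of Natural Science Research in Universities of Anhui Province (KJ2020ZD48); Key Projects of Natural Science Research in Universities of Anhui Province (2023AH050406, KJ2021A0682).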
Keywords: visual question answering; distributed framework; information entropy; counterfactual; pre-training
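
The abstract describes splitting training samples by the information entropy of the answer distribution and training with a multi-class cross-entropy loss, but this record gives no formulas or thresholds. The following is a minimal sketch of how such a split might look, not the authors' implementation. It assumes that answers are grouped by a `question_type` field, that a low-entropy group counts as biased, and that the cutoff is a hypothetical `threshold` of 1.0 bit. The function names and sample fields are illustrative only.

```python
import math
from collections import Counter, defaultdict


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_by_entropy(samples, threshold=1.0):
    """Split samples into (biased, unbiased) lists.

    Assumption: samples are grouped by 'question_type', and a group whose
    answer entropy falls below `threshold` is treated as biased, because a
    few answers dominate it. Both choices are hypothetical.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["question_type"]].append(sample)

    biased, unbiased = [], []
    for group in groups.values():
        entropy = answer_entropy([s["answer"] for s in group])
        target = biased if entropy < threshold else unbiased
        target.extend(group)
    return biased, unbiased


def cross_entropy(probs, target_index):
    """Multi-class cross-entropy for one sample with a one-hot label."""
    return -math.log(probs[target_index])
```

Under these assumptions, a question type answered "yes" nine times out of ten has an entropy of roughly 0.47 bits and lands in the biased set, while a type with four equally frequent answers has 2 bits and lands in the unbiased set.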