Funding: This work is supported by the National Natural Science Foundation of China (61872231, 61701297).
Abstract: Visual Question Answering (VQA) has attracted extensive research attention and has recently become a hot topic in deep learning. Advances in computer vision and natural language processing have contributed to progress in this research area. The key ways to improve the performance of a VQA system lie in the feature extraction, multimodal fusion, and answer prediction modules. An unsolved issue in popular VQA image feature extraction modules is that they have difficulty extracting fine-grained features from objects of different scales. In this paper, a novel feature extraction network that combines multi-scale convolution and self-attention branches is designed to solve this problem. Our approach achieves state-of-the-art single-model performance on the Pascal VOC 2012, VQA 1.0, and VQA 2.0 datasets.
Abstract: Visual question answering (VQA) models are intended to infer the answer from the information in the image that is relevant to the question text, but many VQA models yield answers that are biased by prior knowledge, especially language priors. This paper proposes a mitigation model called language priors mitigation-VQA (LPM-VQA) for the language priors problem in VQA models, which divides language priors into positive and negative language priors. Different network branches are used to capture and process the different priors in order to mitigate them. A dynamically changing language prior feedback objective function is designed using the intermediate results of some modules in the VQA model. The weight of the loss value for each answer is set dynamically according to the strength of its language priors, balancing its proportion in the total VQA loss to further mitigate the language priors. The model does not depend on the baseline VQA architecture and can be configured like a plug-in to improve the performance of most existing VQA models. The experimental results show that the proposed model is general and effective, achieving state-of-the-art accuracy on the VQA-CP v2 dataset.
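The idea of weighting each answer's loss by the strength of its language prior can be illustrated with a minimal sketch. This is not the LPM-VQA objective, which the abstract does not specify. It assumes, purely for illustration, that the prior strength of an answer is a probability in [0, 1] (for example, from a question-only branch) and that the weight is `1 - prior`. The function names `softmax` and `prior_weighted_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prior_weighted_loss(logits, target, prior):
    """Cross-entropy on the target answer, scaled by an assumed weight.

    ASSUMPTION: the weight is (1 - prior[target]), so answers that the
    language prior already favors strongly contribute less to the loss.
    The weighting used in the paper may differ.
    """
    probs = softmax(logits)
    weight = 1.0 - prior[target]
    return -weight * math.log(probs[target])


# Same prediction, different prior strength for the target answer.
weak_prior = prior_weighted_loss([0.0, 0.0], 0, [0.5, 0.5])
strong_prior = prior_weighted_loss([0.0, 0.0], 0, [0.9, 0.1])
print(weak_prior > strong_prior)
```

Under this assumed weighting, an answer with a strong prior is down-weighted, so the model gains less from reproducing the bias.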
Funding: Partially supported by the National Basic Research Program of China ("973" Program) (2015CB351803), the National Natural Science Foundation of China (61390514, 61527804, 61572042, 61520106004), and the Sino-German Center (GZ 1025).
Abstract: Video quality assessment (VQA) plays a vital role in video processing, including video acquisition, video filtering in retrieval, video compression, video restoration, and video enhancement. Since VQA has gained much attention in recent years, this paper gives an up-to-date review of VQA research and highlights current challenges in this field. Subjective studies and common VQA databases are reviewed first. Then, a survey of objective VQA methods, including full-reference, reduced-reference, and no-reference VQA, is reported. Last but most importantly, the key limitations of current research and several challenges in the field of VQA are discussed, including the impact of video content, memory effects, computational efficiency, personalized video quality prediction, and quality assessment of newly emerged videos.
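To make the full-reference category concrete, the sketch below computes peak signal-to-noise ratio (PSNR), a classic full-reference measure that compares each distorted frame against its pristine reference. The abstract does not name the specific methods the survey covers, so PSNR is used here only as a familiar illustration. Frames are assumed to be 2D lists of 8-bit grayscale values, and the function names are hypothetical.

```python
import math


def frame_mse(ref, dist):
    """Mean squared error between two equally sized grayscale frames."""
    total = 0.0
    count = 0
    for ref_row, dist_row in zip(ref, dist):
        for r, d in zip(ref_row, dist_row):
            total += (r - d) ** 2
            count += 1
    return total / count


def video_psnr(ref_frames, dist_frames, peak=255.0):
    """Average per-frame PSNR in dB; identical frames score infinity."""
    scores = []
    for ref, dist in zip(ref_frames, dist_frames):
        mse = frame_mse(ref, dist)
        if mse == 0:
            scores.append(math.inf)
        else:
            scores.append(10.0 * math.log10(peak ** 2 / mse))
    return sum(scores) / len(scores)


# One 2x2 frame where every pixel is off by exactly 1 gray level.
reference = [[[10, 20], [30, 40]]]
distorted = [[[11, 21], [31, 41]]]
print(video_psnr(reference, distorted))
```

A no-reference method, by contrast, would have to score `distorted` without access to `reference`, which is why that category is generally harder.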