Abstract
Visual question answering (VQA) is an emerging multi-modal learning task that bridges image content understanding and textual semantic reasoning to produce an answer for a given image and question. Because it involves interactions across modalities, it demands strong capabilities in visual perception and textual semantic learning, and it has attracted wide attention. However, training a VQA model places heavy demands on the dataset: it requires a wide variety of question patterns and a large number of question-answer annotations with different answers for similar scenes to ensure the robustness of the model and its generalization across modalities. Annotating VQA data is labor-intensive and expensive, and this cost has become a bottleneck for the field. To address this problem, this paper proposes a contrastive cross-modal representation learning based active learning method (CCRL) for VQA. The key idea is to cover as many question patterns as possible while making the answer distribution as balanced as possible. CCRL consists of a visual question matching evaluation (VQME) module and a visual answer uncertainty estimation (VAUE) module. The VQME module uses mutual information and contrastive predictive coding as self-supervised constraints to learn the alignment between visual content and question patterns. The VAUE module introduces a label state learning model that adaptively selects matched question patterns for each image and learns the cross-modal semantic association between questions and answers; it then estimates sample uncertainty from the probability distribution over answers, so that CCRL can select the most informative unlabeled samples for annotation. In the experiments, CCRL is compared with the latest active learning algorithms on the VQA-v2 dataset. The results show that CCRL outperforms previous methods under every question pattern and improves accuracy by 1.65% on average over the state-of-the-art active learning method across different sampling rates. With only 30% of the data labeled, CCRL reaches 96% of the performance obtained with 100% labeled data; with 40% labeled, it reaches 97%. This indicates that CCRL selects informative and diverse samples, greatly reducing annotation cost while maximizing VQA performance.
Authors
张北辰
李亮
查正军
黄庆明
ZHANG Bei-Chen; LI Liang; ZHA Zheng-Jun; HUANG Qing-Ming (School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Information Science and Technology, University of Science and Technology of China, Hefei 230027; Peng Cheng Laboratory, Shenzhen, Guangdong 518055)
Source
《计算机学报》
EI
CAS
CSCD
PKU Core Journals
2022, No. 8, pp. 1730-1745 (16 pages)
Chinese Journal of Computers
Funding
Ministry of Science and Technology, Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102000)
National Natural Science Foundation of China (61732007, 61771457, U21B2038)
Youth Innovation Promotion Association of the Chinese Academy of Sciences (20200108)
Fundamental Research Funds for the Central Universities.
Keywords
active learning
cross-modal semantic reasoning
contrastive learning
visual question answering
mutual information