Survey on Multimodal Information Processing and Fusion Based on Modal Categories


Abstract: With the continued advance of artificial intelligence and deep learning, multimodal information processing and fusion has attracted wide research attention. This paper summarizes the development history and milestone works of multimodal information processing, along with multimodal fusion strategies and models. Mainstream datasets for multimodal information processing and fusion are classified and summarized by modality. Using modality type as the classification criterion, the paper systematically reviews research progress in the field, emphasizing the distinctions between modalities. It divides multimodal information processing and fusion into four categories: audio-visual, audio-text, visual-text, and visual-audio-text processing and fusion, and examines in detail the processing and fusion methods and models for each combination of input modalities. Finally, it summarizes the field's development and offers an outlook.

Authors: HUANG Wendong; WANG Yifan (College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China)

Affiliation: College of Computer Science and Technology, China University of Petroleum (East China)

Source: Computer and Modernization (《计算机与现代化》), 2024, No. 7, pp. 47-62 (16 pages)

Funding: Natural Science Foundation of Shandong Province (ZR202211180156).

Keywords: multimodal processing; multimodal information processing; multimodal fusion; deep learning

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]


