Survey of Multimodal Deep Learning

Cited by: 27
Abstract A modality is a channel through which people receive information, such as hearing, vision, smell, or touch. Multimodal learning exploits the complementarity between modalities and removes the redundancy among them in order to learn better feature representations. Its goal is to build models that can process and relate information from multiple modalities. It is an active multidisciplinary field of growing importance and considerable potential. Current research concentrates on multimodal learning across images, video, audio, and text. This paper surveys applications of multimodal learning at the practical level, including audio-visual speech recognition, image-text sentiment analysis, and collaborative annotation, as well as at the core level, namely matching and classification and aligned representation learning, and it explains these two core problems. Finally, it introduces datasets commonly used in multimodal learning and outlines future development trends.
Authors SUN Yingying; JIA Zhentang; ZHU Haoyu (College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 200090, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2020, No. 21, pp. 1-10 (10 pages)
Funding National Natural Science Foundation of China, Young Scientists Fund (No. 61401269).
Keywords multimodal learning; multimodal application; multimodal fusion; shared representation space