Quadruplet Metric Loss Based on Multimodal Variational Auto-Encoder (cited by 1)
Abstract: Because multimodal data are heterogeneous and coupled, they are difficult to model. An important current research direction in multimodal data modeling is deep probabilistic generative models built on the variational auto-encoder (VAE) framework. Existing work, however, places no explicit constraint on the information shared between modalities. As a result, the shared and private information in multimodal data cannot be effectively decoupled in the learned representation, which leads to inaccurate information extraction and blurry generated images. Building on the idea of representing shared and private information separately, this paper proposes the quadruplet metric loss based multimodal variational auto-encoder (Q-MVAE). Q-MVAE introduces a quadruplet metric loss that explicitly constrains the extraction and alignment of shared information in the latent space, so that the model learns better decoupled representations. Qualitative and quantitative experiments on the MNIST-SVHN multimodal dataset show that the proposed model outperforms the compared models in both data representation and generation. The experiments also verify that the representations inferred by the model can be used for downstream tasks such as multimodal data classification. In addition, the model shows potential for generating images from decoupled representations of private information such as image style.
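The abstract does not give the exact form of the quadruplet metric loss, so the following is only a minimal sketch of the commonly used quadruplet loss (two hinge terms over an anchor, a positive and two negatives), applied to hypothetical shared latent codes. The pairing is an assumption for illustration: the anchor could be the shared code of an MNIST image, the positive the shared code of an SVHN image of the same digit, and the two negatives codes of other, mutually different digits. The names `quadruplet_loss`, `margin1` and `margin2` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two latent vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quadruplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative1: Sequence[float],
    negative2: Sequence[float],
    margin1: float = 1.0,  # hypothetical margin for the anchor-based term
    margin2: float = 0.5,  # hypothetical (smaller) margin for the negative-pair term
) -> float:
    """Generic quadruplet hinge loss on one quadruplet of shared latent codes.

    Assumption: anchor and positive share the same content (e.g. the same digit
    in two modalities); negative1 and negative2 differ from the anchor and from
    each other. This is a generic formulation, not necessarily Q-MVAE's.
    """
    d_ap = squared_distance(anchor, positive)
    d_an1 = squared_distance(anchor, negative1)
    d_n1n2 = squared_distance(negative1, negative2)
    # Triplet-style term: the positive must be closer to the anchor than
    # negative1 is, by at least margin1.
    term1 = max(0.0, d_ap - d_an1 + margin1)
    # Extra quadruplet term: the positive-pair distance must also be smaller
    # than the distance between two negatives that do not involve the anchor.
    term2 = max(0.0, d_ap - d_n1n2 + margin2)
    return term1 + term2


if __name__ == "__main__":
    # Misaligned shared codes: the positive lies far from the anchor.
    print(quadruplet_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [1.0, 1.0]))  # 7.5
    # Well-aligned shared codes: both hinge terms are inactive.
    print(quadruplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [0.0, 3.0]))  # 0.0
```

Compared with a plain triplet loss, the second term also pushes apart pairs that do not include the anchor, which is one plausible way to tighten the alignment of shared information across modalities. In a VAE this term would be added, with some weight, to the usual evidence lower bound objective; that weighting is likewise an assumption here.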
Authors: CHEN Yarui; YANG Jianning; WU Shiwei; LIU Yao; WANG Xiaojie (College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China)
Source: Journal of Tianjin University of Science & Technology (《天津科技大学学报》, CAS), 2022, No. 6, pp. 45-53, 62 (10 pages)
Funding: Tianjin Postgraduate Research and Innovation Project (Artificial Intelligence Special Program) (2020YJSZXS31).
Keywords: multimodal data; variational auto-encoder; generative model; metric learning