Abstract
In the 6G era, to balance multimedia users' demand for immersive audio, video, and haptic experiences with low-latency, high-reliability, large-capacity communication, a cross-modal signal reconstruction framework and a deep learning model that reconstructs haptic signals from video were proposed. First, by controlling a robot to touch various materials, a dataset named VisTouch containing audio, video, and haptic signals was constructed, laying the foundation for subsequent research on a range of cross-modal problems. Second, by exploiting the semantic correlation among multi-modal signals, a universal and robust end-to-end cross-modal signal reconstruction framework was designed. Third, taking the reconstruction of haptic signals from video as an example, a video-assisted haptic reconstruction model was built, consisting of a 3D CNN-based video feature extraction network, a fully convolutional GAN generator network, and a CNN-based GAN discriminator network. Finally, experimental results verified the reliability of the cross-modal signal reconstruction framework and the accuracy of the haptic reconstruction model.
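The abstract's pipeline — video features extracted by a 3D CNN, fed to a GAN generator, and judged by a CNN discriminator — can be illustrated as a structural sketch. All class names, stub computations, and dimensions below are hypothetical placeholders chosen for illustration; the real model uses trained convolutional networks, which are only mimicked here with simple list arithmetic so the data flow is runnable.

```python
# Structural sketch (not the paper's implementation) of the video-to-haptic
# reconstruction pipeline: extractor -> generator, with a discriminator
# scoring realism. Each stage is a stub standing in for a trained network.
from dataclasses import dataclass
from typing import List

Frame = List[float]          # one flattened video frame
HapticSignal = List[float]   # one reconstructed haptic sample sequence

@dataclass
class VideoFeatureExtractor:
    """Stand-in for the 3D CNN that maps a clip to a feature vector."""
    feature_dim: int = 4

    def extract(self, clip: List[Frame]) -> List[float]:
        # Average each frame, then pad/trim to feature_dim
        # (placeholder for spatiotemporal convolution).
        means = [sum(f) / len(f) for f in clip]
        return (means + [0.0] * self.feature_dim)[: self.feature_dim]

@dataclass
class HapticGenerator:
    """Stand-in for the fully convolutional GAN generator."""
    out_len: int = 8

    def generate(self, feat: List[float]) -> HapticSignal:
        # Tile features to the haptic length (placeholder for upsampling
        # / transposed convolution in the real generator).
        return [feat[i % len(feat)] for i in range(self.out_len)]

@dataclass
class HapticDiscriminator:
    """Stand-in for the CNN discriminator: returns a realism score in (0, 1]."""
    def score(self, signal: HapticSignal) -> float:
        s = sum(abs(x) for x in signal)
        return 1.0 / (1.0 + s)  # toy squashing, not a trained network

def reconstruct(clip: List[Frame]) -> HapticSignal:
    """End-to-end pass: video clip -> features -> haptic signal."""
    feat = VideoFeatureExtractor().extract(clip)
    return HapticGenerator().generate(feat)
```

In training, the generator and discriminator would be optimized adversarially on VisTouch pairs; here the stages only demonstrate the interfaces between the three sub-networks.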
Authors
LI Ang; CHEN Jianxin; WEI Xin; ZHOU Liang
College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; Key Laboratory of Broadband Wireless Communication and Sensor Network Technology (Ministry of Education), Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Source
Journal on Communications (《通信学报》), 2022, No. 6, pp. 28-40 (13 pages)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (No. 62071254); Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.