A Dual-Channel Audio Generation Method Based on Multimodal Perception (一种基于多模态感知的双声道音频生成方法)

Cited by: 1
Abstract  Most existing videos contain only mono audio and lack the sense of spatial presence that dual-channel audio provides. To address this problem, this paper proposes a dual-channel audio generation method based on multimodal perception: building on an analysis of the visual information in the video, it fuses the spatial information of the video with the audio content, automatically adds spatialization cues to the original mono audio, and generates dual-channel audio that is closer to the real auditory experience. We first adopt an improved audio-video fusion analysis network with an encoder-decoder structure to encode the video with mono audio, then fuse the video features and audio features at multiple scales and analyze the video and audio information jointly, so that the generated dual-channel audio carries spatial information that the original mono audio lacks; the network finally outputs the dual-channel audio corresponding to the video. Experimental results on a public dataset show that the method outperforms existing models in dual-channel audio generation, with improvements in both the STFT distance and the ENV distance metrics.
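
The abstract describes the method only at a high level (an encoder-decoder audio-video fusion analysis network with multi-scale fusion of visual and audio features), so the following is a minimal sketch of that kind of architecture, not the authors' implementation. It assumes, as is common in binaural-audio generation work, that the network operates on the real/imaginary mono spectrogram, that a pooled visual feature vector is tiled and concatenated into the audio encoder at more than one scale, and that the decoder predicts a left/right difference spectrogram. The class name AudioVisualBinauralNet and the dimensions visual_dim and base_ch are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AudioVisualBinauralNet(nn.Module):
    """Hypothetical encoder-decoder: encode the mono spectrogram, fuse a visual
    feature vector at two encoder scales, decode a left/right difference spectrogram."""

    def __init__(self, visual_dim: int = 512, base_ch: int = 32):
        super().__init__()
        # Encoder over the (real, imag) mono spectrogram of shape (B, 2, F, T).
        self.enc1 = self._down(2, base_ch)
        self.enc2 = self._down(base_ch + visual_dim, base_ch * 2)
        self.enc3 = self._down(base_ch * 2 + visual_dim, base_ch * 4)
        # Decoder with skip connections; output is a 2-channel difference spectrogram.
        self.dec3 = self._up(base_ch * 4, base_ch * 2)
        self.dec2 = self._up(base_ch * 4, base_ch)
        self.dec1 = nn.ConvTranspose2d(base_ch * 2, 2, kernel_size=4, stride=2, padding=1)

    @staticmethod
    def _down(in_ch, out_ch):
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
                             nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2))

    @staticmethod
    def _up(in_ch, out_ch):
        return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
                             nn.BatchNorm2d(out_ch), nn.ReLU())

    @staticmethod
    def _tile(vec, ref):
        # Broadcast the (B, C) visual feature vector over the time-frequency grid of `ref`.
        b, _, h, w = ref.shape
        return vec[:, :, None, None].expand(b, vec.shape[1], h, w)

    def forward(self, mono_spec, visual_feat):
        e1 = self.enc1(mono_spec)                                            # (B, 32, F/2, T/2)
        e2 = self.enc2(torch.cat([e1, self._tile(visual_feat, e1)], dim=1))  # fuse at scale 1
        e3 = self.enc3(torch.cat([e2, self._tile(visual_feat, e2)], dim=1))  # fuse at scale 2
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))                           # skip connection
        return self.dec1(torch.cat([d2, e1], dim=1))   # predicted difference spectrogram


if __name__ == "__main__":
    # Shape check with placeholder inputs: a 256x64 mono spectrogram and a 512-d visual feature.
    net = AudioVisualBinauralNet()
    diff = net(torch.randn(1, 2, 256, 64), torch.randn(1, 512))
    print(diff.shape)  # torch.Size([1, 2, 256, 64])
```

Under the usual mono-plus-difference formulation, the two output channels would then be reconstructed as mono + diff and mono - diff after an inverse STFT; whether the paper follows exactly this formulation is not stated in the abstract.

The two evaluation metrics named in the abstract, STFT distance and ENV distance, are likewise not defined here. The sketch below uses the definitions common in this line of work: the L2 distance between predicted and ground-truth complex spectrograms, and the L2 distance between the amplitude envelopes of the raw waveforms (taken as the magnitude of the analytic signal). The sampling rate and window length are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, hilbert


def stft_distance(pred: np.ndarray, true: np.ndarray, fs: int = 16000, nperseg: int = 512) -> float:
    """Mean L2 distance between complex STFTs; pred/true have shape (channels, samples)."""
    dists = []
    for p, t in zip(pred, true):
        _, _, Zp = stft(p, fs=fs, nperseg=nperseg)
        _, _, Zt = stft(t, fs=fs, nperseg=nperseg)
        dists.append(np.linalg.norm(Zp - Zt))
    return float(np.mean(dists))


def env_distance(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean L2 distance between amplitude envelopes of the waveforms."""
    dists = []
    for p, t in zip(pred, true):
        dists.append(np.linalg.norm(np.abs(hilbert(p)) - np.abs(hilbert(t))))
    return float(np.mean(dists))


if __name__ == "__main__":
    # Random placeholder waveforms: 2 channels, 1 second at 16 kHz.
    rng = np.random.default_rng(0)
    pred = rng.standard_normal((2, 16000))
    true = rng.standard_normal((2, 16000))
    print("STFT distance:", stft_distance(pred, true))
    print("ENV distance:", env_distance(pred, true))
```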
Authors  GUAN Li (官丽), YIN Kang (尹康), FAN Meng-jia (樊梦佳), XUE Kun (薛昆), XIE Kai (解凯) (Beijing Electric Power Corporation, Beijing 100031, China; NR Electric Co., Ltd., Nanjing, Jiangsu 211102, China)
Source  Computing Technology and Automation (《计算技术与自动化》), 2022, No. 4, pp. 157-165 (9 pages)
Keywords  audio generation; convolutional neural network (CNN); multimodal