Research on a Multimodal Sound Source Separation Method for Mechanical Equipment
Abstract Single-modal mixed-signal separation methods cannot determine which piece of mechanical equipment produced which sound source. To address this, a sound source separation method for mechanical equipment based on multimodal feature fusion is proposed. First, multiple groups of feature extraction layers at different scales are used to build a Res2Net18 network with a multi-scale feature extraction structure, which extracts fine-grained visual features of the equipment; the direct skip connections in the UNet network are replaced with coordinate attention modules, which strengthens the spatial position information carried by the different audio features in the encoder. Second, the visual features of the equipment are fused into the mixed audio features to generate a mask for the corresponding sound source, and the mask is combined with the mixed audio spectrum to obtain the spectrum of each independent source, so that the sound of a given piece of equipment is separated according to its visual features. This resolves the equipment-to-source correspondence problem of single-modal mixed-signal separation methods. Finally, on the mechanical equipment dataset the method reaches an SDR of 6.14 dB, an SIR of 8.59 dB, and an SAR of 18.33 dB; compared with three existing multimodal sound source separation models, it achieves the best results in both SDR and SAR, which verifies its effectiveness.
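The masking step described in the abstract (a per-source mask combined with the mixed audio spectrum) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the mask is combined with the spectrum, so element-wise multiplication on magnitude spectrograms is assumed here, and the hypothetical `ratio_masks` helper stands in for the masks that the visually conditioned network would predict. The toy data also assumes that source magnitudes add up to the mixture magnitude, which holds only approximately for real audio. Plain Python lists are used to keep the sketch dependency-free; array libraries would be the usual choice in practice.

```python
def ratio_masks(source_mags, eps=1e-8):
    """Hypothetical stand-in for network-predicted masks.

    For each source, the mask is that source's share of the summed
    magnitude in every time-frequency bin (values lie in [0, 1]).
    """
    n_freq = len(source_mags[0])
    n_time = len(source_mags[0][0])
    masks = []
    for src in source_mags:
        masks.append([
            [src[f][t] / (sum(s[f][t] for s in source_mags) + eps)
             for t in range(n_time)]
            for f in range(n_freq)
        ])
    return masks


def apply_mask(mix_mag, mask):
    """Element-wise product of a mask and the mixture magnitude spectrogram."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mix_mag)]


# Toy magnitudes: 2 sources, 2 frequency bins, 3 time frames (made-up values).
source_a = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.0]]
source_b = [[0.0, 3.0, 2.0], [1.5, 0.5, 1.0]]

# Assumption: the mixture magnitude is the sum of the source magnitudes.
mixture = [[a + b for a, b in zip(row_a, row_b)]
           for row_a, row_b in zip(source_a, source_b)]

masks = ratio_masks([source_a, source_b])
separated = [apply_mask(mixture, m) for m in masks]
```

With masks derived from the known sources, each masked spectrogram reproduces its source up to the small `eps` term; a trained model would instead predict each mask from the fused audio and visual features.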
Authors JIAN Bin; XIAO Xiao-ping; LI Zi-sheng; ZHANG Kai; YUAN Hao (School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang 621010, China; Engineering Technology Center, Southwest University of Science and Technology, Mianyang 621010, China; School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China)
Source Computer Technology and Development (《计算机技术与发展》), 2023, No. 6, pp. 208-214 (7 pages)
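Funding National Key R&D Program of China (2021YFB3400702); Sichuan Province Science and Technology Program (2018GZ0083, 2018JY0245); Doctoral Fund of Southwest University of Science and Technology (17ZX7153, 17ZX7154).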
Keywords mechanical equipment; multimodal data; feature fusion; sound source separation; convolutional neural network