DVUGAN: DDSP-Integrated Variational U-Net Speech Enhancement Based on STDCT (cited by: 2)
Abstract: This paper proposes DVUGAN, a model built on a generative adversarial network (GAN) for speech enhancement. The model operates in the transform domain and takes short-time discrete cosine transform (STDCT) features as input. Because STDCT features encode phase implicitly and can be learned by a real-valued network, the model avoids complex-valued networks and complex-spectrum processing, so it exploits phase information while reducing model complexity. The generator is a variational U-Net encoder-decoder. It integrates differentiable digital signal processing (DDSP) components, whose strong inductive bias significantly improves autoencoder performance, and a variational probabilistic bottleneck, which improves the suppression of impulsive noise sources and increases robustness to unseen data distributions. The Multi-Scale Spectral Loss from DDSP is introduced; it uses the perceptual bias of the oscillators to guide the generator toward better perceptual quality. An SI-SNR loss is used to optimize the discriminator, which balances the adversarial structure and stabilizes training. Evaluated on the DNS development dataset and the Voice Bank+DEMAND dataset, the model outperforms the baseline models and some recent studies, demonstrating the advantage of DVUGAN for transform-domain speech enhancement.
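As a rough illustration of two ingredients named in the abstract, the sketch below computes real-valued STDCT features and an SI-SNR score in plain Python. It is not the authors' implementation: the function names (`dct_ii`, `stdct`, `si_snr`), the Hann window, the frame length and hop size, and the mean removal inside `si_snr` are all assumptions chosen for the example. SI-SNR is shown only as a metric between two waveforms, not as it is wired into the discriminator.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of one real-valued frame (O(N^2), for illustration only)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        acc = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * acc)
    return coeffs


def stdct(signal, frame_len=8, hop=4):
    """Short-time DCT: Hann-windowed overlapping frames, one real coefficient vector per frame.

    frame_len, hop, and the window are assumed values, not taken from the paper.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dct_ii(frame))
    return frames


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a clean reference."""
    # Remove the mean of each signal so a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    est = [x - mean_e for x in estimate]
    ref = [x - mean_r for x in reference]
    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    # Whatever is left after removing the target component counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    clean = [math.sin(0.7 * i) for i in range(16)]
    noisy = [c + 0.1 * math.cos(2.3 * i) for i, c in enumerate(clean)]
    features = stdct(noisy)
    print(len(features), "frames of", len(features[0]), "real coefficients")
    print("SI-SNR of noisy input (dB):", si_snr(noisy, clean))
```

Every coefficient returned by `stdct` is a real number, which is the property that lets a real-valued network consume the features directly. Rescaling the estimate leaves `si_snr` essentially unchanged, which is what "scale-invariant" refers to.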
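Authors: XU Feng (徐峰); LI Ping (李平) (Academy of Information Science and Engineering, Huaqiao University, Xiamen, Fujian 361021, China)
Source: Journal of Signal Processing (《信号处理》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 3, pp. 582-589 (8 pages)
Funding: Fujian Provincial Science and Technology Major Project (2020HZ02014); Natural Science Foundation of Fujian Province (2018J01095); Fujian Provincial University Industry-University-Research Cooperation Major Science and Technology Project (2013H6016); Huaqiao University Science and Technology Innovation Funding Program for Young and Middle-Aged Teachers (ZQN-PY509).
Keywords: speech enhancement; STDCT; DDSP; generative adversarial networks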