
Light-weight speech separation based on dual-path attention and recurrent neural network
Abstract: A light-weight speech separation method based on a dual-path attention and recurrent network is proposed. First, the method models the speech signal with a selectable branch structure built on a dual-path attention mechanism and a dual-path recurrent network, which extracts deep feature information while reducing the number of model parameters. Second, sub-band processing is introduced to reduce the computational cost of the model. Experimental results on the LibriCSS dataset show that the method achieves an average word error rate of 8.6% with only 0.15 MiB of parameters and a computational cost of 15.2 G/6 s, which are 3.3 to 391.3 times and 1.1 to 3.2 times smaller, respectively, than those of current mainstream approaches. This indicates that the proposed method effectively reduces the parameter count and computational cost of the model while achieving high speech separation performance.
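The abstract does not describe the dual-path layout in detail, so the following is only a minimal sketch of the general dual-path idea, not the authors' implementation. It assumes a sequence of frames is cut into overlapping chunks, after which a model can alternate between an intra-chunk view (local context) and an inter-chunk view (global context). The function names `segment`, `intra_chunk_view`, and `inter_chunk_view`, as well as the chunk length and hop size, are hypothetical choices made for illustration.

```python
def segment(frames, chunk_len, hop):
    """Split a frame sequence into overlapping chunks, zero-padding the tail.

    chunk_len and hop are illustrative values, not taken from the paper.
    """
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_len and hop must be positive")
    n = len(frames)
    # Number of chunks needed so that every frame is covered (ceiling division).
    n_chunks = 1 if n <= chunk_len else 1 + -(-(n - chunk_len) // hop)
    padded_len = chunk_len + (n_chunks - 1) * hop
    padded = list(frames) + [0] * (padded_len - n)
    return [padded[i * hop:i * hop + chunk_len] for i in range(n_chunks)]


def intra_chunk_view(chunks):
    """One sequence per chunk: a local model would run along each chunk."""
    return [list(chunk) for chunk in chunks]


def inter_chunk_view(chunks):
    """One sequence per within-chunk position: a global model would run
    across chunks, linking frames that are far apart in time."""
    return [list(column) for column in zip(*chunks)]


if __name__ == "__main__":
    frames = list(range(1, 11))  # ten dummy frames standing in for features
    chunks = segment(frames, chunk_len=4, hop=2)
    print(intra_chunk_view(chunks))
    print(inter_chunk_view(chunks))
```

In this kind of layout, each sequence handed to an attention or recurrent layer is much shorter than the full utterance, which is one common motivation for dual-path designs. Whether the paper uses these exact chunking parameters is not stated in the abstract.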
Authors: YANG Yi (杨弋); HU Qi (胡琦); ZHANG Pengyuan (张鹏远). Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049.
Source: Acta Acustica (《声学学报》), 2023, No. 5, pp. 1060-1069 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: Speech separation; Light-weight model; Deep neural network; Dual-path network; Self-attention network