Combination of dynamic features with a new mask to optimize neural network speech enhancement

Abstract: Neural network speech enhancement algorithms produce poor speech quality when the selected features cannot fully represent the nonlinear structure of speech. To address this problem, a method that combines dynamic features with a new mask to optimize neural network speech enhancement is proposed. First, three features of the noisy speech are extracted and concatenated to form the static features. Their first-order and second-order differences are then computed to capture the transient information in the speech and are fused into dynamic features. Combining the static and dynamic features lets the features complement one another and reduces speech distortion. Second, to maximize the intelligibility and the clarity of the enhanced speech at the same time, a new adaptive mask is proposed. It adaptively adjusts both the energy proportion of speech and noise and the proportion between the traditional mask and the square-root mask. Gammatone channel weights are used to modify the mask values in each channel, which imitates the human auditory system and further improves speech intelligibility. Finally, simulation experiments are carried out on multiple utterances under different noise backgrounds. The results show that, compared with the other algorithms in the literature, the proposed algorithm achieves a higher signal-to-noise ratio (SNR), higher subjective speech quality and higher short-time objective intelligibility (STOI), which verifies its effectiveness.
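The two ideas in the abstract, static features extended with first- and second-order differences and a mask that blends a traditional mask with its square root, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the "traditional mask" is taken to be an energy-ratio mask, `alpha` is a hypothetical fixed blending weight (the paper adapts its proportions rather than fixing them), `channel_weights` stands in for the Gammatone channel weights, and the function names are invented here. It is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]  # rows are time frames, columns are feature dimensions or channels


def delta(frames: Matrix) -> Matrix:
    """First-order difference along time; the first row is zeros so the frame count is kept."""
    if not frames:
        return []
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out


def static_plus_dynamic(feature_sets: Sequence[Matrix]) -> Matrix:
    """Concatenate the feature sets per frame, then append first and second differences."""
    # Static features: the per-frame concatenation of every feature set.
    static = [[v for part in parts for v in part] for parts in zip(*feature_sets)]
    d1 = delta(static)  # first-order difference
    d2 = delta(d1)      # second-order difference
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def blended_mask(speech: Matrix, noise: Matrix, alpha: float = 0.5,
                 channel_weights: Optional[Sequence[float]] = None) -> Matrix:
    """Blend an energy-ratio mask with its square root (alpha is a hypothetical fixed weight)."""
    eps = 1e-12  # avoids division by zero in silent bins
    mask = []
    for s_row, n_row in zip(speech, noise):
        row = []
        for ch, (s, n) in enumerate(zip(s_row, n_row)):
            ratio = s / (s + n + eps)  # assumed form of the "traditional" mask
            value = alpha * ratio + (1.0 - alpha) * math.sqrt(ratio)
            if channel_weights is not None:
                value *= channel_weights[ch]  # per-channel weighting (assumed multiplicative)
            row.append(min(1.0, max(0.0, value)))  # keep the mask in [0, 1]
        mask.append(row)
    return mask
```

With three feature sets of widths D1, D2 and D3, `static_plus_dynamic` returns frames of width 3 × (D1 + D2 + D3). The sketch uses plain lists to stay dependency-free; an array library would be the natural choice for real spectrogram-sized data.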

Authors: MEI Shulin; JIA Hairong; WANG Xiaogang; WU Yifeng (College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China; Network Optimization Center, China Unicom Shanxi Branch, Taiyuan 030000, China)

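Affiliations: College of Information and Computer, Taiyuan University of Technology; Network Optimization Center, China Unicom Shanxi Branch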

Source: Journal of Xidian University (indexed in EI, CAS, CSCD and the Peking University Core Journals list), 2021, No. 3, pp. 91-98 (8 pages)

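Funding: National Natural Science Foundation of China (12004275); Shanxi Province Selective Funding for Science and Technology Activities of Returned Overseas Scholars (20200017); Research Project Supported by Shanxi Scholarship Council of China (2020042).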

Keywords: dynamic features; adaptive mask; speech enhancement; neural network

Classification number: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]
