An asymmetric attenuated speech enhancement approach for improving intelligibility of noisy whisper
(Chinese title: 提高耳语音可懂度的非对称压缩语音增强方法)

Cited by: 3

Abstract: Two whispered-speech enhancement algorithms based on asymmetric cost functions are proposed. Both treat the amplification distortion and the attenuation distortion introduced by enhancement differently. The Modified Itakura-Saito (MIS) algorithm penalizes amplification distortion more heavily, while the Kullback-Leibler (KL) algorithm penalizes attenuation distortion more heavily. Experimental results show that at low signal-to-noise ratios (SNR) below -6 dB, the MIS algorithm yields significantly higher whisper intelligibility than conventional speech enhancement algorithms, while the KL algorithm achieves an intelligibility improvement similar to that of the minimum mean-square error (MMSE) speech enhancement algorithm. The results confirm that amplification distortion and attenuation distortion affect the intelligibility of enhanced whisper differently: larger attenuation distortion helps intelligibility at low SNR, whereas at high SNR attenuation distortion has little influence on intelligibility.
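The asymmetry described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the exact cost functions used in the paper, so the two forms below are assumptions chosen only for illustration: a LINEX-style cost `exp(e) - e - 1` with `e = x_hat - x` stands in for an amplification-penalizing cost, and the generalized KL divergence `x*log(x/x_hat) - x + x_hat` stands in for an attenuation-penalizing cost. The names `mis_like_cost` and `kl_cost` are hypothetical, and the comparison is made at equal absolute error on a clean spectral amplitude `x`.

```python
import math


def mis_like_cost(x, x_hat):
    """LINEX-style cost (an assumed stand-in, not the paper's formula).

    e = x_hat - x is positive when the estimate is amplified.
    exp(e) grows faster than it decays, so e > 0 costs more than e < 0.
    """
    e = x_hat - x
    return math.exp(e) - e - 1.0


def kl_cost(x, x_hat):
    """Generalized KL divergence between positive amplitudes (assumed form).

    For the same absolute error, an attenuated estimate (x_hat < x)
    costs more than an amplified one.
    """
    return x * math.log(x / x_hat) - x + x_hat


if __name__ == "__main__":
    x = 1.0  # clean amplitude (arbitrary illustrative value)
    for x_hat in (0.5, 1.5):  # attenuated and amplified estimates
        label = "attenuated" if x_hat < x else "amplified"
        print(f"{label:>10}: MIS-like={mis_like_cost(x, x_hat):.4f}  "
              f"KL={kl_cost(x, x_hat):.4f}")
```

Under these assumed forms, the MIS-like cost is larger for the amplified estimate and the KL cost is larger for the attenuated one. An estimator minimizing the first would therefore lean toward attenuation, which matches the abstract's observation that larger attenuation helps at low SNR.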

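Authors: Zhou Jian (周健), Zheng Wenming (郑文明), Wang Qingyun (王青云), Zhao Li (赵力)

Affiliations: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University; Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University; Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University

Source: Acta Acustica (《声学学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 4, pp. 501-508 (8 pages)

Funding: National Natural Science Foundation of China (61301295, 61231002, 61273266, 61003131); Anhui Provincial Natural Science Foundation (1308085QF100, 1408085MF113); Anhui University Doctoral Research Start-up Fund

Keywords: whispered speech; intelligibility; speech enhancement; asymmetric; amplification distortion; cost function; noise spectrum; signal-to-noise ratio; Gaussian noise; minimum mean-square error; speech intelligibility; speech recognition

Classification: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]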
