
Dual-microphone speech enhancement algorithm based on auditory features for a close-talk system (Cited by: 1)
Abstract: A dual-microphone speech enhancement algorithm based on auditory perception characteristics is proposed for the acoustic scenario of close-talk systems. Mimicking the frequency decomposition performed by the human ear, a gammatone filterbank splits the signals captured by the two microphones into multiple frequency subbands; the decomposed time-domain signals are then framed into time-frequency (T-F) units and the energy of each unit is computed. Using the energy ratio between the two channels as a cue, the signal-to-noise ratio (SNR) of each T-F unit is estimated, and a binary mask modeled on auditory masking is generated and applied to the noisy speech to separate the target speech from the environmental noise. Experimental results show that the inter-channel energy ratio estimates the per-unit SNR fairly accurately, and that the algorithm improves the recognition accuracy of command words corrupted by babble noise, outperforming current single-channel and dual-channel speech enhancement algorithms.
Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2014, Issue 9: 1179-1183 (5 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Keywords: speech enhancement; dual-microphone; gammatone filterbanks; time-frequency units; auditory mask
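The processing chain described in the abstract lends itself to a compact illustration. The Python sketch below walks through its main steps: gammatone filterbank decomposition of both microphone channels, framing into time-frequency (T-F) units, per-unit SNR estimation from the inter-channel energy ratio, binary mask generation, and masked resynthesis of the close-talk channel. It is a minimal sketch, not the authors' implementation; the function names and all parameter values (32 gammatone channels, 20 ms frames at 16 kHz, a 0 dB mask threshold) are illustrative assumptions rather than settings reported in the paper.

# Minimal sketch (not the paper's code) of dual-microphone T-F masking:
# gammatone decomposition -> T-F unit energies -> inter-channel energy-ratio
# SNR cue -> binary mask -> masked resynthesis of the primary (close-talk) channel.
# All parameter values below are illustrative assumptions.
import numpy as np
from scipy.signal import fftconvolve


def erb_space(low_hz, high_hz, num_channels):
    """Center frequencies equally spaced on the ERB-rate scale (Slaney's formula)."""
    ear_q, min_bw = 9.26449, 24.7
    i = np.arange(1, num_channels + 1)
    return -(ear_q * min_bw) + np.exp(
        i * (np.log(low_hz + ear_q * min_bw) - np.log(high_hz + ear_q * min_bw))
        / num_channels) * (high_hz + ear_q * min_bw)


def gammatone_filterbank(x, fs, center_freqs, kernel_dur=0.05):
    """Filter x with 4th-order gammatone FIR kernels, one per center frequency."""
    t = np.arange(0, kernel_dur, 1.0 / fs)
    subbands = []
    for fc in center_freqs:
        erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)        # equivalent rectangular bandwidth
        h = t ** 3 * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)
        subbands.append(fftconvolve(x, h / np.sum(np.abs(h)), mode="same"))
    return np.stack(subbands)                          # shape: (channels, samples)


def estimate_mask(primary, secondary, fs, num_channels=32, frame_len=320,
                  threshold_db=0.0):
    """Binary T-F mask from the energy ratio between the two microphone channels."""
    cfs = erb_space(80.0, 0.45 * fs, num_channels)
    sub_p = gammatone_filterbank(primary, fs, cfs)
    sub_s = gammatone_filterbank(secondary, fs, cfs)
    n_frames = sub_p.shape[1] // frame_len
    mask = np.zeros((num_channels, n_frames), dtype=bool)
    for m in range(n_frames):
        sl = slice(m * frame_len, (m + 1) * frame_len)
        e_p = np.sum(sub_p[:, sl] ** 2, axis=1)        # T-F unit energy, close-talk mic
        e_s = np.sum(sub_s[:, sl] ** 2, axis=1) + 1e-12
        ratio_db = 10.0 * np.log10(e_p / e_s + 1e-12)  # energy ratio as an SNR cue
        mask[:, m] = ratio_db > threshold_db           # keep target-dominated units
    return mask, sub_p


def apply_mask(subbands, mask, frame_len=320):
    """Zero noise-dominated T-F units and sum the subbands back to a waveform."""
    out = np.zeros(mask.shape[1] * frame_len)
    for m in range(mask.shape[1]):
        sl = slice(m * frame_len, (m + 1) * frame_len)
        out[sl] = np.sum(subbands[:, sl] * mask[:, m:m + 1], axis=0)
    return out


# Example usage with synthetic 16 kHz signals (target_speech and noise are placeholders):
# fs = 16000
# primary = target_speech + 0.3 * noise      # close-talk mic: strong target
# secondary = 0.3 * target_speech + noise    # reference mic: mostly noise
# mask, sub_p = estimate_mask(primary, secondary, fs)
# enhanced = apply_mask(sub_p, mask)

In a close-talk setup the primary microphone sits near the speaker's mouth and therefore captures the target speech at a much higher level than the reference microphone, which is presumably why the simple inter-channel energy ratio serves as an SNR cue; in practice the mask threshold would need to be tuned to the actual microphone geometry.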

References (15)

  • 1. JIN Zhaozhang, WANG Deliang. Reverberant speech segregation based on multipitch tracking and classification [J]. IEEE Trans Audio, Speech, and Language Processing, 2011, 19(8): 2328-2337.
  • 2. ZHAO Xiaojia, SHAO Yang, WANG Deliang. CASA-based robust speaker identification [J]. IEEE Trans Audio, Speech, and Language Processing, 2012, 20(5): 1608-1616.
  • 3. HSU Chao-Ling, WANG Deliang, JANG Jyh-Shing Roger, et al. A tandem algorithm for singing pitch extraction and voice separation from music accompaniment [J]. IEEE Trans Audio, Speech, and Language Processing, 2012, 20(5): 1482-1491.
  • 4. HU Guoning, WANG Deliang. Auditory segmentation based on onset and offset analysis [J]. IEEE Trans Audio, Speech, and Language Processing, 2007, 15(2): 396-405.
  • 5. Cooke M, Hershey J, Rennie S. Monaural speech separation and recognition challenge [J]. Computer Speech and Language, 2010, 24(1): 1-15.
  • 6. Harding S, Barker J, Brown G. Mask estimation for missing data speech recognition based on statistics of binaural interaction [J]. IEEE Trans Audio, Speech, and Language Processing, 2006, 14(1): 58-67.
  • 7. HU Kui, LIANG Weiqian. Close-talk speech enhancement algorithm based on auditory scene analysis [J]. Journal of Tsinghua University (Science and Technology), 2011, 51(9): 1176-1179. (Cited by: 1)
  • 8. Yousefian N, Loizou P C. A dual microphone speech enhancement algorithm based on the coherence function [J]. IEEE Trans Audio, Speech, and Language Processing, 2012, 20(2): 599-609.
  • 9. Kallel F, Frikha M, Ghorbel M, et al. Dual-channel spectral subtraction algorithms based speech enhancement dedicated to a bilateral cochlear implant [J]. Applied Acoustics, 2012, 73(1): 12-20.
  • 10. ZHANG Weiqiang, LIU Jia. Language identification based on auditory perception features [J]. Journal of Tsinghua University (Science and Technology), 2009(1): 78-81. (Cited by: 21)

Secondary references (26)

  • 1. Zissman M A. Comparison of four approaches to automatic language identification of telephone speech [J]. IEEE Transactions on Speech and Audio Processing, 1996, 4(1): 31-44.
  • 2. Li H, Ma B, Lee C H. A vector space modeling approach to spoken language identification [J]. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(1): 271-284.
  • 3. Huang X D, Acero A, Hon H W. Spoken Language Processing [M]. Upper Saddle River, NJ: Prentice Hall PTR, 2000.
  • 4. Abdulla W H. Auditory based feature vectors for speech recognition systems [J]. Advances in Communications and Software Technologies, 2002: 231-236.
  • 5. Li Q, Soong F, Siohan O. A high-performance auditory feature for robust speech recognition [C]//Proc 6th Int Conf on Spoken Language Processing. Beijing: China Military Friendship Publish, 2000, Ⅲ: 51-54.
  • 6. Colombi J M, Anderson T R, Rogers S K. Auditory model representation for speaker recognition [C]//Proc ICASSP. Piscataway, NJ: IEEE Press, 2006, Ⅱ: 700-703.
  • 7. Glasberg B R, Moore B C. Derivation of auditory filter shapes from notched-noise data [J]. Hearing Research, 1990, 47(1-2): 103-108.
  • 8. Slaney M. An efficient implementation of the Patterson-Holdsworth auditory filter bank [R]. Apple Computer Inc, 1993.
  • 9. Aertsen A M, Johannesma P I, Hermes D J. Spectro-temporal receptive fields of auditory neurons in the grassfrog [J]. Biological Cybernetics, 1980, 38(4): 235-248.
  • 10. Hermansky H, Morgan N. RASTA processing of speech [J]. IEEE Transactions on Speech and Audio Processing, 1994, 2(4): 578-589.
