Maximum likelihood polynomial regression for robust speech recognition

Abstract: The linear hypothesis is the main limitation of maximum likelihood linear regression (MLLR). This paper applies polynomial regression to model adaptation and establishes a nonlinear model adaptation algorithm, maximum likelihood polynomial regression (MLPR), for robust speech recognition. In this algorithm, the nonlinear relationship between the training and testing Gaussian means in each Mel channel is approximated by a set of polynomials, and the polynomial coefficients are estimated from adaptation data in the test environment using the expectation-maximization (EM) algorithm under the maximum likelihood (ML) criterion. The experimental results show that the second-order polynomial approximates the actual nonlinear function better, and that in both noise compensation and speaker adaptation the word error rates of MLPR are significantly lower than those of MLLR. The proposed MLPR algorithm overcomes the limitation of the linear hypothesis and can reduce the impact of noise, speaker, and other factors simultaneously. It is especially suitable for joint adaptation to speaker and noise.
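
The coefficient estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single Mel channel treated independently, scalar variances, and EM posteriors that have already been accumulated into per-Gaussian occupancies and posterior-weighted observation averages. Under those assumptions, maximizing the EM auxiliary function reduces to a weighted least-squares polynomial fit. All names here (`estimate_poly_coeffs`, `adapt_mean`, `occupancies`, `target_means`) are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more distinct training means")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def estimate_poly_coeffs(
    train_means: Sequence[float],
    target_means: Sequence[float],
    occupancies: Sequence[float],
    variances: Sequence[float],
    order: int = 2,
) -> List[float]:
    """M-step sketch: fit a_0..a_order so that sum_k a_k * mu**k matches the
    posterior-weighted observation average of each Gaussian (assumed inputs).

    Each Gaussian is weighted by occupancy / variance, which is what the
    Gaussian log-likelihood gives under the simplifying assumptions above.
    """
    size = order + 1
    lhs = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for mu, target, gamma, var in zip(train_means, target_means, occupancies, variances):
        weight = gamma / var
        powers = [mu ** k for k in range(size)]
        for j in range(size):
            rhs[j] += weight * powers[j] * target
            for k in range(size):
                lhs[j][k] += weight * powers[j] * powers[k]
    return solve_linear_system(lhs, rhs)


def adapt_mean(mu: float, coeffs: Sequence[float]) -> float:
    """Map a training-condition mean to the test condition with the polynomial."""
    return sum(a * mu ** k for k, a in enumerate(coeffs))
```

With `order=1` the same routine reduces to a scalar bias-plus-scale fit, which is the per-channel analogue of a linear transform; `order=2` adds the quadratic term that the abstract reports as the better approximation. In a full EM loop, the occupancies and target averages would be recomputed with the adapted means after each update.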

Authors: LU Yong, WU Zhenyang

Affiliation: School of Information Science and Engineering

Source: Chinese Journal of Acoustics (声学学报, English edition), 2011, No. 3, pp. 358-370 (13 pages)

Funding: Supported by the 973 Program of China (2002CB312102) and the National Natural Science Foundation of China (60672094)

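Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; O212.1 [Natural Sciences: Probability and Mathematical Statistics]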
