Vocal-Effort-Robust Speech Recognition Algorithm Based on Model Adaptation (cited by: 1)

Vocal effort related robust speech recognition based on adaptation method

Abstract: To address the changes in the acoustic characteristics of speech caused by variation in vocal effort, a method based on acoustic model adaptation is proposed. The performance of acoustic models trained on normal-mode speech is first analyzed when they are used to recognize speech in the other vocal effort modes. Then, based on the properties of the stochastic segment model, the maximum likelihood linear regression (MLLR) method is introduced into a stochastic segment model system, and the adapted acoustic models are used to recognize speech in the corresponding vocal effort mode. Mandarin continuous speech recognition experiments on the "863-test" set show that recognition accuracy drops substantially when models trained on normal speech are used to recognize speech in the other four vocal effort modes, and that accuracy improves markedly when the adapted system recognizes speech in the matching vocal effort mode. These results indicate that acoustic model adaptation is effective in handling vocal effort variation in speech recognition.
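To make the idea of linearly transforming model means toward a new vocal effort mode concrete, here is a minimal sketch of an MLLR-style mean update. It is not the authors' stochastic segment model implementation. It assumes a diagonal transform (one scale and one offset per feature dimension), unit-variance Gaussians, and a hard frame-to-Gaussian alignment, under which the maximum-likelihood estimate reduces to ordinary least squares. The names `estimate_diag_transform` and `adapt_means` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def estimate_diag_transform(
    means: Sequence[Vector],
    frames: Sequence[Vector],
    alignment: Sequence[int],
) -> Tuple[Vector, Vector]:
    """Estimate a per-dimension scale and offset so that scale*mu + offset fits the frames.

    Assumption (not from the paper): unit-variance Gaussians and a hard
    alignment, where alignment[t] is the index of the Gaussian for frame t.
    """
    dim = len(means[0])
    n = len(frames)
    scale: Vector = []
    offset: Vector = []
    for i in range(dim):
        xs = [means[alignment[t]][i] for t in range(n)]  # model mean seen by each frame
        ys = [frames[t][i] for t in range(n)]            # observed adaptation feature
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        a = sxy / sxx if sxx > 0 else 1.0  # keep identity scale if means do not vary
        scale.append(a)
        offset.append(y_bar - a * x_bar)
    return scale, offset


def adapt_means(means: Sequence[Vector], scale: Vector, offset: Vector) -> List[Vector]:
    """Apply the estimated transform to every Gaussian mean."""
    return [[a * m + b for m, a, b in zip(mu, scale, offset)] for mu in means]


if __name__ == "__main__":
    # Toy data: three 2-dimensional "normal mode" means and five adaptation frames.
    normal_means = [[0.0, 1.0], [2.0, -1.0], [4.0, 3.0]]
    align = [0, 1, 2, 1, 0]
    # Synthetic frames from a shifted and scaled version of the means.
    obs = [[1.5 * normal_means[m][0] + 0.2, 0.5 * normal_means[m][1] - 0.3] for m in align]
    s, o = estimate_diag_transform(normal_means, obs, align)
    print(adapt_means(normal_means, s, o))
```

Full MLLR estimates a full regression matrix per regression class and weights the statistics by the Gaussian covariances and occupation probabilities. The diagonal, unit-variance case is used here only to keep the sketch short.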

Authors: Chao Hao (晁浩), Song Cheng (宋成), Xue Xiao (薛霄), Liu Zhizhong (刘志中)

Affiliation: School of Computer Science and Technology, Henan Polytechnic University

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 2, pp. 156-160, 204 (6 pages)

Funding: National Natural Science Foundation of China (No. 61175066, No. 61300124); Henan Province Basic and Frontier Technology Research Program (No. 132300410332)

Keywords: speech recognition; vocal effort; adaptation; maximum likelihood linear regression

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

