A Study of Vocal Tract Length Normalization Based on a Single Gaussian Mixture in Noisy Environments

Abstract: Vocal tract length normalization (VTLN) is a speaker adaptation method for speech recognition. This paper studies VTLN in noisy environments and reports a series of experiments. For the first time in a noisy setting, the warping factor is selected using a single-Gaussian mixture model, with good results. The experiments use the AURORA speech database, and the models are evaluated on its car-noise test set. The results show that VTLN improves recognition under every noise condition, to varying degrees. Iterative training improves on single-pass VTLN, and the best result appears in the third iteration. In noisy environments, selecting the warping factor with a one-Gaussian model raises the average sentence recognition rate by nearly 1.68% compared with selecting it using models with other numbers of Gaussian mixtures. The gender-independent model with VTLN comes close to the gender-dependent model without VTLN, and with sufficient training data the gender-independent model with VTLN can perform even better.
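The warping-factor selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the warping function, the feature type, or the search grid, so the piecewise-linear warp, the toy log-spectrum features, the grid of factors, and every function name below are assumptions made for illustration. The idea shown is only the general one: warp the frequency axis with each candidate factor, score the result against a single diagonal-covariance Gaussian, and keep the factor with the highest likelihood.

```python
import numpy as np


def warp_frequency(freqs, alpha, f_max, knee_ratio=0.8):
    """Piecewise-linear warp (assumed form): scale by alpha below the knee,
    then bend the line so that f_max still maps to f_max."""
    freqs = np.asarray(freqs, dtype=float)
    knee = knee_ratio * f_max
    upper_slope = (f_max - alpha * knee) / (f_max - knee)
    return np.where(freqs <= knee,
                    alpha * freqs,
                    alpha * knee + upper_slope * (freqs - knee))


def normalize_spectra(spectra, freqs, alpha):
    """Resample every spectrum frame on the warped frequency axis."""
    warped = warp_frequency(freqs, alpha, freqs[-1])
    return np.stack([np.interp(warped, freqs, frame) for frame in spectra])


def gaussian_log_likelihood(features, mean, var):
    """Total log-likelihood of all frames under one diagonal Gaussian."""
    diff = features - mean
    per_frame = -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)
    return float(np.sum(per_frame))


def select_warping_factor(spectra, freqs, mean, var, alphas):
    """Grid search: return the factor whose warped log-spectra score highest.
    The Jacobian of the warp is ignored here, which is a simplification."""
    scores = [
        gaussian_log_likelihood(np.log(normalize_spectra(spectra, freqs, a)), mean, var)
        for a in alphas
    ]
    return float(alphas[int(np.argmax(scores))]), scores


def reference_spectrum(f):
    """Hypothetical 'average speaker' spectrum with two formant-like peaks."""
    return (1.0
            + 10.0 * np.exp(-((f - 1000.0) / 150.0) ** 2)
            + 5.0 * np.exp(-((f - 2200.0) / 200.0) ** 2))


# Toy setup: the single Gaussian is centred on the reference log-spectrum,
# and the test speaker's spectrum is the reference stretched by 1.10.
freqs = np.linspace(0.0, 4000.0, 257)
mean = np.log(reference_spectrum(freqs))
var = np.full_like(freqs, 0.01)
true_alpha = 1.10
speaker_frames = np.tile(reference_spectrum(freqs / true_alpha), (5, 1))

alphas = np.linspace(0.88, 1.12, 13)  # assumed search grid, step 0.02
best_alpha, scores = select_warping_factor(speaker_frames, freqs, mean, var, alphas)
print("selected warping factor:", best_alpha)
```

In an iterative scheme like the one the abstract mentions, the selected factors would be used to re-extract the training features and retrain the model before selecting factors again. That loop is omitted here because the abstract does not describe its details.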

Authors: Zhang Wenming (张文明), Zhang Xiangdong (张向东), Zhang Xinggan (张兴敢), Hou Zhen (候震)

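Affiliations: Department of Electronics, Nanjing University (南京大学电子系); 美国富迪科技(南京)有限公司 (the Nanjing company of a US firm, 富迪科技)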

Source: Microprocessors (《微处理机》), 2006, No. 5, pp. 102-105 (4 pages)

Keywords: vocal tract length normalization; speech recognition; speaker adaptation

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]
