Sparse group LASSO constraint eigenphone speaker adaptation method for speech recognition (基于稀疏组LASSO约束的本征音子说话人自适应)


Abstract: The original eigenphone speaker adaptation method performs well when adaptation data are sufficient, but it suffers from severe overfitting when adaptation data are insufficient. To address this, an eigenphone speaker adaptation algorithm with a sparse group LASSO (SGL) constraint is proposed. First, the basic principle of eigenphone speaker adaptation is presented for speech recognition systems based on the hidden Markov model-Gaussian mixture model (HMM-GMM). Then, sparse group LASSO regularization is introduced into the estimation of the eigenphone matrix; the weighting factors of the SGL norm are adjusted to control the complexity of the adaptation model, and the resulting optimization problem is solved with an accelerated proximal gradient method. Finally, the SGL-constrained algorithm is compared with several current regularized adaptation methods. Speaker adaptation experiments on a Mandarin Chinese continuous speech recognition task show that the SGL constraint markedly improves the performance of the eigenphone speaker adaptation method, and that it outperforms the l1-norm, l2-norm, and elastic net regularization methods.
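The optimization step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the objective function, so a plain least-squares loss stands in for the likelihood-based criterion used to estimate the eigenphone matrix, and the names `prox_sparse_group_lasso`, `fista_sgl`, `lam1`, `lam2`, and `groups` are hypothetical. What the sketch does show is the general shape of the technique: the sparse group LASSO penalty combines an l1 term with a groupwise l2 term, its proximal operator is elementwise soft-thresholding followed by groupwise shrinkage, and an accelerated proximal gradient loop (FISTA-style momentum) applies that operator after each gradient step.

```python
import numpy as np

def prox_sparse_group_lasso(v, groups, t, lam1, lam2):
    """Proximal operator of t * (lam1 * ||v||_1 + lam2 * sum_g ||v_g||_2)."""
    # Step 1: elementwise soft-thresholding handles the l1 part.
    u = np.sign(v) * np.maximum(np.abs(v) - t * lam1, 0.0)
    # Step 2: groupwise shrinkage handles the group-LASSO part.
    out = np.zeros_like(u)
    for idx in groups:
        norm = np.linalg.norm(u[idx])
        if norm > t * lam2:
            out[idx] = (1.0 - t * lam2 / norm) * u[idx]
    return out

def fista_sgl(A, b, groups, lam1, lam2, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 plus the SGL penalty.

    Assumption: the least-squares loss is only a stand-in for the
    paper's actual adaptation objective, which the abstract omits.
    """
    # Fixed step size: 1 / Lipschitz constant of the gradient of the loss.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y, tk = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = prox_sparse_group_lasso(y - step * grad, groups, step, lam1, lam2)
        # Momentum update that gives the accelerated convergence rate.
        tk_new = (1.0 + np.sqrt(1.0 + 4.0 * tk ** 2)) / 2.0
        y = x_new + ((tk - 1.0) / tk_new) * (x_new - x)
        x, tk = x_new, tk_new
    return x
```

Raising `lam1` zeroes more individual coefficients, while raising `lam2` zeroes whole groups, which is how adjusting the weighting factors controls model complexity. With `lam2 = 0` the penalty reduces to the l1 norm and with `lam1 = 0` to the group LASSO.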

Authors: QU Dan, ZHANG Wenlin

Affiliation: School of Information Systems Engineering, Information Engineering University

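Source: Journal on Communications (《通信学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 9, pp. 47-54 (8 pages)

Funding: National Natural Science Foundation of China (Nos. 61175017, 61302107, 61403415)

Keywords: speaker adaptation; eigenphone; group sparse constraint; sparse group LASSO constraint; proximal gradient method

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]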




