Fast Speaker Adaptation Based on Triple Diagonal Transform Matrices and Shared Block Matrices


Abstract: Within the maximum likelihood linear regression (MLLR) framework, this paper proposes two fast speaker adaptation approaches: speaker adaptation using tridiagonal ("triple diagonal") transform matrices in the log-spectral domain (SATD), and speaker adaptation using shared block-diagonal transform matrices in the cepstral domain (SASBD). Building on certain prior knowledge, both approaches describe the variation between speakers with fewer parameters than conventional MLLR, so only a small amount of adaptation data is needed to estimate those parameters robustly. Experiments on two isolated word recognition systems, one using whole-word models and one using triphones as modeling units, show that the proposed approaches adapt faster than the conventional MLLR approach.
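The abstract does not give the estimation formulas, so the following is only a minimal sketch, assuming the standard MLLR mean transform μ̂ = Aμ + b (written as W = [b A] acting on the extended mean ξ = [1, μ]) and diagonal-covariance Gaussians, for which each row of W has a closed-form solution G_i w_i = k_i. It illustrates how restricting A to a tridiagonal or block-diagonal pattern reduces the number of free parameters, and how each row can still be estimated in closed form: the zero entries are fixed, and only the sub-system of G_i and k_i that involves the free entries is solved. All function names are hypothetical. The sketch does not model the log-spectral-to-cepstral mapping used by SATD or the block sharing used by SASBD, because the abstract does not describe them.

```python
import numpy as np


def tridiagonal_mask(dim):
    """Boolean D x D mask: True on the main, super- and sub-diagonal."""
    idx = np.arange(dim)
    return np.abs(idx[:, None] - idx[None, :]) <= 1


def block_diagonal_mask(block_sizes):
    """Boolean D x D mask with square diagonal blocks of the given sizes.

    Hypothetical helper: the blocks here are independent (not shared).
    """
    dim = sum(block_sizes)
    mask = np.zeros((dim, dim), dtype=bool)
    start = 0
    for size in block_sizes:
        mask[start:start + size, start:start + size] = True
        start += size
    return mask


def mllr_statistics(means, variances, occupancies, obs_sums):
    """Accumulate per-row MLLR statistics G_i and k_i (diagonal covariances).

    means, variances: (M, D) Gaussian means and diagonal variances
    occupancies:      (M,)   sum_t gamma_m(t)
    obs_sums:         (M, D) sum_t gamma_m(t) * o(t)
    """
    num_gauss = means.shape[0]
    xi = np.hstack([np.ones((num_gauss, 1)), means])  # extended means [1, mu]
    inv_var = 1.0 / variances                          # (M, D)
    outer = xi[:, :, None] * xi[:, None, :]            # (M, D+1, D+1)
    # G[i] = sum_m occ_m / var_{m,i} * xi_m xi_m^T
    G = np.einsum("mi,m,mab->iab", inv_var, occupancies, outer)
    # k[i] = sum_m obs_sum_{m,i} / var_{m,i} * xi_m
    k = np.einsum("mi,mi,ma->ia", obs_sums, inv_var, xi)
    return G, k


def estimate_constrained_mllr(G, k, mask):
    """Row-wise ML estimate of W = [b A] with A restricted to `mask`.

    The bias b is always free; entries of A outside `mask` stay fixed at 0,
    so each row solves only the free-index sub-system G_i[F, F] w = k_i[F].
    """
    dim = k.shape[0]
    W = np.zeros((dim, dim + 1))
    for i in range(dim):
        free = np.concatenate([[0], 1 + np.flatnonzero(mask[i])])
        W[i, free] = np.linalg.solve(G[i][np.ix_(free, free)], k[i, free])
    return W


def num_free_params(mask):
    """Free parameters: nonzero entries of A plus one bias per row."""
    return int(mask.sum()) + mask.shape[0]
```

For a hypothetical 39-dimensional feature vector, a full transform has 1560 free parameters, a tridiagonal one has 154, and a block-diagonal one with three 13 × 13 blocks has 546. This reduction is what allows the constrained transforms to be estimated robustly from less adaptation data.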

Authors: Ding Guohong, Xu Bo

Affiliation: High-Tech Innovation Center, Institute of Automation, Chinese Academy of Sciences

Source: Acta Electronica Sinica (《电子学报》), 2004, No. 10, pp. 1709-1712, 1719 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals

Keywords: fast adaptation; transform matrix; MLLR; tridiagonal matrix; block-diagonal matrix; Calculations; Mathematical models; Matrix algebra; Maximum likelihood estimation; Regression analysis; Word processing

CLC number: TP301.6 [Automation and Computer Technology / Computer System Architecture]
