Abstract
This paper proposes two methods for fast speaker adaptation within the Maximum Likelihood Linear Regression (MLLR) framework: Speaker Adaptation using Tridiagonal transformation matrices in the log-spectral domain (SATD), and Speaker Adaptation using Shared Block Diagonal transformation matrices in the cepstral domain (SASBD). Drawing on prior knowledge, both methods describe inter-speaker variation with fewer parameters, so robust parameter estimates can be obtained from only a small amount of adaptation data. Tests on two isolated word recognition systems, one using whole-word models and one using triphone models, show that the proposed methods adapt faster than the traditional MLLR approach.
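The abstract's central argument is that a constrained transformation matrix has fewer free parameters than a full one. The following is a minimal sketch of that counting argument only, not the paper's estimation procedure. The function names, the feature dimension of 39, and the split into 3 equal blocks are illustrative assumptions that do not come from the source, and "shared" is read here as all blocks reusing a single sub-matrix, which is also an assumption. Bias terms are left out of every count.

```python
def full_params(n: int) -> int:
    """Free parameters in an unconstrained n x n transformation matrix."""
    return n * n


def tridiagonal_params(n: int) -> int:
    """Main diagonal plus the two adjacent diagonals of an n x n matrix."""
    return n + 2 * (n - 1)


def block_diagonal_params(n: int, blocks: int) -> int:
    """Block diagonal matrix with `blocks` equal, independent square blocks."""
    size, remainder = divmod(n, blocks)
    if remainder:
        raise ValueError("n must be divisible by the number of blocks")
    return blocks * size * size


def shared_block_diagonal_params(n: int, blocks: int) -> int:
    """Block diagonal matrix whose blocks all reuse one sub-matrix (assumed reading)."""
    size, remainder = divmod(n, blocks)
    if remainder:
        raise ValueError("n must be divisible by the number of blocks")
    return size * size


if __name__ == "__main__":
    n, blocks = 39, 3  # hypothetical dimension and block count, not from the paper
    print("full:", full_params(n))
    print("tridiagonal:", tridiagonal_params(n))
    print("block diagonal:", block_diagonal_params(n, blocks))
    print("shared block diagonal:", shared_block_diagonal_params(n, blocks))
```

Under these assumed dimensions, both constrained forms need far fewer parameters than the full matrix, which is the reason given in the abstract for why less adaptation data suffices.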
Source
Acta Electronica Sinica (《电子学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2004, No. 10, pp. 1709-1712, 1719 (5 pages)
Keywords
fast adaptation
transformation matrix
MLLR
tridiagonal matrix
block diagonal matrix
Calculations
Mathematical models
Matrix algebra
Maximum likelihood estimation
Regression analysis
Word processing