Abstract
This paper proposes a rapid speaker adaptation method based on a maximum likelihood variable subspace. In the training stage, Principal Component Analysis (PCA) is applied to the Speaker Dependent (SD) model parameters of the training speakers to obtain a set of speaker basis vectors. In the adaptation stage, the subset of basis vectors most relevant to the current speaker is selected by a maximum likelihood criterion. The new speaker's SD model is then constrained to the speaker subspace spanned by these basis vectors, and adaptation is carried out by estimating the coefficient of each basis vector. Unlike conventional subspace-based speaker adaptation methods, the speaker subspace here is chosen dynamically for each speaker during adaptation, so fewer free parameters need to be estimated and a more robust SD model can be obtained from small amounts of adaptation data. In continuous speech recognition adaptation experiments on a Microsoft corpus, given very little adaptation data (less than 5 s), the proposed method outperforms both the classical eigenvoice method and the Maximum Likelihood Linear Regression (MLLR) method, in supervised as well as unsupervised mode.
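The two stages described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not give the exact selection procedure, so the code assumes unit-covariance Gaussians (which turns the maximum likelihood criterion into weighted least squares) and a greedy forward selection of bases. The function names `pca_bases`, `ml_weights`, and `select_and_adapt`, and the per-dimension `weight` vector (frame counts from the adaptation data), are hypothetical.

```python
import numpy as np

def pca_bases(sd_supervectors, n_bases):
    """Training stage: PCA on the SD supervectors (one row per training speaker)."""
    mean = sd_supervectors.mean(axis=0)
    _, _, vt = np.linalg.svd(sd_supervectors - mean, full_matrices=False)
    return mean, vt[:n_bases]          # rows of vt are orthonormal basis vectors

def ml_weights(bases, resid, weight):
    """ML coefficients for the given bases, assuming unit-covariance Gaussians.

    Maximises -0.5 * sum_i weight[i] * (resid[i] - (w @ bases)[i])**2 and
    returns the coefficients plus the log-likelihood gain over w = 0.
    """
    a = (bases * weight) @ bases.T     # normal-equation matrix
    b = (bases * weight) @ resid
    w = np.linalg.lstsq(a, b, rcond=None)[0]
    return w, 0.5 * (b @ w)

def select_and_adapt(mean, bases, obs_mean, weight, n_select):
    """Adaptation stage: greedily pick the bases that raise the likelihood most."""
    resid = obs_mean - mean
    chosen = []
    for _ in range(n_select):
        candidates = [k for k in range(len(bases)) if k not in chosen]
        best = max(candidates,
                   key=lambda k: ml_weights(bases[chosen + [k]], resid, weight)[1])
        chosen.append(best)
    w, _ = ml_weights(bases[chosen], resid, weight)
    return chosen, mean + w @ bases[chosen]

# Toy data: 20 training speakers, 12-dimensional supervectors (assumed sizes).
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 12))
mean, bases = pca_bases(train, n_bases=5)

# A new speaker lying along one basis; only the first 6 dimensions are observed.
true_sv = mean + 3.0 * bases[2]
weight = np.array([1.0] * 6 + [0.0] * 6)   # zero weight = no adaptation frames
obs_mean = np.where(weight > 0, true_sv, 0.0)

chosen, adapted = select_and_adapt(mean, bases, obs_mean, weight, n_select=1)
```

In this toy setup only half of the supervector dimensions receive adaptation data, yet the unobserved half is still updated through the selected basis. This is the reason a subspace constraint helps when adaptation data is scarce; selecting fewer bases leaves fewer coefficients to estimate from the same data.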
Source
Journal of Electronics & Information Technology (《电子与信息学报》)
EI
CSCD
PKU Core (Peking University Core Journals)
2012, No. 3, pp. 571-575 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (60872142)
Keywords
Continuous speech recognition
Speaker adaptation
Eigenvoice
Subspace method