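Abstract
To address the problem that existing subspace-based adaptation methods cannot determine the optimal speaker subspace, a speaker adaptation method based on matching pursuit is proposed. Speaker adaptation is treated as a sparse decomposition problem for a high-dimensional signal, and the speaker dictionary is constructed jointly from eigenvoices and reference speaker supervectors to exploit the respective strengths of each. Following the matching pursuit principle, the dimension of the optimal speaker subspace and its basis vectors are determined a posteriori through iterative optimization. A mechanism for detecting and removing redundant basis vectors is introduced to ensure the stability of the algorithm, and the coordinates of the new speaker are obtained through a fast recursive algorithm. Supervised speaker adaptation experiments on Mandarin continuous speech recognition show that, compared with the eigenvoice and reference speaker weighting methods, the average tonal syllable accuracy improves by 1.9% relative.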
Current speaker-subspace-based adaptation methods cannot obtain the best speaker subspace. A speaker adaptation method based on matching pursuit is proposed to address this problem. Speaker adaptation is viewed as the sparse decomposition of a high-dimensional speaker supervector over an over-complete dictionary constructed from eigenvoices and reference speaker supervectors. Through an efficient iterative optimization process, the best speaker-dependent subspace is determined in a maximum a posteriori manner. A mechanism for removing redundant bases is introduced to ensure numerical stability, and the new speaker's coordinates are obtained through a fast recurrence algorithm. Supervised speaker adaptation experiments on a Chinese continuous speech recognition system show that, compared with the eigenvoice and reference speaker weighting methods, the recognition accuracy improves by 1.9% relative.
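As a rough illustration of the sparse-decomposition idea described above, the following is a minimal sketch of plain matching pursuit in pure Python. It is not the paper's algorithm: the maximum a posteriori criterion, the redundant-basis removal, and the fast recurrence are not reproduced here. The function names (`dot`, `matching_pursuit`), the stopping rule (`max_atoms`, `tol`), and the toy dictionary are all assumptions made for illustration, and every atom is assumed to be non-zero.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, max_atoms=10, tol=1e-6):
    """Greedy matching pursuit (illustrative sketch, hypothetical helper).

    Returns (weights, residual), where weights maps an atom index to its
    coefficient on the original (unnormalized) atom.
    """
    # Normalize atoms to unit length so inner products are comparable.
    # Assumption: no atom is the zero vector.
    norms = [math.sqrt(dot(a, a)) for a in dictionary]
    atoms = [[x / n for x in a] for a, n in zip(dictionary, norms)]

    residual = list(signal)
    coeffs = {}
    for _ in range(max_atoms):
        # Select the atom most correlated with the current residual.
        scores = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        c = scores[best]
        if abs(c) < tol:
            break
        coeffs[best] = coeffs.get(best, 0.0) + c
        # Subtract the selected atom's contribution from the residual.
        residual = [r - c * x for r, x in zip(residual, atoms[best])]
        if math.sqrt(dot(residual, residual)) < tol:
            break

    # Rescale coefficients to refer to the unnormalized atoms.
    weights = {k: c / norms[k] for k, c in coeffs.items()}
    return weights, residual


# Toy example: three axis-aligned atoms standing in for "eigenvoices" and
# one extra atom standing in for a "reference speaker supervector".
toy_dictionary = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
toy_weights, toy_residual = matching_pursuit([2, 2, 0], toy_dictionary)
```

In this toy case the signal lies along the fourth atom, so a single atom suffices and the number of selected atoms is decided by the data rather than fixed in advance, which is the property the abstract attributes to the matching pursuit formulation.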
Source
《声学学报》 (Acta Acustica)
EI
CSCD
Peking University Core Journals (北大核心)
2014, No. 4, pp. 523-530 (8 pages)
Acta Acustica
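Funding
National Natural Science Foundation of China (61175017)
National High Technology Research and Development Program of China (863 Program) (2012AA011603)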