
Speaker adaptation using matching pursuit
Abstract: Existing subspace-based speaker adaptation methods cannot determine the best speaker subspace. To address this problem, a speaker adaptation method based on matching pursuit is proposed. Speaker adaptation is treated as the sparse decomposition of a high-dimensional speaker supervector over an over-complete speaker dictionary, which is built jointly from eigenvoices and reference speaker supervectors to exploit the strengths of each. Following the matching pursuit principle, an iterative optimization determines the dimension of the best speaker subspace and its basis vectors a posteriori. A mechanism for detecting and removing redundant basis vectors is introduced to ensure numerical stability, and the new speaker's coordinates are obtained through a fast recursive algorithm. Supervised speaker adaptation experiments on Chinese continuous speech recognition show that, compared with the eigenvoice and reference speaker weighting methods, the average tonal syllable accuracy improves by 1.9% relative.
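The greedy selection idea behind matching pursuit can be illustrated with a minimal sketch. It is the generic least-squares matching pursuit of Mallat and Zhang, not the method of this paper: the abstract does not give the paper's selection criterion, redundant-basis removal, or fast recursion, so none of these are reproduced. The function names, the toy 3-dimensional "supervector", and the 4-atom dictionary are all hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def matching_pursuit(target, dictionary, max_atoms=10, tol=1e-8):
    """Generic matching pursuit (illustrative assumption, not the paper's algorithm).

    At each step the unit-norm atom most correlated with the current
    residual is selected and its contribution is subtracted. Iteration
    stops when no atom correlates with the residual above `tol`, so the
    number of selected atoms is decided by the data, not fixed in advance.
    Returns (coeffs, residual), where coeffs maps atom index -> weight
    with respect to the normalized atom.
    """
    atoms = [normalize(a) for a in dictionary]
    residual = list(target)
    coeffs = {}
    for _ in range(max_atoms):
        scores = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        if abs(scores[best]) < tol:
            break  # nothing left to explain
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Toy over-complete dictionary: 4 atoms in a 3-dimensional space.
toy_dictionary = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
toy_target = [3.0, 0.0, 4.0]

coeffs, residual = matching_pursuit(toy_target, toy_dictionary)
print(sorted(coeffs.items()), residual)
```

In this toy case only two of the four atoms are selected. This mirrors, in a much simplified form, the idea of letting the data determine the subspace dimension and its basis vectors. A real system would operate on supervectors with many thousands of dimensions.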
Source: Acta Acustica (《声学学报》), 2014, No. 4, pp. 523-530 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
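Funding: Supported by the National Natural Science Foundation of China (61175017) and the National High Technology Research and Development Program of China (863 Program) (2012AA011603).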

