Abstract
To improve pronunciation scoring performance while reducing acoustic model size for embedded implementation, a method for merging British-accent and American-accent acoustic models is proposed. Based on the acoustic distance and the rank of the substitution probability, the acoustic models are classified into replaceable models, mergeable models, and isolated models. Replaceable models are discarded, isolated models are kept, and mergeable models are merged by model interpolation. A minimum confidence and a maximum support count are introduced to control the number of models that take part in merging. Experiments show that, compared with the single-accent model, the merged models improve the speaker-level correlation between machine scores and human scores by 14.1%; compared with the combined models, the merged models perform similarly while reducing the number of Gaussian components by 10.7%. The method noticeably reduces model size while maintaining pronunciation scoring performance.
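The classification and interpolation steps named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, not the paper's procedure: the thresholds `d_replace`, `d_merge`, and `max_rank`, the decision rule in `classify`, the interpolation weight `lam`, and the one-dimensional `Gmm` type are all hypothetical. Interpolation is shown in its simplest form, a weighted sum of two mixture densities, which may differ from the interpolation the authors used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gmm:
    """A 1-D Gaussian mixture (hypothetical stand-in for an acoustic model state)."""
    weights: List[float]
    means: List[float]
    variances: List[float]


def classify(distance: float, rank: int,
             d_replace: float, d_merge: float, max_rank: int) -> str:
    """Assign a model to one of the three classes named in the abstract.

    Assumed rule (not from the paper): a model that is very close to its
    counterpart and ranks high as a substitute is replaceable; a moderately
    close one is mergeable; anything else is isolated.
    """
    if rank <= max_rank and distance <= d_replace:
        return "replaceable"
    if rank <= max_rank and distance <= d_merge:
        return "mergeable"
    return "isolated"


def interpolate(a: Gmm, b: Gmm, lam: float) -> Gmm:
    """Return the mixture lam * a + (1 - lam) * b.

    The components of both models are pooled and their weights rescaled,
    so the result is still a valid density (weights sum to 1).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return Gmm(
        weights=[lam * w for w in a.weights] + [(1.0 - lam) * w for w in b.weights],
        means=a.means + b.means,
        variances=a.variances + b.variances,
    )


if __name__ == "__main__":
    british = Gmm([0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
    american = Gmm([1.0], [2.0], [0.5])
    if classify(0.3, 2, d_replace=0.2, d_merge=0.5, max_rank=3) == "mergeable":
        print(interpolate(british, american, lam=0.75))
```

In this sketch a replaceable model would simply be dropped in favor of its counterpart and an isolated model kept unchanged; only mergeable pairs reach `interpolate`.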
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journal
2009, Issue S1, pp. 1344-1348 (5 pages)
Journal of Tsinghua University(Science and Technology)
Funding
National High Technology Research and Development Program of China ("863" Program), Key Project (2008AA010700)
Keywords
computer-assisted language learning
pronunciation evaluation
embedded applications
model merging