Abstract
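With the globalization of information in modern society, mixed bilingual and multilingual speech has become increasingly common, and bilingual or multilingual speech recognition has become an active topic in speech recognition research. Bilingual mixed-language speech recognition faces two main problems. The first is controlling system complexity while maintaining recognition accuracy in both languages. The second is handling the non-native accent that the speaker's native (matrix) language imposes on embedded foreign-language segments. To handle mixed-language speech and reduce the amount of data needed for statistical modeling, a unified bilingual recognition system is built by merging and clustering the phones of both languages. For the clustering step, a novel two-pass phone clustering algorithm based on a confusion matrix is proposed and compared with the conventional clustering method based on an acoustic log-likelihood criterion. Non-native speech within bilingual speech is recognized poorly, so a novel bilingual model modification algorithm is proposed to improve its recognition performance. Experimental results show that the Chinese-English bilingual speech recognition system built with these methods recognizes both languages while effectively limiting model size, and it maintains recognition performance on both monolingual and mixed-language speech.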
In recent years, bilingual communication has become common as a result of globalization. This presents a new challenge for real-world applications of speech recognition technology. The main difficulties of bilingual speech recognition in real-world applications lie in two areas. The first is balancing performance on inter- and intra-sentential language switching while reducing the complexity of the bilingual speech recognition system. The second is handling accents carried over from the matrix language into the embedded language. Instead of using two separate monolingual models, one for each language, a single compact set of bilingual acoustic models is derived by phone set merging and clustering. This approach handles intra-sentential language switching and reduces the amount of data required to estimate the statistical models robustly. In our study, a novel Two-pass phone clustering method based on a Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method. Accented pronunciations vary widely, so a novel bilingual model modification approach is presented to improve the recognition of nonnative speech in bilingual speech recognition. Experiments show that with these methods, the Chinese-English bilingual speech recognition system handles bilingual speech recognition effectively and efficiently.
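The abstract names a confusion-matrix-based phone clustering step but does not describe its two passes. The Python sketch below is a rough illustration only. It shows one generic way a confusion matrix can drive the merging of Mandarin and English phones into a shared set, and it is not the authors' TCM algorithm. It rests on several assumptions. The confusion counts are taken to come from decoding labeled speech with a phone recognizer. Confusability is defined here as the average of the two row-normalized cross-recognition rates. Clusters are merged greedily with average linkage until no pair reaches a hypothetical `threshold`. The phone labels and counts are invented toy data.

```python
from itertools import combinations


def confusion_similarity(conf, i, j):
    """Symmetric confusability of phones i and j (assumed definition).

    conf[r][h] counts how often reference phone r was recognized as phone h.
    Returns the mean of the two row-normalized off-diagonal rates.
    """
    row_i = sum(conf[i]) or 1  # guard against empty rows
    row_j = sum(conf[j]) or 1
    return 0.5 * (conf[i][j] / row_i + conf[j][i] / row_j)


def cluster_phones(phones, conf, threshold):
    """Greedy agglomerative merging of confusable phones (illustrative).

    Repeatedly merges the pair of clusters with the highest average-linkage
    similarity while that similarity is >= threshold (a hypothetical knob).
    Returns one sorted list of phone names per resulting cluster.
    """
    clusters = [[k] for k in range(len(phones))]

    def linkage(a, b):
        # Average pairwise similarity between members of clusters a and b.
        scores = [confusion_similarity(conf, i, j) for i in a for j in b]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        (ia, ib), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining pair is confusable enough to share a model
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)

    return [sorted(phones[k] for k in c) for c in clusters]


# Invented toy data: two Mandarin ("zh_") and two English ("en_") phones.
phones = ["zh_s", "en_s", "zh_a", "en_aa"]
conf = [
    [80, 15, 3, 2],   # zh_s
    [20, 75, 2, 3],   # en_s
    [1, 2, 70, 27],   # zh_a
    [2, 1, 30, 67],   # en_aa
]
print(cluster_phones(phones, conf, threshold=0.1))
# prints [['en_aa', 'zh_a'], ['en_s', 'zh_s']]
```

Raising `threshold` keeps more phones language-specific. Lowering it yields a smaller shared phone set. This reflects the trade-off between model size and recognition accuracy that the abstract describes.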
Source
Acta Acustica (《声学学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2010, No. 2, pp. 270-275 (6 pages)
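Funding
National High Technology Research and Development Program of China (863 Program, 2006AA010102)
National Key Technology R&D Program of China (2008BAI50B00)
National Basic Research Program of China (973 Program, 2004CB318106)
National Natural Science Foundation of China (10874203, 60875014, 60535030)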