Tone model integration based on discriminative weight training for Putonghua speech recognition (cited by: 2)
Abstract: A discriminative method is proposed for integrating tone information into a large-vocabulary continuous speech recognition system. Model-dependent probability weights are trained discriminatively under the minimum phone error criterion. These weights scale the probabilities of the conventional hidden Markov models based on spectral features and the probabilities of the tone models based on tonal features, so that adjusting the relative contribution of the models improves recognition accuracy. A weight update equation based on the extended Baum-Welch algorithm is derived. Different model weight combination schemes are evaluated, and smoothing between weights is used to counter overfitting in weight training. The method is evaluated on two large-vocabulary continuous speech recognition tasks: tonal syllable output and Chinese character output. On these two tasks, the discriminatively trained model weights achieve relative error rate reductions of 9.5% and 4.7%, respectively, over global model weights, owing to a better interpolation of the given models. This demonstrates the effectiveness of discriminatively trained model weights for tone model integration.
Authors: Huang Hao (黄浩), Zhu Jie (朱杰)
Source: Acta Acustica (《声学学报》), 2008, No. 1, pp. 1-8 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
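Keywords: Chinese speech recognition; weight training; integration method; discriminative; tone; hidden Markov model; Baum-Welch algorithm; speech recognition system; algorithms; hidden Markov models; integration; interpolation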
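
The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes a log-linear combination, in which each model's log-probability is multiplied by its weight and the results are summed. The function names, the example scores, and the count-based smoothing rule are all hypothetical and are not taken from the paper, which derives its own update equations with the extended Baum-Welch algorithm.

```python
def combined_log_score(log_p_hmm, log_p_tone, w_hmm, w_tone):
    """Weighted sum of log-probabilities from the spectral HMM and the tone model.

    Assumption: probabilities are combined log-linearly, so a weight acts as
    an exponent on the corresponding probability.
    """
    return w_hmm * log_p_hmm + w_tone * log_p_tone


def smooth_weight(model_weight, global_weight, count, tau=10.0):
    """Hypothetical smoothing: pull a model-dependent weight toward a global one.

    Models seen rarely in training (small count) stay close to the global
    weight, which limits overfitting. tau is an illustrative constant.
    """
    lam = count / (count + tau)
    return lam * model_weight + (1.0 - lam) * global_weight


# Illustrative scores for one hypothesis (made-up numbers).
score = combined_log_score(log_p_hmm=-10.0, log_p_tone=-2.0, w_hmm=1.0, w_tone=0.5)

# A model-dependent tone weight smoothed toward the global tone weight.
w_tone_smoothed = smooth_weight(model_weight=0.8, global_weight=0.4, count=10)
```

With a global weight, every model shares one `w_tone`. The model-dependent scheme described in the abstract instead lets each model carry its own weight, which is what the discriminative training adjusts.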