Tone model integration based on discriminative weight training for Putonghua speech recognition (汉语语音识别中基于区分性权重训练的声调集成方法)

Cited by: 2

Abstract: A discriminative framework is proposed for integrating tone information into large-vocabulary continuous speech recognition. Model-dependent probability weights are trained discriminatively under the minimum phone error (MPE) criterion. The weights scale the probabilities of the hidden Markov models based on conventional spectral features and of the tone models based on tonal features, so that recognition accuracy improves by adjusting how much each model contributes. The weight update equations are derived using the extended Baum-Welch algorithm. Several model weight combination schemes are evaluated, and smoothing between weights is introduced to counter overfitting in weight training. The method is evaluated on two large-vocabulary continuous speech recognition tasks: tonal syllable output and Chinese character output. Experimental results show that, compared with global model weights, the discriminatively trained model weights reduce the relative error rate by 9.5% and 4.7% on the two tasks, respectively, owing to a better interpolation of the given models. This demonstrates the effectiveness of discriminatively trained model weights for tone model integration.
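As an illustration of the weighting idea in the abstract, the sketch below combines HMM and tone-model log-probabilities with per-model weights and falls back to a single global weight pair when a model has no weight of its own. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the segment tuple layout, and in particular the count-based smoothing formula with its `tau` parameter are hypothetical, since the abstract does not give the exact form of the smoothing or of the extended Baum-Welch update.

```python
def combined_log_score(segments, hmm_weights, tone_weights, default=(1.0, 1.0)):
    """Sum weighted HMM and tone-model log-probabilities over one hypothesis.

    segments: list of (model_id, log_p_hmm, log_p_tone) tuples (assumed layout).
    hmm_weights / tone_weights: dicts mapping model_id -> weight.
    Models without an entry use the global `default` (hmm, tone) weight pair.
    """
    total = 0.0
    for model_id, log_p_hmm, log_p_tone in segments:
        w_hmm = hmm_weights.get(model_id, default[0])
        w_tone = tone_weights.get(model_id, default[1])
        # Scaling a log-probability by w is the same as raising p to the power w.
        total += w_hmm * log_p_hmm + w_tone * log_p_tone
    return total


def smooth_weight(model_weight, global_weight, count, tau=10.0):
    """Hypothetical smoothing: pull a model weight toward the global weight
    when the model has few training observations (small `count`)."""
    lam = count / (count + tau)
    return lam * model_weight + (1.0 - lam) * global_weight


# One hypothesis with two tonal-syllable segments (illustrative numbers only).
hypothesis = [("ma1", -2.0, -1.0), ("ma3", -3.0, -4.0)]
global_score = combined_log_score(hypothesis, {}, {})
model_score = combined_log_score(hypothesis, {"ma3": 1.0}, {"ma3": 0.5})
```

With empty weight dictionaries every segment uses the global pair, so the global-weight baseline is a special case of the model-dependent scheme; giving `ma3` a smaller tone weight reduces the influence of its tone model on the total score.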

Authors: 黄浩 (HUANG Hao), 朱杰 (ZHU Jie)

Affiliation: Department of Electronic Engineering, Shanghai Jiao Tong University

Source: Acta Acustica (《声学学报》), 2008, No. 1, pp. 1-8 (8 pages). Indexed in: EI, CSCD, Peking University Core Journals.

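Keywords: Chinese speech recognition; weight training; integration method; discriminative; tone; hidden Markov model; Baum-Welch algorithm; speech recognition system; algorithms; hidden Markov models; integration; interpolation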

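Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]; TP393 [Automation and Computer Technology: Computer Application Technology]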
