
Combining Acoustic Features for Mandarin Speech
Times cited: 3
Abstract: In large vocabulary continuous speech recognition (LVCSR), hidden Markov model (HMM) systems have achieved good recognition results. These systems extract short-term acoustic feature vectors such as MFCC and PLP, and use Gaussian mixture models (GMMs) to estimate the distributions of those features for each phoneme unit. However, short-term features do not effectively capture the correlation between consecutive frames, and they are not explicitly optimized for phone discrimination. In this paper, two kinds of multi-layer perceptrons (MLPs) are used to estimate phone posterior probabilities at the frame level. The two resulting neural-network discriminative features are combined with the conventional features into a new feature stream, which is then used to rebuild the GMM-HMM models. Comparative experiments show that the LVCSR system using the combined acoustic features achieves an absolute reduction in character error rate (CER) of about 3% to 7%.
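The combination step described in the abstract (MLP phone posteriors computed from a window of neighbouring frames, then appended to the conventional features before GMM-HMM training) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither network topology nor context width, so the sketch assumes a single sigmoid hidden layer, a softmax output over phone classes, a clamped context window, and log posteriors appended directly to the base features. Tandem systems in the literature usually also decorrelate and reduce the log posteriors (for example with PCA/KLT) before concatenation; that step is omitted here. The function names and weights are hypothetical placeholders, and MLP training is not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def stack_context(frames: Matrix, t: int, width: int) -> Vector:
    """Concatenate frames t-width .. t+width, repeating edge frames at utterance boundaries."""
    out: Vector = []
    for k in range(t - width, t + width + 1):
        idx = min(max(k, 0), len(frames) - 1)  # clamp index into [0, T-1]
        out.extend(frames[idx])
    return out


def mlp_posteriors(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Hypothetical one-hidden-layer MLP: sigmoid hidden units, softmax over phone classes."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w1, b1)
    ]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tandem_features(base: Matrix, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector,
                    width: int = 4, floor: float = 1e-10) -> Matrix:
    """Append per-frame log phone posteriors to the base (e.g. MFCC/PLP) feature vectors."""
    combined: Matrix = []
    for t in range(len(base)):
        post = mlp_posteriors(stack_context(base, t, width), w1, b1, w2, b2)
        log_post = [math.log(max(p, floor)) for p in post]  # floor avoids log(0)
        combined.append(base[t] + log_post)  # new list: base features, then log posteriors
    return combined


if __name__ == "__main__":
    # Toy sizes: 2-dim base features, +/-1 frame context (6 inputs), 3 hidden units, 4 phones.
    base = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    w1 = [[0.0] * 6 for _ in range(3)]
    b1 = [0.0] * 3
    w2 = [[0.0] * 3 for _ in range(4)]
    b2 = [0.0] * 4
    for frame in tandem_features(base, w1, b1, w2, b2, width=1):
        print([round(v, 3) for v in frame])
```

With all-zero weights every frame receives a uniform posterior of 0.25 per phone, so each output vector is the two base features followed by four copies of log(0.25). In a real system the combined stream would replace the base features as input to GMM-HMM training.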
Source: Journal of Yunnan University (Natural Sciences Edition), 2010, Supplement 1 (S1), pp. 368-371 (4 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Keywords: acoustic features; discriminative features; artificial neural networks (ANN); multi-layer perceptron (MLP)