A Speaking Rate Adaptation Technique and Phonological Attribute Posterior for Phone Recognition (cited by: 5)
Abstract: Automatic speech recognition (ASR) based on speech event detection is currently an active research topic. Variation in speakers' speaking rates weakens the adaptability of acoustic models, so this paper proposes a speaking rate adaptation algorithm. Working one utterance at a time, the algorithm normalizes each utterance with a continuously variable frame length and frame shift so that its adjusted rate matches the average rate of the speech corpus, which reduces the influence of speaking rate on model training. In addition, the speaking rate of the test set is obtained by computing the angle between posterior probability vectors of phonological attributes; compared with rate detection methods that require trained models, this lightens the load on the system. The rate adjustment is applied as pre-processing before phonological attribute detection, the resulting attribute features are nonlinearly transformed, and they are then used as observations for a hidden Markov model (HMM) based phone recognizer. Experiments show that after rate adjustment, the average number of frames per phone in an utterance becomes more constant and its dynamic range decreases, raising the phone recognition rate by 1.3%.
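The abstract names two mechanisms (rate estimation from angles between posterior vectors, and per-utterance rescaling of frame length and shift) without giving formulas. The Python sketch below illustrates one plausible reading; it is not the authors' implementation. Assumptions not taken from the paper: the rate is measured as the number of frame-to-frame angle changes above a threshold per second; frame length and frame shift are both scaled by `corpus_rate / rate`; the 25 ms / 10 ms base framing, the 0.5 rad threshold, the toy posteriors, and all function names (`vector_angles`, `estimate_rate`, `adapted_framing`, `frame_signal`, `toy_posteriors`) are hypothetical.

```python
import numpy as np


def vector_angles(posteriors: np.ndarray) -> np.ndarray:
    """Angle in radians between each pair of consecutive frame vectors.

    `posteriors` is assumed to be a (T, K) array: one row of K
    phonological-attribute posterior probabilities per frame.
    """
    a, b = posteriors[:-1], posteriors[1:]
    dots = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = np.clip(dots / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos)


def estimate_rate(posteriors: np.ndarray, frame_shift_s: float,
                  threshold: float = 0.5) -> float:
    """Hypothetical rate measure: attribute changes per second.

    A "change" is counted whenever the angle between consecutive
    posterior vectors exceeds `threshold` (an assumed value).
    """
    n_changes = int(np.sum(vector_angles(posteriors) > threshold))
    return n_changes / (len(posteriors) * frame_shift_s)


def adapted_framing(rate: float, corpus_rate: float,
                    base_len_s: float = 0.025,
                    base_shift_s: float = 0.010) -> tuple[float, float]:
    """Scale frame length and shift by corpus_rate / rate.

    A faster utterance (rate > corpus_rate) gets shorter, denser frames,
    so each phone spans roughly as many frames as at the corpus average.
    """
    scale = corpus_rate / rate
    return base_len_s * scale, base_shift_s * scale


def frame_signal(x: np.ndarray, sr: int, frame_len_s: float,
                 frame_shift_s: float) -> np.ndarray:
    """Cut a 1-D signal into overlapping frames, one frame per row."""
    n_len = int(round(frame_len_s * sr))
    n_shift = max(1, int(round(frame_shift_s * sr)))
    if len(x) < n_len:
        return np.empty((0, n_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - n_len) // n_shift
    idx = np.arange(n_len)[None, :] + n_shift * np.arange(n_frames)[:, None]
    return x[idx]


def toy_posteriors(frames_per_segment: int, n_segments: int,
                   k: int = 4) -> np.ndarray:
    """Synthetic posteriors: the dominant attribute changes every segment."""
    rows = []
    for s in range(n_segments):
        p = np.full(k, 0.05)
        p[s % k] = 1.0 - 0.05 * (k - 1)
        rows.extend([p] * frames_per_segment)
    return np.array(rows)


if __name__ == "__main__":
    shift = 0.010
    fast = estimate_rate(toy_posteriors(5, 20), shift)   # 19 changes in 1 s
    slow = estimate_rate(toy_posteriors(10, 10), shift)  # 9 changes in 1 s
    corpus_rate = (fast + slow) / 2  # stand-in for the corpus average rate
    signal = np.zeros(16000)         # 1 s of dummy audio at 16 kHz
    for name, r in [("fast", fast), ("slow", slow)]:
        length, step = adapted_framing(r, corpus_rate)
        frames = frame_signal(signal, 16000, length, step)
        print(f"{name}: {r:.1f} changes/s, frame {length * 1000:.1f} ms, "
              f"shift {step * 1000:.1f} ms, {frames.shape[0]} frames")
```

Scaling the shift inversely with the measured rate is what keeps the number of frames per phone roughly constant across fast and slow utterances, which is the effect the abstract reports; how the paper actually maps the angle statistic to a rate may differ.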
Source: Journal of Signal Processing (信号处理), 2012, No. 2, pp. 295-300 (6 pages); indexed in CSCD and the Peking University Core Journals list
Funding: National Natural Science Foundation of China (61175017)
Keywords: speaking rate adaptation; phonological attribute detection; hidden Markov models; automatic speech recognition