An improved speech recognition method based on DNN-HMM model

Cited by: 17
Abstract: The acoustic model that combines a deep neural network with a hidden Markov model (DNN-HMM) is widely used in speech recognition systems, but its modeling capability is limited. To address this and related problems, this paper proposes an improved DNN-HMM speech recognition algorithm. First, a deep neural network acoustic model is built by combining a deep belief network (DBN) with a deep Boltzmann machine (DBM). Then, Mel-frequency cepstral coefficients (MFCC) and log-domain Mel filter bank coefficients (Fbank) are extracted as acoustic features, and experiments are carried out on the TIMIT speech dataset. The results show that the DNN-HMM model combined with the DBM outperforms the plain DNN-HMM model. With MFCC features, the word error rate and sentence error rate decrease by 1.26% and 0.20%, respectively. With Fbank features using the default filter bank, they decrease by 0.48% and 0.82%, respectively, and moderately increasing the number of filters in the filter bank reduces the error rates further. Overall, the sentence error rate and word error rate were reduced to 21.06% and 3.12%, respectively.
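The two acoustic features compared in the abstract are closely related: Fbank is the log energy of each triangular Mel filter applied to a frame's power spectrum, and MFCC takes a discrete cosine transform (DCT-II) of those log energies and keeps only the first few coefficients. The sketch below illustrates that relationship for a single frame. It is a minimal sketch under assumptions not stated in this record: 16 kHz audio, 25 ms (400-sample) frames, a 512-point DFT, a Hamming window, the HTK-style Mel formula, 40 filters, 13 cepstra, and no pre-emphasis, liftering, or delta features. The function names are hypothetical, and this is not presented as the authors' front end.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style Mel scale (assumed convention)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft//2 (naive O(n_fft^2) DFT, fine for one frame)."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = im = 0.0
        for n, x in enumerate(padded):
            angle = 2.0 * math.pi * k * n / n_fft
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):    # rising slope
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):   # falling slope
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters


def fbank_and_mfcc(frame, sample_rate=16000, n_fft=512, n_filters=40, n_ceps=13):
    """Return (log Mel filter bank energies, MFCCs) for one frame."""
    n = len(frame)
    # Hamming window to reduce spectral leakage
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = power_spectrum(windowed, n_fft)
    # Fbank: log of each filter's energy, floored to avoid log(0)
    fbank = [math.log(max(sum(w * p for w, p in zip(weights, spectrum)), 1e-10))
             for weights in mel_filterbank(n_filters, n_fft, sample_rate)]
    # MFCC: unnormalized DCT-II of the log energies, keeping the first n_ceps terms
    mfcc = [sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters)
                for m, e in enumerate(fbank))
            for q in range(n_ceps)]
    return fbank, mfcc


# Usage: one 25 ms frame (400 samples at 16 kHz) of a 1 kHz tone
sr = 16000
frame = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(400)]
fbank40, mfcc40 = fbank_and_mfcc(frame, sr)
fbank64, mfcc64 = fbank_and_mfcc(frame, sr, n_filters=64)
print(len(fbank40), len(mfcc40))  # 40 13
print(len(fbank64), len(mfcc64))  # 64 13
```

The `n_filters` parameter corresponds to the "number of filters in the filter bank" that the abstract reports increasing: raising it widens the Fbank vector, while the MFCC vector length stays fixed by `n_ceps`, since the DCT truncation discards the higher-order terms.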
Authors: LI Yunhong, LIANG Sicheng, JIA Kaili, ZHANG Qiuming, SONG Peng, HE Chen, WANG Gangyi, LI Yuxuan (School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China; State Grid Xi'an Power Supply Company, Xi'an 710032, China)
Source: Journal of Applied Acoustics (《应用声学》), 2019, No. 3, pp. 371-377 (7 pages). Indexed in CSCD and the Peking University core journal list.
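Funding: National Natural Science Foundation of China (61471161); Key Project of Basic Natural Science Research, Shaanxi Provincial Department of Science and Technology (2016JZ026); National College Students' Innovation and Entrepreneurship Program (201810709009)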
Keywords: speech recognition; deep neural network; acoustic model; acoustic feature
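The abstract reports its results as word error rate (WER) and sentence error rate (SER). This record does not define them, so the usual definitions are assumed here: WER is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, summed over the test set and divided by the total number of reference words; SER is the fraction of sentences whose output contains at least one error. The minimal Python sketch below computes both; the function names and example sentences are hypothetical and are not taken from the paper or from TIMIT.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning word list ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution, or match at cost 0
        prev = cur
    return prev[-1]


def error_rates(refs, hyps):
    """Corpus-level (WER, SER) for parallel lists of reference/hypothesis sentences."""
    edits = ref_words = wrong_sentences = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        d = edit_distance(r, h)
        edits += d
        ref_words += len(r)
        wrong_sentences += int(d > 0)  # any error makes the whole sentence wrong
    return edits / ref_words, wrong_sentences / len(refs)


# Usage: one deletion in the first sentence, second sentence correct
refs = ["the cat sat on the mat", "speech recognition works"]
hyps = ["the cat sat on mat", "speech recognition works"]
wer, ser = error_rates(refs, hyps)
print(f"WER={wer:.3f} SER={ser:.3f}")  # WER=0.111 SER=0.500
```

Because a single word error marks the entire sentence as wrong, SER is typically much higher than WER on the same output, which is consistent with the abstract's final figures (21.06% SER versus 3.12% WER).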