
Robust Speech Recognition Based on Adaptive Deep Neural Network in Complex Environments
Cited by: 5
Abstract: In continuous speech recognition systems, complex environments (including variability in speakers and in environmental noise) cause a mismatch between training and test data, which lowers recognition accuracy. To address this, a speech recognition algorithm based on an adaptive deep neural network (DNN) is proposed. An improved regularized adaptation criterion is combined with feature-space adaptation of the DNN to improve the match between training and test data. Speaker identity vectors (i-vectors) are fused with noise-aware training to counter the effects of speaker and environmental-noise variation, and the classification function of the traditional DNN output layer is modified to keep classes internally compact and mutually separated. Experiments were conducted by superimposing various background noises on the TIMIT English speech dataset and a Microsoft Chinese speech dataset. The results show that, compared with the popular GMM-HMM and conventional DNN acoustic models, the proposed algorithm reduces the word error rate by 5.151% and 3.113% respectively, improving the model's generalization and robustness to a certain extent.
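The i-vector fusion with noise-aware training described in the abstract is commonly realized by augmenting each acoustic frame with utterance-level auxiliary vectors before the DNN input layer. The sketch below illustrates that idea only; the dimensions, the crude noise estimate, and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def augment_features(frames, ivector, noise_est):
    """Concatenate each acoustic frame with an utterance-level speaker
    i-vector and a noise estimate (noise-aware training).

    frames:    (T, D) matrix of per-frame acoustic features
    ivector:   (K,) speaker identity vector for the utterance
    noise_est: (D,) per-utterance noise estimate
    """
    T = frames.shape[0]
    tiled_iv = np.tile(ivector, (T, 1))    # repeat the i-vector for every frame
    tiled_nz = np.tile(noise_est, (T, 1))  # repeat the noise estimate likewise
    return np.hstack([frames, tiled_iv, tiled_nz])

# Toy example: 5 frames of 40-dim features, 100-dim i-vector.
frames = np.random.randn(5, 40)
ivector = np.random.randn(100)
noise_est = frames[:2].mean(axis=0)  # crude estimate from leading (assumed non-speech) frames
x = augment_features(frames, ivector, noise_est)
print(x.shape)  # (5, 180)
```

Because the auxiliary vectors are constant within an utterance, the network sees the same speaker and noise context at every frame, letting it normalize away those sources of variability.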
Authors: ZHANG Kai-sheng (张开生) and ZHAO Xiao-fen (赵小芬), School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
Source: Computer Engineering & Science (CSCD, PKU Core Journal), 2022, No. 6, pp. 1105-1113 (9 pages)
Funding: National Natural Science Foundation of China (61601271); Shaanxi Province Science and Technology Program (2017GY-063); Yulin City 2020 Science and Technology Program (CXY-2020-090)
Keywords: speech recognition; deep neural network; improved adaptive criterion; feature space
