
Acoustic Model Construction Method Based on Bottleneck Compound Feature
Abstract: Mel-Frequency Cepstral Coefficient (MFCC) speech features cannot effectively capture the useful information shared between consecutive frames. To address this problem, this paper uses a Deep Neural Network (DNN) to extract bottleneck (BN) features that carry the long-term correlation and compactness of speech, and on this basis proposes a compound feature construction method that combines the neural network bottleneck features with the MFCC features, so as to improve speech representation and modeling capability. MFCC features are extracted from the speech data as the input data; the MFCC features are then concatenated with the BN features to obtain a new compound feature, on which Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) acoustic modeling is performed. Experimental results on the TIMIT database show that, compared with methods based on the single bottleneck feature or on the deep neural network posterior feature, the proposed method significantly increases the recognition rate.
Authors: ZHENG Wenxiu; ZHAO Junyi; WEN Xinyi; YAO Yindi (School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China)
Source: Computer Engineering (《计算机工程》), 2020, No. 11, pp. 301-305, 314 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
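Funding: International Science and Technology Cooperation Program, General Project "Research on a Smart Agriculture Automatic Irrigation System Based on Big Data Information Decision-Making" (2018KW-025).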
Keywords: Deep Neural Network (DNN); Mel-Frequency Cepstral Coefficient (MFCC); bottleneck feature; compound feature; Gaussian Mixture Model-Hidden Markov Model (GMM-HMM)
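The feature concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the network weights are random stand-ins for a trained DNN, the MFCC frames are synthetic, and the dimensions (13-dimensional MFCC, a context of 2 frames on each side, a 32-unit hidden layer, an 8-unit bottleneck) as well as all function names are assumptions chosen for illustration. GMM-HMM training on the resulting features is not shown.

```python
import math
import random


def splice(frames, context):
    """Stack each frame with `context` neighbours on both sides (edge frames repeat)."""
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        spliced.append(window)
    return spliced


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, weights, bias, activation=None):
    """One fully connected layer: activation(W x + b)."""
    y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in y] if activation else y


def random_layer(n_in, n_out, rng):
    """Random weights standing in for a trained layer (assumption, no training here)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def bottleneck_features(mfcc, layers, context):
    """Forward spliced MFCC frames and return the linear outputs of the last (bottleneck) layer."""
    features = []
    for x in splice(mfcc, context):
        for weights, bias in layers[:-1]:
            x = dense(x, weights, bias, sigmoid)
        weights, bias = layers[-1]
        features.append(dense(x, weights, bias))
    return features


def compound_features(mfcc, bn):
    """Concatenate MFCC and BN features frame by frame."""
    return [m + b for m, b in zip(mfcc, bn)]


rng = random.Random(0)
MFCC_DIM, CONTEXT, HIDDEN, BN_DIM = 13, 2, 32, 8  # assumed sizes, not from the paper

# Synthetic MFCC frames; a real system would extract these from audio.
mfcc = [[rng.gauss(0.0, 1.0) for _ in range(MFCC_DIM)] for _ in range(20)]

layers = [
    random_layer(MFCC_DIM * (2 * CONTEXT + 1), HIDDEN, rng),
    random_layer(HIDDEN, BN_DIM, rng),
]

bn = bottleneck_features(mfcc, layers, CONTEXT)
compound = compound_features(mfcc, bn)
print(len(compound), len(compound[0]))  # 20 21
```

Each compound frame keeps the original MFCC vector unchanged and appends the bottleneck activations, so the frame-level MFCC information is preserved while the spliced context contributes information from neighbouring frames.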
