
Discriminative Criterion Based Bottleneck Feature and Its Application in LVCSR

Cited by: 2
Abstract: Bottleneck (BN) features taken from a middle layer of a deep neural network have been widely applied to large vocabulary continuous speech recognition (LVCSR), because they can be modeled with the conventional Gaussian mixture model-hidden Markov model (GMM-HMM). To extract discriminative BN features, this paper first trains a GMM-HMM on conventional BN features and then jointly optimizes the parameters of the BN feature-extraction network and of the GMM-HMM under the minimum phone error (MPE) criterion. Unlike other discriminative training algorithms, which train the deep neural network on mini-batches, the proposed algorithm treats all of the training data as one large batch when accumulating statistics, which greatly speeds up training. Experiments show that the optimized BN feature extractor achieves a 9% relative word error rate reduction over the conventional method.
Authors: Liu Diyuan, Guo Wu
Source: Journal of Data Acquisition and Processing, 2016, No. 2, pp. 331-337 (7 pages); indexed in CSCD and the Peking University core journal list
Keywords: speech recognition; neural networks; discriminative training; Bottleneck feature
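The feature-side half of the joint MPE optimization described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a hypothetical extractor with one sigmoid hidden layer feeding a linear bottleneck, diagonal-covariance GMMs for the HMM states, and that the per-frame derivatives of the MPE objective with respect to each state log-likelihood (`diff_occ`; in fMPE-style training these are typically the acoustic scale times the MPE differential posteriors from lattice forward-backward) are supplied from elsewhere. The chain rule carries those derivatives through the GMM into the bottleneck output and then into the network weights, and the gradients are summed over every utterance before a single update, mirroring the large-batch scheme. All function names are illustrative, the GMM-HMM parameter update is omitted, and this is not the authors' implementation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def bn_forward(x, W1, b1, W2, b2):
    """Hypothetical BN extractor: sigmoid hidden layer, then a linear bottleneck."""
    h = sigmoid(x @ W1 + b1)  # (T, H)
    y = h @ W2 + b2           # (T, D_bn): the BN features
    return h, y


def state_loglik_and_grad(y, log_c, mu, var):
    """Per-state diagonal GMM log-likelihood log p(y_t|s) and its gradient w.r.t. y_t.

    y: (T, D); log_c: (S, M) log mixture weights; mu, var: (S, M, D).
    """
    diff = y[:, None, None, :] - mu[None]                                       # (T, S, M, D)
    log_n = -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=-1)  # (T, S, M)
    log_joint = log_c[None] + log_n
    mx = log_joint.max(axis=-1, keepdims=True)
    logp = mx[..., 0] + np.log(np.exp(log_joint - mx).sum(axis=-1))            # (T, S)
    post = np.exp(log_joint - logp[..., None])  # component posteriors within each state
    dlogp_dy = np.sum(post[..., None] * (-diff / var), axis=2)                 # (T, S, D)
    return logp, dlogp_dy


def mpe_bn_gradients(x, params, gmm, diff_occ):
    """Gradient of the objective w.r.t. the extractor weights, given
    diff_occ[t, s] = dF / d log p(y_t|s) (assumed precomputed)."""
    W1, b1, W2, b2 = params
    h, y = bn_forward(x, W1, b1, W2, b2)
    _, dlogp_dy = state_loglik_and_grad(y, *gmm)
    dy = np.einsum("ts,tsd->td", diff_occ, dlogp_dy)  # dF/dy_t, summed over states
    dpre = (dy @ W2.T) * h * (1.0 - h)                # back through the sigmoid layer
    return x.T @ dpre, dpre.sum(axis=0), h.T @ dy, dy.sum(axis=0)


def full_batch_step(utterances, params, gmm, lr=1e-3):
    """Accumulate gradients over ALL utterances, then make one update.
    Gradient ascent, because the MPE objective is maximized."""
    total = [np.zeros_like(p) for p in params]
    for x_u, g_u in utterances:
        for acc, g in zip(total, mpe_bn_gradients(x_u, params, gmm, g_u)):
            acc += g
    return tuple(p + lr * g for p, g in zip(params, total))


# Usage on random toy data (all sizes are arbitrary).
rng = np.random.default_rng(0)
T, D_in, H, D_bn, S, M = 50, 13, 32, 4, 3, 2
x = rng.standard_normal((T, D_in))
params = (0.1 * rng.standard_normal((D_in, H)), np.zeros(H),
          0.1 * rng.standard_normal((H, D_bn)), np.zeros(D_bn))
gmm = (np.log(np.full((S, M), 1.0 / M)), rng.standard_normal((S, M, D_bn)),
       0.5 + rng.random((S, M, D_bn)))
diff_occ = rng.standard_normal((T, S))  # stand-in for real MPE statistics
utterances = [(x[:20], diff_occ[:20]), (x[20:], diff_occ[20:])]
new_params = full_batch_step(utterances, params, gmm, lr=1e-3)


def linearized_objective(p):
    """sum_{t,s} diff_occ[t,s] * log p(y_t|s): with diff_occ held fixed its
    gradient equals the one above, so it serves as a finite-difference check."""
    _, y = bn_forward(x, *p)
    logp, _ = state_loglik_and_grad(y, *gmm)
    return float(np.sum(diff_occ * logp))


grads = mpe_bn_gradients(x, params, gmm, diff_occ)
eps, max_rel_err = 1e-6, 0.0
for k, idx in [(0, (2, 5)), (1, (3,)), (2, (7, 1)), (3, (2,))]:
    plus = [p.copy() for p in params]
    minus = [p.copy() for p in params]
    plus[k][idx] += eps
    minus[k][idx] -= eps
    num = (linearized_objective(plus) - linearized_objective(minus)) / (2 * eps)
    max_rel_err = max(max_rel_err, abs(num - grads[k][idx]) / max(1.0, abs(num)))
print(f"max relative gradient error: {max_rel_err:.2e}")
```

Because `diff_occ` depends on the features, a real system would regenerate lattices and MPE statistics after each full-batch update; the finite-difference check only verifies the chain rule with `diff_occ` held fixed.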
