A New Feature Extraction Method Based on Bottleneck Deep Belief Networks and Its Application in Language Recognition (cited by: 10)
Abstract: In language recognition, traditional MFCC features carry too little information per frame, so they are easily corrupted by noise and have weak noise robustness. Meanwhile, the widely used SDC feature extraction method requires its parameters to be set by hand, which adds uncertainty to the recognition results. To address these shortcomings, deep learning is introduced into feature extraction, and a feature extraction method based on bottleneck deep belief networks (BN-DBN) is proposed. Comparative experiments on the bottleneck layer size, the number of hidden layers, and the position of the bottleneck layer were carried out on the NIST2007 database. The results show that the proposed method achieves a higher recognition rate than traditional feature extraction methods.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2014, No. 3, pp. 263-266 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (61272333)
Keywords: language recognition; bottleneck features; deep belief networks
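
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind bottleneck features: frames are propagated through a stack of sigmoid layers, and the activations of one narrow ("bottleneck") layer are taken as the new feature vector. The layer sizes, the 39-dimensional input, the function name `bottleneck_features`, and the random weights (standing in for a trained deep belief network) are all assumptions made for illustration, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bottleneck_features(frames, weights, biases, bottleneck_index):
    """Forward frames through sigmoid layers and return the activations
    of the layer at bottleneck_index (0 = first hidden layer)."""
    h = frames
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = sigmoid(h @ W + b)
        if i == bottleneck_index:
            return h  # stop here: layers after the bottleneck are not needed
    raise ValueError("bottleneck_index is outside the network")

# Hypothetical configuration: input, two wide hidden layers, a narrow
# bottleneck layer, and one more wide hidden layer.
layer_sizes = [39, 512, 512, 40, 512]
rng = np.random.default_rng(0)

# Random weights stand in for a pre-trained and fine-tuned network (assumption).
weights = [rng.normal(0.0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

frames = rng.normal(size=(100, 39))   # 100 frames of 39-dim acoustic features
feats = bottleneck_features(frames, weights, biases, bottleneck_index=2)
print(feats.shape)  # (100, 40)
```

In this sketch, the three factors compared in the experiments correspond to the width of the narrow entry in `layer_sizes`, the number of entries in `layer_sizes`, and the value of `bottleneck_index`.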