
A Phoneme Recognition Method Based on a Hierarchical Deep Belief Network (cited by: 2)

Hierarchical Structure of Deep Belief Network for Phoneme Recognition
Abstract: Existing phoneme recognition systems suffer from low recognition accuracy, modeling methods with limited representational power, and a tendency to become trapped in local optima. To address these problems, this paper proposes a new phoneme recognition method based on a hierarchical deep belief network (DBN). The method has two parts: bottleneck features extracted by a hierarchical DBN, and a DBN-based phoneme classifier. The bottleneck features exploit the DBN's ability to process long speech segments and to extract features in a supervised manner, while the DBN-based classifier offers stronger modeling and representational capacity. Combining the two makes it possible to extract low-dimensional, supervised features while using the DBN to estimate phoneme posterior probabilities more effectively. Experiments on the TIMIT corpus show that the proposed method achieves a phoneme error rate of 18.5%, a considerable improvement in recognition accuracy over earlier phoneme recognition systems.
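Source: Journal of Applied Sciences (《应用科学学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2014, No. 5, pp. 515-522 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 61272333) and the Anhui Provincial Natural Science Foundation (No. 1208085MF94, No. 1308085QF99)
Keywords: phoneme recognition, hierarchical structure, deep belief network, bottleneck feature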
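
As a rough illustration of the two-stage design described in the abstract, the following Python sketch passes frames through a bottleneck network and then feeds context windows of the resulting bottleneck features to a second network that outputs phoneme posteriors. It is not the authors' implementation. The abstract does not give layer sizes, context widths, or training details, so all dimensions (13-dimensional input frames, an 11-frame window, an 8-unit bottleneck, 39 phoneme classes) and all function names are assumptions, and the randomly initialized weights stand in for the RBM pre-training and supervised fine-tuning that a real DBN would need.

```python
import math
import random

def build_net(sizes, rng):
    """One (weights, biases) pair per layer. Random weights are a placeholder
    for pre-trained and fine-tuned DBN parameters (assumption)."""
    return [([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def affine(layer, x):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def stack_context(frames, t, radius):
    """Concatenate frames t-radius..t+radius, repeating the edge frames
    when the window runs past the start or end of the utterance."""
    last = len(frames) - 1
    window = []
    for k in range(t - radius, t + radius + 1):
        window.extend(frames[min(max(k, 0), last)])
    return window

def bottleneck_features(net, x, bottleneck_index):
    """Run x through the layers up to and including the bottleneck layer
    and return that layer's activations as the feature vector."""
    h = x
    for layer in net[:bottleneck_index + 1]:
        h = sigmoid(affine(layer, h))
    return h

def phoneme_posteriors(net, x):
    """Sigmoid hidden layers followed by a softmax output layer."""
    h = x
    for layer in net[:-1]:
        h = sigmoid(affine(layer, h))
    return softmax(affine(net[-1], h))

# Hypothetical dimensions, chosen only to keep the example small.
N_FEAT, RADIUS, N_BN, N_PHONES = 13, 5, 8, 39
WINDOW = 2 * RADIUS + 1

rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FEAT)] for _ in range(20)]

# Stage 1: network with a narrow second hidden layer (the bottleneck).
bn_net = build_net([N_FEAT * WINDOW, 64, N_BN, 64, N_PHONES], rng)
bn_frames = [bottleneck_features(bn_net, stack_context(frames, t, RADIUS), 1)
             for t in range(len(frames))]

# Stage 2: classifier over a context window of bottleneck features.
clf_net = build_net([N_BN * WINDOW, 64, N_PHONES], rng)
posteriors = [phoneme_posteriors(clf_net, stack_context(bn_frames, t, RADIUS))
              for t in range(len(bn_frames))]
```

With random weights the posteriors carry no information; the sketch only shows how data would flow from input frames to bottleneck features to per-frame phoneme posteriors.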

References (15)

  • [1] SCHWARZ P. Phoneme recognition based on long temporal context [D]. Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic, 2008.
  • [2] JANSEN A, NIYOGI P. Point process models for spotting keywords in continuous speech [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(8): 1457-1470.
  • [3] SIOHAN O, BACCHIANI M. Fast vocabulary independent audio search using path-based graph indexing [C]//Proceedings of the Eurospeech 2005, Lisbon, Portugal, 4-8 September 2005.
  • [4] MATEJKA P, SCHWARZ P, CERNOCKY J, CHYTIL P. Phonotactic language identification using high quality phoneme recognition [C]//Proceedings of the INTERSPEECH, Lisbon, Portugal, 2005: 2237-2240.
  • [5] DENG L. An overview of deep-structured learning for information processing [C]//Proceedings of the Asian-Pacific Signal and Information Processing Annual Summit and Conference, Xi'an, China, 2011: 1-14.
  • [6] HINTON G, SALAKHUTDINOV R. Reducing the dimensionality of data with neural networks [J]. Science, 2006, 313(5786): 504-507.
  • [7] BAO Y, JIANG H, LIU C. Investigation on dimensionality reduction of concatenated features with deep neural network for LVCSR systems [C]//Proceedings of the IEEE 11th International Conference on Signal Processing (ICSP2012), Beijing, China, 2012: 562-566.
  • [8] MOHAMED A, DAHL G, HINTON G. Acoustic modeling using deep belief networks [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 14-22.
  • [9] DAHL G, YU D, DENG L, ACERO A. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.
  • [10] PINTO J, SIVARAM G, MAGIMAI-DOSS M, HERMANSKY H, BOURLARD H. Analysis of MLP based hierarchical phoneme [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(2): 225-241.

