
Improved lattice-based speech keyword spotting algorithm
Abstract: For speech keyword spotting on multi-candidate (N-best) Chinese syllable lattices, this work improves both the Gaussian mixture model and the N-best recognition algorithm. First, different simplification strategies for the Gaussian mixture model are examined and tested experimentally, showing that full covariance matrices give superior recognition performance. Then the classic N-best token passing algorithm is modified to exploit characteristics of the Chinese language. Experiments show that these two modifications improve the 1-best results of a speech recognition engine with syllable output and also substantially improve its N-best recognition performance. Finally, a speech keyword spotting system based on an N-best lattice is built, and the effect of the improvements is verified in that system.
Authors: XIAO Xi (肖熙), WANG Jingqian (王竞千)
Source: Journal of Tsinghua University (Science and Technology), 2015, No. 5, pp. 508-513 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Keywords: speech keyword spotting; multi-candidate lattice; Gaussian mixture model; compute unified device architecture (CUDA); triphone model
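
The abstract refers to N-best token passing without giving details. As a rough illustration only, the sketch below shows the generic idea: each state keeps up to N tokens with distinct syllable histories instead of a single best token. It is not the authors' algorithm, and it does not include their Chinese-specific modifications, which the abstract does not describe. The one-state-per-syllable model, the function names, and the fixed `log_stay` / `log_switch` penalties are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A token is (accumulated log score, syllable history).
Token = Tuple[float, Tuple[str, ...]]


def prune(tokens: List[Token], n_best: int) -> List[Token]:
    """Keep the best token per distinct history, then the top n_best overall."""
    best: Dict[Tuple[str, ...], float] = {}
    for score, hist in tokens:
        if hist not in best or score > best[hist]:
            best[hist] = score
    ranked = sorted(((s, h) for h, s in best.items()), key=lambda t: -t[0])
    return ranked[:n_best]


def nbest_token_passing(
    frames: List[Dict[str, float]],
    log_stay: float,
    log_switch: float,
    n_best: int,
) -> List[Token]:
    """Toy N-best decoder with one state per syllable (hypothetical model).

    frames[t][s] is the log-likelihood of frame t under syllable s.
    log_stay and log_switch are illustrative transition penalties.
    """
    syllables = list(frames[0])
    # Frame 0: every syllable starts its own history.
    state: Dict[str, List[Token]] = {s: [(frames[0][s], (s,))] for s in syllables}
    for obs in frames[1:]:
        new_state: Dict[str, List[Token]] = {}
        for dst in syllables:
            incoming: List[Token] = []
            for src in syllables:
                for score, hist in state[src]:
                    if src == dst:
                        # Self-loop: same syllable continues, history unchanged.
                        incoming.append((score + log_stay + obs[dst], hist))
                    # Syllable boundary: the token enters dst and records it.
                    incoming.append((score + log_switch + obs[dst], hist + (dst,)))
            # Unlike 1-best token passing, keep up to n_best tokens per state.
            new_state[dst] = prune(incoming, n_best)
        state = new_state
    # Merge the tokens of all states into the final N-best list.
    return prune([tok for toks in state.values() for tok in toks], n_best)


if __name__ == "__main__":
    demo_frames = [
        {"ni": -1.0, "hao": -5.0},
        {"ni": -1.0, "hao": -5.0},
        {"ni": -5.0, "hao": -1.0},
        {"ni": -5.0, "hao": -1.0},
    ]
    for score, hist in nbest_token_passing(demo_frames, -0.5, -2.0, 3):
        print(score, " ".join(hist))
```

The N surviving histories at each frame are the kind of alternatives a multi-candidate syllable lattice would store, so a keyword can still be matched when its syllables are not on the 1-best path.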

