
Two-stage keyword spotting system based on syllable graphs (Cited by: 1)
Abstract: One-stage keyword spotting systems are slow at search time, while two-stage systems based on large-vocabulary continuous speech recognition (LVCSR) are not robust. This paper introduces a two-stage keyword spotting system based on syllable graphs to meet the need for fast, robust retrieval from speech data. The system comprises a preprocessing stage and a search stage. In the preprocessing stage, the audio data is recognized into a syllable graph with high coverage of candidate syllables. In the search stage, frequent user queries are answered by searching the graph for syllable strings that match the keyword's pinyin, and a forward-backward algorithm based on a syllable N-gram language model computes confidence measures to filter the search results. Experiments show that the system achieves a 72.19% recall rate and 72.68% accuracy on 2-syllable words, and a 73.51% recall rate and 82.98% accuracy on 3-syllable words, outperforming the LVCSR system; the search stage runs at only 0.01 times real time, giving the system good practical value.
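The search stage described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy lattice, its scores, and the function names (`keyword_confidence`, `forward_backward`) are all hypothetical, and the confidence computed here is a plain lattice posterior over arc scores; the paper additionally incorporates a syllable N-gram language model into the forward-backward scoring.

```python
import math
from collections import defaultdict

def logadd(a, b):
    """log(exp(a) + exp(b)), computed stably in log space."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def forward_backward(edges, n_nodes):
    """alpha[v]: total log-mass of paths from node 0 to v;
    beta[v]: total log-mass of paths from v to the final node.
    Nodes are assumed topologically ordered 0..n_nodes-1."""
    in_edges, out_edges = defaultdict(list), defaultdict(list)
    for s, d, _, lp in edges:
        in_edges[d].append((s, lp))
        out_edges[s].append((d, lp))
    alpha = [float("-inf")] * n_nodes
    beta = [float("-inf")] * n_nodes
    alpha[0] = 0.0
    for v in range(1, n_nodes):
        for s, lp in in_edges[v]:
            alpha[v] = logadd(alpha[v], alpha[s] + lp)
    beta[n_nodes - 1] = 0.0
    for v in range(n_nodes - 2, -1, -1):
        for d, lp in out_edges[v]:
            beta[v] = logadd(beta[v], beta[d] + lp)
    return alpha, beta

def keyword_confidence(edges, n_nodes, keyword):
    """Posterior probability of the best arc sequence in the
    syllable graph whose labels spell the keyword's syllables."""
    alpha, beta = forward_backward(edges, n_nodes)
    total = alpha[n_nodes - 1]  # log-mass of all paths through the graph
    out_edges = defaultdict(list)
    for s, d, syl, lp in edges:
        out_edges[s].append((d, syl, lp))
    best = 0.0

    def dfs(node, idx, score, start):
        nonlocal best
        if idx == len(keyword):
            # posterior = mass of paths through this arc sequence / all paths
            best = max(best, math.exp(alpha[start] + score + beta[node] - total))
            return
        for d, syl, lp in out_edges[node]:
            if syl == keyword[idx]:
                dfs(d, idx + 1, score + lp, start)

    for v in range(n_nodes):
        dfs(v, 0, 0.0, v)
    return best

# Toy syllable graph: edges are (src, dst, syllable, log-probability).
EDGES = [
    (0, 1, "bei", math.log(0.7)), (0, 1, "pei", math.log(0.3)),
    (1, 2, "jing", math.log(0.8)), (1, 2, "jin", math.log(0.2)),
    (2, 3, "huan", math.log(0.6)), (2, 3, "wan", math.log(0.4)),
]
print(keyword_confidence(EDGES, 4, ["bei", "jing"]))  # ≈ 0.56 (0.7 × 0.8)
```

Because the forward and backward passes touch each arc once, the posterior for a matched syllable string is obtained without re-decoding the audio, which is what makes the search stage fast enough to answer frequent queries.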
Source: Journal of Tsinghua University (Science and Technology), 2005, No. 10: 1356-1359 (4 pages). Indexed by EI, CAS, CSCD, and the Peking University Chinese Core Journals list.
Funding: Supported by the National Network and Information Security Assurance Sustainable Development Program (Project 917).
Keywords: information retrieval; keyword spotting; syllable graph; confidence measure

References (8)

  • 1 OU Zhijian, LUO Jun, XIE Dadong. Research and implementation of a multi-functional speech/audio information retrieval system [A]. Proceedings of the 2004 National Symposium on Network and Information Security Technology [C]. Beijing, 2004: 106-112.
  • 2 Wilpon J, Rabiner L, Lee L, et al. Automatic recognition of keywords in unconstrained speech using hidden Markov models [J]. IEEE Trans on Acoustics, Speech and Signal Processing, 1990, 38(11): 1870-1878.
  • 3 Johnson S E, Jourlin P, Moore G L, et al. The Cambridge University spoken document retrieval system [A]. Proc of the IEEE International Conference on Acoustics, Speech, and Signal Processing [C]. Phoenix: IEEE Press, 1999: 49-52.
  • 4 Peter S C, Mark C, Michael S M. Phonetic searching vs. LVCSR: How to find what you really want in audio archives [J]. International Journal of Speech Technology, 2002, 5: 9-22.
  • 5 Young S J, Russell N H, Thornton J H S. Token passing: A simple conceptual model for connected speech recognition systems [EB/OL]. http://svr-www.eng.cam.ac.uk, July 1989.
  • 6 Leggetter C J, Woodland P C. Maximum likelihood linear regression for speaker adaptation of continuous density HMMs [J]. Computer Speech and Language, 1995, 9: 171-186.
  • 7 ZHAO Qingwei, WANG Zuoying, LU Daji. A study of duration in continuous speech recognition based on DDBHMM [A]. Proc 6th European Conf on Speech Communication and Technology (Eurospeech'99) [C]. Budapest, Hungary: ISCA (International Speech Communication Association), 1999: 1511-1514.
  • 8 Frank W, Ralf S, Klaus M, et al. Confidence measures for large vocabulary continuous speech recognition [J]. IEEE Trans on Speech and Audio Processing, 2001, 9(3): 288-298.

