Two-stage keyword spotting system based on syllable graphs (基于拼音图的两阶段关键词检索系统)

Cited by: 1

Abstract: Among current keyword spotting systems, one-stage systems are slow to search, while two-stage systems based on large vocabulary continuous speech recognition (LVCSR) are not robust. This paper proposes a new two-stage keyword spotting system based on syllable (pinyin) graphs to provide fast and robust retrieval from speech data. The two stages are preprocessing and search. The preprocessing stage recognizes the speech data into syllable graphs with high coverage of syllable candidates. The search stage answers frequent user queries: it finds syllable strings in the graph that match the keyword's syllable sequence, then computes a confidence measure for each match with a forward-backward algorithm based on a syllable N-gram model and uses it to filter the results. Experiments show that the system achieves a recall of 72.19% and an accuracy of 72.68% on two-syllable words, and a recall of 73.51% and an accuracy of 82.98% on three-syllable words, all better than the LVCSR system. The search stage runs at only 0.01 times real time, which makes the system practical.
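The search stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the syllable graph is a DAG whose node ids are topologically ordered, that node 0 is the start and the last node is the end, and that each edge's log score already combines acoustic and language-model scores (the paper's syllable N-gram model is not reproduced here). The names `forward_backward`, `find_keyword`, and `EXAMPLE_EDGES` are hypothetical.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def log_add(a, b):
    """Return log(exp(a) + exp(b)), treating -inf as log(0)."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def forward_backward(edges, num_nodes):
    """Compute log forward (alpha) and backward (beta) scores per node.

    edges: list of (src, dst, syllable, log_score) with src < dst.
    """
    alpha = [NEG_INF] * num_nodes
    beta = [NEG_INF] * num_nodes
    alpha[0] = 0.0
    beta[num_nodes - 1] = 0.0
    # Forward pass: visit edges in order of destination node.
    for src, dst, _, score in sorted(edges, key=lambda e: e[1]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visit edges in reverse order of source node.
    for src, dst, _, score in sorted(edges, key=lambda e: e[0], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    return alpha, beta

def find_keyword(edges, num_nodes, keyword, threshold=0.0):
    """Find paths spelling the keyword's syllables; keep confident hits.

    Returns a list of (start_node, end_node, confidence), where the
    confidence is the posterior probability of the matched path segment.
    """
    alpha, beta = forward_backward(edges, num_nodes)
    total = alpha[num_nodes - 1]  # log score of all complete paths
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)
    hits = []

    def extend(node, pos, start, score):
        if pos == len(keyword):
            conf = math.exp(alpha[start] + score + beta[node] - total)
            if conf >= threshold:
                hits.append((start, node, conf))
            return
        for _, dst, syllable, edge_score in outgoing[node]:
            if syllable == keyword[pos]:
                extend(dst, pos + 1, start, score + edge_score)

    for node in range(num_nodes):
        extend(node, 0, node, 0.0)
    return hits

# Toy graph with made-up scores: two competing syllables on each of
# the first two arcs, followed by a single final arc.
EXAMPLE_EDGES = [
    (0, 1, "qing", math.log(0.6)),
    (0, 1, "qin", math.log(0.4)),
    (1, 2, "hua", math.log(0.7)),
    (1, 2, "fa", math.log(0.3)),
    (2, 3, "da", 0.0),
]
```

Calling `find_keyword(EXAMPLE_EDGES, 4, ["qing", "hua"])` returns a single hit spanning nodes 0 to 2 with a confidence of about 0.42, the product of the two arc probabilities in this toy graph. Raising `threshold` above that value filters the hit out, which mirrors the confidence-based filtering the abstract describes.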

Authors: LUO Jun, OU Zhijian, WANG Zuoying

Affiliation: Department of Electronic Engineering, Tsinghua University

Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报（自然科学版）》), 2005, No. 10, pp. 1356-1359 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

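Funding: Supported by the National Network and Information Security Assurance Sustainable Development Program (Special Project 917)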

Keywords: information retrieval; keyword spotting; syllable graph; confidence measure

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]



