Improved lattice-based speech keyword spotting algorithm (cited by: 2)


Abstract: This work improves the Gaussian mixture model (GMM) and the N-best recognition algorithm for speech keyword spotting on multi-candidate Chinese syllable lattices. First, different simplification strategies for the GMM were examined and verified experimentally, showing that full covariance matrices are superior in recognition performance. Then, the classic N-best token passing algorithm was modified to exploit characteristics of the Chinese language. Experiments show that these two changes improve not only the 1-best results of a speech recognition engine that outputs syllables, but also, by a large margin, its N-best recognition performance. Finally, a speech keyword spotting system based on an N-best lattice was built, and the effect of these improvements was verified in that system.
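The N-best token passing idea named in the abstract can be illustrated with a small sketch. This is generic N-best token passing (each state keeps its N best tokens instead of one), not the authors' Chinese-specific variant, whose details are not given in this record. The function name `nbest_token_passing`, the dictionary layout of the scores, and the use of the full state path as the token history are all assumptions made for illustration.

```python
import heapq


def nbest_token_passing(emissions, transitions, initial, n_best=3):
    """Generic N-best token passing over a state network (illustrative only).

    emissions:   list over frames; each item maps state -> log emission score
    transitions: maps prev_state -> {next_state: log transition score}
    initial:     maps state -> log initial score
    Returns up to n_best (log_score, state_path) tuples, best first.
    """
    # Frame 0: one token per start state that has an emission score.
    tokens = {
        s: [(initial[s] + emissions[0][s], (s,))]
        for s in initial
        if s in emissions[0]
    }
    for frame in emissions[1:]:
        new_tokens = {}
        # Pass a copy of every token along every outgoing transition.
        for prev, toks in tokens.items():
            for nxt, t_score in transitions.get(prev, {}).items():
                if nxt not in frame:
                    continue
                for score, path in toks:
                    candidate = (score + t_score + frame[nxt], path + (nxt,))
                    new_tokens.setdefault(nxt, []).append(candidate)
        # Classic token passing keeps 1 token per state; here we keep n_best.
        tokens = {s: heapq.nlargest(n_best, toks) for s, toks in new_tokens.items()}
    # Collect surviving tokens from all states and return the overall best.
    final = [tok for toks in tokens.values() for tok in toks]
    return heapq.nlargest(n_best, final)
```

With `n_best=1` this reduces to ordinary Viterbi-style token passing. Larger values keep alternative hypotheses alive at every state, which is what allows a multi-candidate lattice to be built from the surviving tokens.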

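Authors: Xiao Xi (肖熙), Wang Jingqian (王竞千)

Affiliation: Department of Electronic Engineering, Tsinghua University

Source: Journal of Tsinghua University (Science and Technology), 2015, No. 5, pp. 508-513 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Keywords: speech keyword spotting; multi-candidate lattice; Gaussian mixture model; compute unified device architecture (CUDA); triphone model

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]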

