Fast Query-by-Example Spoken Term Detection Using Segmental Dynamic Time Warping (基于分段动态时间规整的语音样例快速检索)

Cited by: 5
Abstract: This paper presents a fast query-by-example spoken term detection (QbE STD) method that combines lower-bound estimation (LBE) with segmental dynamic time warping (SDTW). The method is designed for low-resource languages, for which suitable training data and other speech resources are limited. It first extracts phone posterior probabilities from the query example and the test set. It then selects candidate segments in each test utterance according to a set of constraints, computes a lower-bound estimate of the actual DTW score between the query example and each candidate segment, and applies a K-nearest-neighbor (KNN) search to find the segments most similar to the query. Finally, pseudo relevance feedback (PRF) is used to refine the retrieval results. Experiments show that, although the retrieval precision of this method is slightly lower than that of applying DTW directly, its retrieval speed is higher, and refining the results with PRF effectively improves the retrieval precision.
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 2, pp. 265-273 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (61175017)
Keywords: query-by-example spoken term detection; phone posterior probability; segmental dynamic time warping; lower-bound estimate; pseudo relevance feedback
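
The retrieval steps named in the abstract (candidate segments, a lower bound on the DTW score, and a KNN search) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions that the abstract does not specify: the local distance is the negative log inner product of two posterior vectors, candidate segments are fixed-length sliding windows as long as the query, and the lower bound is the sum of per-query-frame minimum distances. All function names (`frame_dist`, `dtw`, `lower_bound`, `candidate_segments`, `knn_search`) are hypothetical, and the PRF step is omitted.

```python
import heapq
import math

EPS = 1e-10  # floor on the inner product to avoid log(0)


def frame_dist(p, q):
    """Assumed local distance: -log of the inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(min(1.0, max(dot, EPS)))  # clamped so the cost is >= 0


def dtw(query, seg):
    """Unnormalised DTW cost with steps (1,0), (0,1) and (1,1)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(seg)
    for q in query:
        cur = [inf] * (len(seg) + 1)
        for j, s in enumerate(seg, start=1):
            cur[j] = frame_dist(q, s) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[-1]


def lower_bound(query, seg):
    """Sum of per-query-frame minimum distances.

    Every query frame is aligned to at least one segment frame and all local
    costs are non-negative, so this sum cannot exceed the DTW cost above.
    """
    return sum(min(frame_dist(q, s) for s in seg) for q in query)


def candidate_segments(utt_len, seg_len, hop=1):
    """Assumed constraint: fixed-length windows taken every `hop` frames."""
    for start in range(0, max(utt_len - seg_len, 0) + 1, hop):
        yield start, min(start + seg_len, utt_len)


def knn_search(query, utterance, k=3, hop=1):
    """Return the k lowest-cost segments and the number of full DTW runs."""
    cands = sorted(
        (lower_bound(query, utterance[s:e]), s, e)
        for s, e in candidate_segments(len(utterance), len(query), hop)
    )
    heap = []  # max-heap on cost, stored as (-cost, start, end)
    n_dtw = 0
    for lb, s, e in cands:
        # Remaining candidates have even larger bounds, so none can improve
        # on the current k-th best cost.
        if len(heap) == k and lb >= -heap[0][0]:
            break
        cost = dtw(query, utterance[s:e])
        n_dtw += 1
        if len(heap) < k:
            heapq.heappush(heap, (-cost, s, e))
        elif cost < -heap[0][0]:
            heapq.heapreplace(heap, (-cost, s, e))
    return sorted((-c, s, e) for c, s, e in heap), n_dtw
```

Because candidates are visited in order of increasing lower bound, the search can stop as soon as the next bound is no smaller than the current K-th best DTW cost. This particular pruning rule returns the same segments as exhaustive DTW (up to ties), so it does not reproduce the slight loss of precision that the abstract reports for the published method.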

