
Survey on unsupervised spoken term detection for low-resource languages (cited by 3)
Abstract Objective: Unsupervised spoken term detection (query-by-example) for low-resource languages has recently drawn considerable research interest. Because low-resource languages lack sufficient annotated data and related expert knowledge, conventional keyword-detection techniques built on large-vocabulary continuous speech recognition cannot be used, and researchers have therefore sought unsupervised techniques for this task. Method: We first describe the problems and challenges confronting the task, then introduce the mainstream algorithm framework based on dynamic time warping (DTW), and survey recent work on feature representation, template-matching methods, and efficiency improvement. We also present the evaluation metrics commonly used for this task and the performance levels currently achievable. Result: Although much progress has been made, the technology remains at the laboratory stage: multi-system fusion strategies make systems large, and the lack of a good indexing method makes detection too slow, so real-life applications are still out of reach. Unified feature representations and indexing methods are needed to achieve both good effectiveness and efficiency, and much research work remains. Conclusion: We aim to provide a comprehensive survey of unsupervised spoken term detection for low-resource languages that will be useful to researchers, and we discuss possible future research directions.
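The DTW framework at the core of the surveyed systems aligns a spoken query with a test utterance by finding a minimum-cost monotonic warping path over a frame-level distance matrix. The following is a minimal NumPy sketch of plain DTW (the function name and length normalization are illustrative; the cited systems typically use segmental or subsequence DTW over posteriorgram features rather than this basic form):

```python
import numpy as np

def dtw_distance(query, utterance):
    """Plain DTW distance between two feature sequences.

    query, utterance: 2-D arrays of shape (frames, feature_dim),
    e.g. per-frame MFCC or posteriorgram vectors.
    """
    n, m = len(query), len(utterance)
    # Pairwise frame distances (Euclidean here; cosine is also common).
    cost = np.linalg.norm(query[:, None, :] - utterance[None, :, :], axis=2)

    # acc[i, j] = minimal accumulated cost of aligning
    # query[:i+1] with utterance[:j+1].
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,                 # insertion
                acc[i, j - 1] if j > 0 else np.inf,                 # deletion
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,   # match
            )
            acc[i, j] = cost[i, j] + best_prev
    # Normalize by a path-length proxy so scores are comparable
    # across queries and utterances of different lengths.
    return acc[-1, -1] / (n + m)
```

In a query-by-example detector, a score like this is computed for each candidate region of the search collection, and regions whose normalized cost falls below a threshold are returned as detections.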
Source: Journal of Image and Graphics (《中国图象图形学报》, CSCD, Peking University Core journal), 2015, No. 2: 211-218 (8 pages).
Funding: National Natural Science Foundation of China (61175018); Fok Ying Tung Education Foundation Basic Research Fund for Young Teachers (131059).
Keywords: spoken term detection; low-resource; dynamic time warping

References (32)

  • 1 Chelba C, Hazen T J, Saraçlar M. Retrieval and browsing of spoken content[J]. IEEE Signal Processing Magazine, 2008, 25(3): 39-49.
  • 2 Saraçlar M, Sproat R. Lattice-based search for spoken utterance retrieval[C]//Proceedings of the 2004 Conference of the North American Chapter of the Association for Computational Linguistics. Boston, USA: Association for Computational Linguistics, 2004: 129-136.
  • 3 Glass J. Towards unsupervised speech processing[C]//Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications. Montreal, Canada: IEEE, 2012: 1-4.
  • 4 Metze F, Rajput N, Anguera X, et al. The spoken web search task at MediaEval 2011[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan: IEEE, 2012: 5165-5168.
  • 5 Metze F, Anguera X, Barnard E, et al. The spoken web search task at MediaEval 2012[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada: IEEE, 2013: 8121-8125.
  • 6 Rodriguez-Fuentes L J, Varona A, Penagarikano M, et al. High-performance query-by-example spoken term detection on the SWS 2013 evaluation[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Florence, Italy: IEEE, 2014: 7869-7873.
  • 7 Abad A, Rodriguez-Fuentes L J, Penagarikano M, et al. On the calibration and fusion of heterogeneous spoken term detection systems[C]//Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech). Lyon, France: ISCA, 2013.
  • 8 Wang H, Lee T, Leung C C, et al. Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada: IEEE, 2013: 8545-8549.
  • 9 Wang H, Leung C C, Lee T, et al. An acoustic segment modeling approach to query-by-example spoken term detection[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan: IEEE, 2012: 5157-5160.
  • 10 Müller M. Dynamic time warping[M]//Information Retrieval for Music and Motion. Berlin: Springer, 2007: 69-84.

Secondary references (13)

  • 1 Saffran J R, Aslin R N, Newport E L. Statistical learning by 8-month-old infants[J]. Science, 1996, 274: 1926-1928.
  • 2 Park A, Glass J. Unsupervised pattern discovery in speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(1): 186-197.
  • 3 ZHANG Yaodong, Glass J. Towards multi-speaker unsupervised speech pattern discovery[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway, NJ, USA: IEEE Press, 2010: 4366-4369.
  • 4 Jansen A, Church K, Hermansky H. Towards spoken term discovery at scale with zero resources[C]//Interspeech. Grenoble, France: ISCA, 2010: 1676-1679.
  • 5 Muscariello A, Gravier G, Bimbot F. Audio keyword extraction by unsupervised word discovery[C]//Interspeech. Grenoble, France: ISCA, 2009: 2843-2846.
  • 6 Anguera X, Macrae R, Oliver N. Partial sequence matching using an unbounded dynamic time warping algorithm[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway, NJ, USA: IEEE Press, 2010: 3582-3585.
  • 7 Dredze M, Jansen A, Coppersmith G, et al. NLP on spoken documents without ASR[C]//EMNLP. Boston, MA, USA: Association for Computational Linguistics, 2010: 460-470.
  • 8 ZHENG Lilei, Leung C, XIE Lei, et al. Acoustic texttiling for story segmentation of spoken documents[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway, NJ, USA: IEEE Press, 2012: 5121-5124.
  • 9 Aradilla G, Vepa J, Bourlard H. Using posterior-based features in template matching for speech recognition[C]//Interspeech. Grenoble, France: ISCA, 2006: 2570-2573.
  • 10 WANG Haipeng, Leung C, Lee T, et al. An acoustic segment modeling approach to query-by-example spoken term detection[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway, NJ, USA: IEEE Press, 2012: 5157-5160.

Co-citing literature: 1

Co-cited literature: 9

Citing literature: 3

Secondary citing literature: 14
