Research on Audio Keywords Recognition Based on Wasserstein Generative Adversarial Network (基于WGAN的音频关键词识别研究) Cited by: 2
Abstract Keyword recognition methods built on speech recognition increase the workload of keyword recognition, reduce its efficiency, make its accuracy depend on both the speech recognizer and the text search method, and cannot be applied to languages without a written form. To address this problem, the paper applies the Wasserstein generative adversarial network (WGAN) to speech keyword recognition and uses the sequence output by the generator to determine whether an utterance contains a keyword. To obtain the position of the keyword within the speech, the paper defines a positioning loss function for the WGAN so that the generated mask sequence accurately locates the keyword. Experiments on datasets in three languages (Sichuan dialect, Mandarin, and Cantonese) show that the proposed method can recognize keywords in languages without a written form and that its recognition speed is significantly higher than that of the template matching method.
Authors LI Quan-bing; WEN Zhao; TIAN Yan-mei; ZHAN Mao-hao; YU Qin-yong; YANG Hui (China Electronic Technology Cyber Security Co., Ltd., Chengdu 610041, China; Big Data Application on Improving Government Governance Capabilities National Engineering Laboratory, Guiyang 550022, China; CETC Big Data Research Institute Co., Ltd., Guiyang 550022, China; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China)
Source Computer Technology and Development (《计算机技术与发展》), 2021, No. 8, pp. 26-32 (7 pages)
Funding Sichuan Province Major Science and Technology Special Project (2017GZDZX0002).
Keywords speech recognition; audio spoken keyword detection; deep learning; Wasserstein generative adversarial network (WGAN); keyword targeting
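
The abstract's core idea (a WGAN whose generator emits a mask sequence, trained with an extra positioning loss) can be illustrated with a minimal sketch. This record does not give the paper's actual loss, so everything below is an assumption for illustration: the function names `critic_loss`, `generator_loss`, and `keyword_span` are hypothetical, the localization term is a plain mean squared error against a reference mask, and `keyword_span` assumes at most one keyword occurrence per utterance.

```python
import numpy as np


def critic_loss(real_scores, fake_scores):
    """Standard Wasserstein critic loss: mean score on generated data minus mean score on real data."""
    return float(np.mean(fake_scores) - np.mean(real_scores))


def generator_loss(fake_scores, mask, target_mask, weight=1.0):
    """Adversarial term plus a localization term.

    The localization term here is a mean squared error between the generated
    mask and a reference mask. This is an assumed stand-in, not the paper's
    positioning loss.
    """
    adversarial = -float(np.mean(fake_scores))
    diff = np.asarray(mask, dtype=float) - np.asarray(target_mask, dtype=float)
    localization = float(np.mean(diff ** 2))
    return adversarial + weight * localization


def keyword_span(mask, threshold=0.5):
    """Return (first_frame, last_frame) of frames whose mask value exceeds the threshold.

    Returns None when no frame exceeds the threshold, i.e. no keyword is detected.
    """
    active = np.flatnonzero(np.asarray(mask, dtype=float) > threshold)
    if active.size == 0:
        return None
    return int(active[0]), int(active[-1])


if __name__ == "__main__":
    mask = [0.1, 0.2, 0.9, 0.8, 0.1]    # generated mask, one value per frame
    target = [0.0, 0.0, 1.0, 1.0, 0.0]  # reference mask marking the keyword frames
    print(keyword_span(mask))
    print(generator_loss(np.array([0.3, 0.5]), mask, target))
```

In this sketch an all-low mask means "no keyword present", while a run of high values gives both the detection and its position, which mirrors the role the abstract assigns to the generated mask sequence.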