Abstract
Keyword recognition methods built on speech recognition add to the workload of keyword recognition, reduce its efficiency, make accuracy depend on both the speech recognizer and the text search method, and cannot be applied to languages without a written form. To address this problem, the paper applies the Wasserstein generative adversarial network (WGAN) to spoken keyword recognition, using the sequence output by the generator to determine whether the speech contains a keyword. To obtain the position of keywords in the speech, the authors define a localization loss function for the WGAN so that the generated mask sequence accurately locates each keyword. Experiments on datasets in three languages (Sichuan dialect, Mandarin, and Cantonese) show that the proposed method can recognize keywords in languages without a written form and that its recognition speed is significantly higher than that of the template matching method.
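As a rough illustration of how a mask sequence could be read out to answer both "is there a keyword?" and "where is it?", the sketch below thresholds per-frame mask scores and collects contiguous runs. This is not the paper's implementation: the function name `locate_keywords`, the 0.5 threshold, the minimum run length, and the example scores are all assumptions made for illustration only.

```python
from typing import List, Tuple


def locate_keywords(
    mask: List[float],
    threshold: float = 0.5,   # assumed cut-off, not taken from the paper
    min_frames: int = 2,      # assumed minimum run length, in frames
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where mask >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(mask):
        if score >= threshold:
            if start is None:
                start = i  # a new candidate keyword region begins
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(mask) - start >= min_frames:
        segments.append((start, len(mask)))
    return segments


# Hypothetical per-frame mask scores, invented for this example.
example_mask = [0.1, 0.2, 0.9, 0.8, 0.95, 0.1, 0.0, 0.7, 0.6]
segments = locate_keywords(example_mask)
has_keyword = bool(segments)
print(has_keyword, segments)  # True [(2, 5), (7, 9)]
```

Frame indices can be converted to time by multiplying by the frame hop of the acoustic front end, which the abstract does not specify.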
Authors
李全兵
文钊
田艳梅
詹茂豪
余秦勇
杨辉
LI Quan-bing; WEN Zhao; TIAN Yan-mei; ZHAN Mao-hao; YU Qin-yong; YANG Hui (China Electronic Technology Cyber Security Co., Ltd., Chengdu 610041, China; Big Data Application on Improving Government Governance Capabilities National Engineering Laboratory, Guiyang 550022, China; CETC Big Data Research Institute Co., Ltd., Guiyang 550022, China; School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China)
Source
《计算机技术与发展》
2021, No. 8, pp. 26-32 (7 pages)
Computer Technology and Development
Funding
Sichuan Province Major Science and Technology Special Project (2017GZDZX0002).