Abstract
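Keyword spotting (KWS) aims to detect target keywords in speech, and deep neural networks have provided effective solutions for small-footprint KWS. Most existing KWS methods train a Softmax classifier by minimizing a cross-entropy loss. They assume that test and training samples come from the same distribution and focus on maximizing classification accuracy on the training set, without considering unknown speech outside the training set. When training data are limited, it remains difficult for a KWS system to stay robust and accurate when it encounters unknown speech. This paper studies open-set learning methods that combine a deep feature encoder with classifiers based on convolutional prototype learning and reciprocal point learning for open-set KWS. The proposed method not only improves keyword classification accuracy but also achieves good non-keyword detection performance. Experimental results on the Google Speech Commands datasets V0.01 and V0.02 and on the LibriWords dataset derived from LibriSpeech show that the proposed method outperforms the baseline methods on most evaluation metrics.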
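The prototype-based classifier summarized above can be illustrated with a minimal sketch. It assumes one prototype per keyword, squared Euclidean distance, logits equal to negative scaled distances, and a loss of distance-based cross-entropy plus λ times a prototype-loss term, in the style of generalized convolutional prototype learning. The names `gcpl_loss`, `gcpl_predict`, `gamma`, `lam`, and `threshold` are hypothetical, the toy 2-D vectors stand in for encoder outputs, and this is not the authors' implementation.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def gcpl_loss(feature, label, prototypes, gamma=1.0, lam=0.1):
    """Distance-based cross-entropy plus a prototype-loss regularizer.

    Logits are negative scaled squared distances, so the nearest
    prototype gets the highest probability. The lam * PL term pulls
    the feature toward its own keyword prototype, tightening the
    class region so that non-keywords tend to fall outside it.
    (gamma and lam are hypothetical hyperparameters.)
    """
    dists = [squared_distance(feature, p) for p in prototypes]
    dce = -log_softmax([-gamma * d for d in dists])[label]
    pl = dists[label]
    return dce + lam * pl

def gcpl_predict(feature, prototypes, threshold):
    """Return the nearest keyword index, or -1 (non-keyword) if even
    the nearest prototype is farther than the hypothetical threshold."""
    dists = [squared_distance(feature, p) for p in prototypes]
    k = min(range(len(dists)), key=dists.__getitem__)
    return k if dists[k] <= threshold else -1

# Toy 2-D example with two keyword prototypes.
prototypes = [[0.0, 0.0], [4.0, 0.0]]
print(gcpl_predict([0.5, 0.0], prototypes, threshold=1.0))  # 0
print(gcpl_predict([2.0, 3.0], prototypes, threshold=1.0))  # -1
```

In a real system the prototypes would be trainable parameters optimized jointly with the encoder, and the rejection threshold would be tuned on held-out data; both are fixed by hand here only to keep the example small.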
[Objective] Keyword spotting (KWS) aims to detect target keywords in speech. Deep neural networks have provided effective solutions for small-footprint KWS. However, most KWS methods employ a Softmax-based cross-entropy loss, which assumes that test and training samples are identically distributed. These methods focus on maximizing classification accuracy on the training set and often neglect unknown speech outside the training samples. This becomes a significant challenge in real-world scenarios where training data are limited and the system frequently encounters unfamiliar speech.

[Methods] This paper approaches KWS with open-set learning methods that can accommodate the open vocabulary of KWS tasks. These methods combine deep feature encoders with classifiers based on convolutional prototype learning and reciprocal point learning. For convolutional prototype learning, this paper first replaces the Softmax network with a prototype network to remove the closed-world assumption. It then constructs a prototype for each keyword that represents class-level features in the feature space. Classification uses a distance-based measure of similarity between a sample and each keyword, maximizing the likelihood of the sample. To reject non-keywords effectively, a regularization constraint is applied to the boundary of the prototypes, which improves the robustness of the system. For reciprocal point learning, this paper constructs reciprocal points that represent features not associated with each keyword class. The probability that a sample belongs to a keyword is assumed to be proportional to the distance between the sample and that keyword's reciprocal point, and this serves as the classification criterion. To detect non-keywords, the boundary range of the reciprocal points is restricted. In addition, this paper explores variants of reciprocal point learning, such as adversarial reciprocal point learning, which uses a more effective distance function and an adequate boundary constraint to further improve system performance. The backbone network used to train the small-footprint KWS systems is ResNet15. The resulting KWS systems not only improve classification accuracy but also better detect non-keywords. Performance is measured with classification accuracy (ACC), the macro-averaged F1 score, and the area under the receiver operating characteristic curve (AUC).

[Results] Experiments were conducted on the Google Speech Commands (GSC) datasets V0.01 and V0.02 and on the LibriWords dataset derived from LibriSpeech. The proposed methods outperform the baseline approaches on most evaluation metrics. The method based on reciprocal point learning achieved the best classification ACC. The methods based on generalized convolutional prototype learning and adversarial reciprocal point learning matched or surpassed the baseline methods. For non-keyword detection, the method based on adversarial reciprocal point learning performed best on the GSC datasets. As the number of non-keywords in the LibriWords dataset increases, the method using the generalized convolutional prototype loss achieved the best detection performance.

[Conclusions] By introducing generalized convolutional prototype learning and reciprocal point learning, this paper significantly improves the performance of KWS systems in open scenarios. The experimental results show that the proposed methods significantly outperform existing approaches for small-footprint systems with limited training data.
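The reciprocal point classifier described in the Methods paragraph can be sketched in the same hedged way. This version assumes one reciprocal point per keyword, mean squared Euclidean distance, and a hinge-shaped boundary term with a fixed radius. These choices, together with the names `rpl_loss`, `rpl_predict`, `radius`, and `threshold`, are illustrative assumptions rather than the paper's exact formulation, and the more effective distance function of the adversarial variant is not shown.

```python
import math

def mean_sq_distance(a, b):
    """Mean squared Euclidean distance (averaged over feature dims)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def rpl_log_probs(feature, recip_points, gamma=1.0):
    """Log-probabilities where a LARGER distance from keyword k's
    reciprocal point gives a HIGHER probability of keyword k."""
    return log_softmax([gamma * mean_sq_distance(feature, p) for p in recip_points])

def rpl_loss(feature, label, recip_points, radius, gamma=1.0, lam=0.1):
    """Cross-entropy on reciprocal-point distances plus an assumed
    hinge term that penalizes known samples lying more than `radius`
    away from their class's reciprocal point, bounding the space."""
    ce = -rpl_log_probs(feature, recip_points, gamma)[label]
    d = mean_sq_distance(feature, recip_points[label])
    return ce + lam * max(d - radius, 0.0)

def rpl_predict(feature, recip_points, threshold):
    """Pick the keyword whose reciprocal point is farthest away; if the
    sample is close to every reciprocal point, reject it as -1."""
    dists = [mean_sq_distance(feature, p) for p in recip_points]
    k = max(range(len(dists)), key=dists.__getitem__)
    return k if dists[k] >= threshold else -1

# Toy 2-D example with two reciprocal points (hypothetical values).
recip_points = [[1.0, 0.0], [-1.0, 0.0]]
print(rpl_predict([-3.0, 0.0], recip_points, threshold=4.0))  # 0
print(rpl_predict([0.0, 0.0], recip_points, threshold=4.0))   # -1
```

Note the inversion relative to a prototype classifier: a keyword is scored by how far the sample lies from that keyword's reciprocal point, so a sample that stays close to every reciprocal point is treated as a non-keyword.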
Authors
HUANG Zijun
ZHANG Xiaolei
(School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China; Shenzhen Research Institute, Northwestern Polytechnical University, Shenzhen 518057, China)
Source
Journal of Tsinghua University (Science and Technology)
EI
CAS
CSCD
Peking University Core Journal
2024, No. 11, pp. 1927-1935 (9 pages)
Funding
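National Natural Science Foundation of China, General Program (62176211)
Shenzhen Science and Technology Innovation Commission, International Cooperation Research Project (GJHZ20240218114401004)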
Keywords
limited training data
keyword spotting
open set recognition
prototype learning