Open-set learning for a robust small-footprint keyword spotting system with limited training data


Abstract: [Objective] Keyword spotting (KWS) aims to detect target keywords in speech. Deep neural networks have provided effective solutions for small-footprint KWS. However, most KWS methods train with a Softmax-based cross-entropy loss, which assumes that test and training samples follow the same distribution. These methods focus on maximizing classification accuracy on the training set and neglect unknown speech outside the training samples. As a result, when training data is limited, it remains difficult for a KWS system to stay robust and accurate when it encounters unknown speech.

[Methods] This paper approaches KWS through open-set learning methods that can accommodate the open vocabulary of KWS tasks. These methods combine deep feature encoders with classifiers based on convolutional prototype learning and reciprocal point learning. For convolutional prototype learning, the paper first replaces the Softmax network with a prototype network to remove the closed-world assumption. It then constructs prototypes for each keyword that represent class-level features in the feature space. Classification uses a distance-based measure of the similarity between a sample and each keyword, and training maximizes the likelihood of the sample. To reject non-keywords effectively, the paper applies a regularization constraint on the boundary of the prototypes, which improves the robustness of the system. For reciprocal point learning, the paper constructs reciprocal points that represent features not associated with the keyword class. It assumes that the probability of a sample belonging to a keyword is proportional to the distance between the sample and that keyword's reciprocal point, and uses this as the classification criterion. To detect non-keywords, the paper restricts the boundary range of the reciprocal points. In addition, the paper explores variants of reciprocal point learning, such as adversarial reciprocal point learning, which uses a more effective distance function and an adequate boundary constraint to further improve system performance. The backbone network used for training the small-footprint KWS systems is ResNet 15. The resulting KWS system not only improves keyword classification accuracy but also improves the detection of non-keywords. Performance is measured with classification accuracy (ACC), macro-averaged F1 score, and area under the receiver operating characteristic curve (AUC).

[Results] Experiments were conducted on the Google Speech Commands (GSC) datasets V0.01 and V0.02, as well as on the LibriWords dataset derived from LibriSpeech. The results show that the proposed method outperforms the baseline approaches on most evaluation metrics. The method based on reciprocal point learning achieved the best classification ACC. In addition, the methods based on generalized convolutional prototype learning and adversarial reciprocal point learning equaled or surpassed the baseline methods. For non-keyword detection, the method based on adversarial reciprocal point learning performed best on the GSC dataset. As the number of non-keywords in the LibriWords dataset increased, the method using the generalized convolutional prototype loss achieved the best detection performance.

[Conclusions] By introducing generalized convolutional prototype learning and reciprocal point learning, this paper significantly improves the performance of the KWS system in open scenarios. The experimental results show that the proposed method significantly outperforms existing approaches on small-footprint systems with limited training data.
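The two decision rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the squared Euclidean distance, the two-dimensional toy embeddings, and the rejection thresholds (`reject_dist`, `min_dist`) are all assumptions chosen for illustration, and the trained ResNet 15 encoder is replaced by hand-written vectors.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_predict(embedding, prototypes, labels, reject_dist):
    """Prototype rule: the nearest prototype wins; a sample that is far
    from every prototype is rejected as a non-keyword.
    reject_dist is a hypothetical threshold, not a value from the paper."""
    dists = [sq_dist(embedding, p) for p in prototypes]
    probs = softmax([-d for d in dists])  # closer prototype, higher probability
    best = min(range(len(dists)), key=dists.__getitem__)
    if dists[best] > reject_dist:
        return "unknown", probs
    return labels[best], probs

def reciprocal_predict(embedding, reciprocal_points, labels, min_dist):
    """Reciprocal-point rule: reciprocal point k stands for "not keyword k",
    so a larger distance to it means keyword k is more likely. A sample that
    stays close to all reciprocal points is rejected as a non-keyword.
    min_dist is a hypothetical threshold, not a value from the paper."""
    dists = [sq_dist(embedding, r) for r in reciprocal_points]
    probs = softmax(dists)  # farther from reciprocal point, higher probability
    best = max(range(len(dists)), key=dists.__getitem__)
    if dists[best] < min_dist:
        return "unknown", probs
    return labels[best], probs

# Toy setup (assumed values): two keywords in a 2-D feature space.
labels = ["yes", "no"]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
reciprocal_points = [[0.0, 1.0], [1.0, 0.0]]  # "not yes", "not no"

keyword_like = [0.9, 0.1]
print(prototype_predict(keyword_like, prototypes, labels, reject_dist=0.5)[0])
print(prototype_predict([3.0, 3.0], prototypes, labels, reject_dist=0.5)[0])
print(reciprocal_predict(keyword_like, reciprocal_points, labels, min_dist=1.0)[0])
print(reciprocal_predict([0.5, 0.5], reciprocal_points, labels, min_dist=1.0)[0])
```

In this sketch the thresholds are fixed by hand; the abstract instead describes boundary constraints applied to the prototypes and reciprocal points during training, which this example does not attempt to reproduce.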

Authors: HUANG Zijun (黄子峻); ZHANG Xiaolei (张晓雷) (School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China; Shenzhen Research Institute, Northwestern Polytechnical University, Shenzhen 518057, China)

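Affiliations: School of Marine Science and Technology, Northwestern Polytechnical University; Shenzhen Research Institute, Northwestern Polytechnical University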

Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报（自然科学版）》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2024, No. 11, pp. 1927-1935 (9 pages)

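Funding: National Natural Science Foundation of China, General Program (62176211); Shenzhen Science and Technology Innovation Commission, International Cooperation Research Project (GJHZ20240218114401004).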

Keywords: limited training data; keyword spotting; open set recognition; prototype learning

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

