
Speaker Verification Based on Teacher-Free Knowledge Distillation Model (cited by: 1)
Abstract: Text-independent speaker verification models achieve strong performance through complex network structures and varied feature extraction methods, but at the cost of large memory consumption and rising computational cost, which makes them difficult to deploy on resource-limited hardware. To address this problem, this work exploits the advantages of the teacher-free knowledge distillation (Tf-KD) model, whose virtual teacher provides 100% classification accuracy and a smooth output probability distribution, to build a teacher-free speaker verification (Tf-SV) model on top of a lightweight residual network. In addition, a spatial-shared, channel-wise dynamic rectified linear unit (dynamic ReLU) activation function and the additive angular margin loss function (AAM-Softmax) are introduced. Together they improve the model's feature representation, training efficiency, and post-compression performance, so that the Tf-SV model can be deployed on devices with limited storage or computing resources. Experiments on the VoxCeleb1 dataset show that the Tf-SV model reduces the equal error rate (EER) to 3.4%, a clear improvement over previously published results, which demonstrates the effectiveness of the proposed compressed model on the speaker verification task.
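The teacher-free distillation idea mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: it assumes the commonly described Tf-KD formulation in which a hand-designed "virtual teacher" puts a fixed probability on the correct class and spreads the rest uniformly, and the student is trained with a weighted sum of cross-entropy and a temperature-softened KL term. The function names and the values of `correct_prob`, `temperature`, and `alpha` are hypothetical choices for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def virtual_teacher(label, num_classes, correct_prob=0.9):
    """Hand-designed teacher distribution (assumed form).

    The true class gets correct_prob; the remainder is shared uniformly.
    With 1/num_classes < correct_prob < 1 the teacher's argmax is always
    the true class, i.e. it is 100% accurate by construction.
    """
    rest = (1.0 - correct_prob) / (num_classes - 1)
    return [correct_prob if k == label else rest for k in range(num_classes)]

def tf_kd_loss(logits, label, correct_prob=0.9, temperature=20.0, alpha=0.5):
    """(1 - alpha) * cross-entropy + alpha * KL(soft teacher || soft student)."""
    q = softmax(logits)
    cross_entropy = -math.log(q[label])

    teacher = virtual_teacher(label, len(logits), correct_prob)
    # Soften the teacher by applying a temperature to its log-probabilities.
    teacher_soft = softmax([math.log(p) for p in teacher], temperature)
    student_soft = softmax(logits, temperature)

    kl = sum(t * math.log(t / s) for t, s in zip(teacher_soft, student_soft))
    return (1.0 - alpha) * cross_entropy + alpha * kl

if __name__ == "__main__":
    # Four hypothetical speaker classes; the true speaker is class 2.
    print(tf_kd_loss([0.0, 0.0, 5.0, 0.0], label=2))
    print(tf_kd_loss([5.0, 0.0, 0.0, 0.0], label=2))
```

Because no pretrained teacher network is needed, the only extra cost over ordinary training is computing the fixed target distribution, which is what makes the approach attractive for compressed models.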
Authors: XIAO Jinzhuang; LI Ruipeng; JI Mengmeng (College of Electronic Information Engineering, Hebei University, Baoding, Hebei 071000, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue 8, pp. 198-203 (6 pages)
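Funding: General Program of the Natural Science Foundation of Hebei Province (H2016201201); Key Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2016149).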
Keywords: teacher-free knowledge distillation; dynamic rectified linear units function; additive angular margin loss function; model compression; speaker verification
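Of the components named in the keywords, the additive angular margin loss is the easiest to illustrate in isolation. The sketch below is a simplified single-sample version in plain Python, written under the usual definition of AAM-Softmax: embeddings and class weights are L2-normalized, a margin is added to the angle of the target class, and the cosines are multiplied by a scale before softmax cross-entropy. The function name and the `margin` and `scale` values are illustrative assumptions, not settings taken from the paper, and the sketch ignores the case where the angle plus margin exceeds π.

```python
import math

def aam_softmax_loss(embedding, class_weights, label, margin=0.2, scale=30.0):
    """Additive angular margin softmax loss for one sample (simplified)."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = normalize(embedding)
    logits = []
    for k, w in enumerate(class_weights):
        # Cosine of the angle between the embedding and the class centre.
        cos = sum(a * b for a, b in zip(e, normalize(w)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if k == label:
            # Penalize the target class by widening its angle by the margin.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)

    # Cross-entropy via a stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

if __name__ == "__main__":
    # Two hypothetical speaker classes in a 2-D embedding space.
    weights = [[1.0, 0.1], [0.0, 1.0]]
    print(aam_softmax_loss([1.0, 0.0], weights, label=0, margin=0.0))
    print(aam_softmax_loss([1.0, 0.0], weights, label=0, margin=0.2))
```

With the margin enabled, the same embedding incurs a larger loss, so training has to pull embeddings closer to their class centre than plain softmax would require.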
