
A Fast Speaker Recognition Method Based on Convolutional Neural Network (cited by: 4)
Abstract: This paper proposes a fast speaker recognition method that feeds dynamic combination parameters derived from Gammatone Frequency Cepstral Coefficients (GFCC) into a Convolutional Neural Network (CNN). The GFCC of each speech sample, together with their first-order and second-order difference coefficients, are extracted as the feature parameters representing the speech. The feature parameters are then normalized, and the resulting statistical features are arranged into the input format of the CNN. Experimental results show that, compared with the Gaussian Mixture Model-Universal Background Model (GMM-UBM), the proposed model learns faster and reduces both training time and recognition time while improving the recognition rate.
Authors: CAI Qian; GAO Yong (College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China)
Source: Radio Engineering (《无线电工程》), 2020, No. 6, pp. 447-451 (5 pages)
Funding: Sichuan University research funding project (0020505501743).
Keywords: dynamic combination parameters; speaker recognition; first-order difference; second-order difference; statistical features
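
The feature pipeline summarized in the abstract (GFCC, plus first-order and second-order differences, followed by normalization and arrangement as CNN input) might be sketched roughly as follows. This is only an illustrative sketch under several assumptions that the abstract does not confirm: the GFCC matrix is taken as already computed, the differences use the common regression formula with a window of N = 2, normalization is per-coefficient mean/variance, and the CNN input is a single-channel frames-by-coefficients array. The names `delta` and `build_cnn_input` are hypothetical, and the abstract does not specify how the "statistical features" are derived, so that step is not reproduced here.

```python
import numpy as np


def delta(feat, N=2):
    """Regression-based difference over time (assumed formula, window N).

    feat has shape (num_frames, num_coeffs); edges are handled by
    repeating the first and last frame.
    """
    num_frames = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, N + 1):
        ahead = padded[N + n : N + n + num_frames]
        behind = padded[N - n : N - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def build_cnn_input(gfcc):
    """Stack GFCC with its first and second differences, normalize, reshape."""
    d1 = delta(gfcc)          # first-order difference
    d2 = delta(d1)            # second-order difference
    combined = np.concatenate([gfcc, d1, d2], axis=1)

    # Assumed normalization: zero mean, unit variance per coefficient.
    mean = combined.mean(axis=0, keepdims=True)
    std = combined.std(axis=0, keepdims=True)
    normalized = (combined - mean) / (std + 1e-8)

    # Assumed layout: (batch, frames, coefficients, channels).
    return normalized[np.newaxis, :, :, np.newaxis]


if __name__ == "__main__":
    # Placeholder data standing in for real GFCC: 50 frames, 13 coefficients.
    gfcc = np.random.default_rng(0).normal(size=(50, 13))
    print(build_cnn_input(gfcc).shape)
```

With a 13-coefficient GFCC input, the combined feature has 39 coefficients per frame; the actual dimensions and layout used in the paper may differ.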
