Abstract
A convolutional neural network (CNN) structure based on dynamic combination parameters of Gammatone Frequency Cepstral Coefficients (GFCC) is proposed for fast speaker recognition. The GFCC of each speech sample, together with its first-order and second-order difference coefficients, is extracted as the feature parameters representing the speech. The feature parameters are then normalized, and the resulting statistical features are arranged into the input form of the CNN. Experimental results show that, compared with the Gaussian Mixture Model-Universal Background Model (GMM-UBM), the proposed model learns faster and reduces both training time and recognition time while improving the recognition rate.
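The feature pipeline summarized in the abstract (GFCC, first-order and second-order differences, normalization, CNN input) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the difference formula, the window width, the normalization scheme, or the input layout, so the regression-style difference with `width=2`, the per-coefficient mean-variance normalization, and the single-channel 2-D layout are all assumptions. The function names `delta` and `build_cnn_input` are hypothetical, and the GFCC matrix is assumed to have been computed beforehand.

```python
import numpy as np


def delta(features, width=2):
    """Regression-style difference along the time axis (assumed formula).

    features: array of shape (num_frames, num_coeffs).
    width:    number of frames on each side used in the regression (assumed).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def build_cnn_input(gfcc):
    """Stack GFCC with its differences, normalize, and shape for a CNN.

    Returns an array of shape (1, num_frames, 3 * num_coeffs, 1), i.e.
    (batch, height, width, channels). This layout is an assumption.
    """
    d1 = delta(gfcc)   # first-order difference
    d2 = delta(d1)     # second-order difference
    combined = np.concatenate([gfcc, d1, d2], axis=1)
    # Per-coefficient mean-variance normalization (assumed scheme).
    mean = combined.mean(axis=0, keepdims=True)
    std = combined.std(axis=0, keepdims=True)
    normalized = (combined - mean) / (std + 1e-8)
    return normalized[np.newaxis, :, :, np.newaxis]


# Usage with placeholder data standing in for a real GFCC matrix.
rng = np.random.default_rng(0)
fake_gfcc = rng.normal(size=(50, 13))   # 50 frames, 13 coefficients
cnn_input = build_cnn_input(fake_gfcc)
print(cnn_input.shape)
```

Stacking the static coefficients with their differences lets a single input map carry both the spectral envelope and its short-term dynamics, which is the role the abstract assigns to the dynamic combination parameters.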
Authors
CAI Qian (蔡倩)
GAO Yong (高勇)
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Source
Radio Engineering (《无线电工程》)
2020, No. 6, pp. 447-451 (5 pages)
Funding
Sichuan University Research Funding Project (0020505501743).
Keywords
dynamic combination parameters
speaker recognition
first-order difference
second-order difference
statistical features