Abstract
Speech is a time-varying signal that is strongly affected by the individual speaker and the environment. To improve the speaker recognition rate, some preprocessing of the raw speech signal is necessary, and a speaker recognition algorithm based on a Convolutional Neural Network (CNN) is proposed. The algorithm uses the two main CNN operations, convolution and down-sampling, to preprocess the speech signal, constructing both one-dimensional and two-dimensional convolution operations. Mel Frequency Cepstrum Coefficient (MFCC) features are then extracted from the preprocessed signal, and the classical Universal Background Model (UBM) method is used to build the speaker models. Tests on a self-built database and the TIMIT standard database show that, compared with the classical method that applies MFCC features directly with a Gaussian Mixture Model (GMM)-UBM classifier, the proposed algorithm improves the recognition rate by 8% to 15% and effectively reduces the time and space complexity of the algorithm.
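The abstract names convolution and down-sampling as the preprocessing steps but gives no kernels, pooling type, or down-sampling factor. The sketch below is therefore only an illustration of the one-dimensional case under assumed values: the averaging kernel, the mean-pooling rule, the factor of 2, and the function names `conv1d_valid`, `downsample_mean`, and `preprocess` are all hypothetical and not taken from the paper. The MFCC extraction and GMM-UBM modelling stages are not shown.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """1-D convolution keeping only positions where the kernel fully overlaps."""
    return np.convolve(signal, kernel, mode="valid")

def downsample_mean(signal, factor):
    """Mean-pooling: average each non-overlapping block of `factor` samples.
    (Assumed pooling rule; the abstract does not say which one is used.)"""
    usable = (len(signal) // factor) * factor  # drop the incomplete tail block
    return signal[:usable].reshape(-1, factor).mean(axis=1)

def preprocess(signal, kernel, factor):
    """Convolution followed by down-sampling, applied before feature extraction."""
    return downsample_mean(conv1d_valid(signal, kernel), factor)

# Synthetic stand-in for speech: 1 s of a 440 Hz tone at 16 kHz plus noise.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sample_rate)

kernel = np.ones(5) / 5.0  # hypothetical smoothing kernel, not the paper's
out = preprocess(signal, kernel, factor=2)
print(out.shape)  # (7998,)
```

With these assumed settings the preprocessed signal has about half as many samples as the input, which is one plausible way a later feature-extraction stage could become cheaper in time and memory.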
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2016, Issue A01, pp. 79-81, 200 (4 pages in total)
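Funding
National Natural Science Foundation of China (60862003)
International Cooperation Project of the Ministry of Science and Technology (2009DFR10530)
Guizhou Province Industrial Science and Technology Research Project (黔科合GY字(2010)2054)
Doctoral Program Foundation of Institutions of Higher Education, Ministry of Education (20095201110002)
Guizhou University Graduate Innovation Fund (2015081)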
Keywords
Convolutional Neural Network (CNN)
speaker recognition
universal background model
Mel Frequency Cepstrum Coefficient (MFCC)
preprocessing