Abstract
Mel-Frequency Cepstral Coefficient (MFCC) speech features cannot effectively capture the useful information shared between consecutive frames. To address this problem, this paper uses a deep neural network (DNN) to extract bottleneck (BN) features that reflect the long-term correlation and compactness of speech, and on this basis proposes a compound feature construction method that combines the neural network bottleneck features with the MFCC features, so as to improve the representation and modeling of speech. MFCC features are extracted from the speech data as the input data, and the MFCC features are then concatenated with the BN features to obtain a new compound feature, on which Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) acoustic modeling is performed. Experimental results on the TIMIT database show that, compared with using the bottleneck feature alone or the deep neural network posterior feature, the proposed method significantly improves the recognition rate.
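The concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `splice_context` and `concat_features`, the 13-dimensional MFCC and 40-dimensional BN sizes, and the context-splicing step (a common way to give a DNN a multi-frame input) are all assumptions made for illustration. The BN values here are placeholders, whereas in practice they would come from the bottleneck layer of a trained DNN.

```python
from typing import List, Sequence

Frame = List[float]


def splice_context(frames: Sequence[Sequence[float]], left: int, right: int) -> List[Frame]:
    """Stack each frame with its neighbours (hypothetical DNN input step).

    Frames beyond the utterance edges are filled by repeating the first or
    last frame.
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced: List[Frame] = []
    for t in range(len(frames)):
        window: Frame = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), last)  # clamp at utterance edges
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def concat_features(mfcc: Sequence[Sequence[float]],
                    bn: Sequence[Sequence[float]]) -> List[Frame]:
    """Concatenate MFCC and BN vectors frame by frame into a compound feature."""
    if len(mfcc) != len(bn):
        raise ValueError("MFCC and BN streams must have the same number of frames")
    return [list(m) + list(b) for m, b in zip(mfcc, bn)]


if __name__ == "__main__":
    # Toy data: 5 frames, 13-dim MFCC and 40-dim BN (assumed sizes).
    mfcc = [[float(t)] * 13 for t in range(5)]
    bn = [[0.5] * 40 for _ in range(5)]  # placeholder for real DNN bottleneck outputs
    compound = concat_features(mfcc, bn)
    print(len(compound), len(compound[0]))
```

The resulting per-frame compound vectors would then serve as observations for GMM-HMM training in a speech toolkit, which is outside the scope of this sketch.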
Authors
郑文秀
赵峻毅
文心怡
姚引娣
ZHENG Wenxiu; ZHAO Junyi; WEN Xinyi; YAO Yindi (School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China)
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals
2020, No. 11, pp. 301-305, 314 (6 pages in total)
Computer Engineering
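Funding
General Project of the International Science and Technology Cooperation Program, "Research on a Smart Agriculture Automatic Irrigation System Based on Big Data Information Decision-Making" (2018KW-025).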
Keywords
Deep Neural Networks (DNN)
Mel-Frequency Cepstral Coefficient (MFCC)
bottleneck feature
compound feature
Gaussian Mixture Model-Hidden Markov Model (GMM-HMM)