Abstract
To improve speech recognition accuracy for accented speakers, a multi-accent classification method based on a directed acyclic graph deep belief network (DAG-DBN) is proposed. Mel-frequency cepstral coefficients (MFCCs) and their first- and second-order differences are extracted from each speaker's speech to capture both the static and the dynamic characteristics of the speech parameters. Principal component analysis (PCA) is then used to reduce the dimensionality of these features, which lowers computational complexity. A deep belief network organized in a directed acyclic graph topology not only shortens the test time of multi-accent classification but also achieves high classification accuracy. In experiments on the TIMIT speech corpus, the method reached a classification accuracy of 87.46%. Compared with other multi-accent classification methods, it clearly improves both classification speed and classification accuracy.
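The first- and second-order differences (delta and delta-delta features) are commonly computed by a regression over neighbouring frames, d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..N} n²), applied once to the MFCCs and again to the result. The following is a minimal Python sketch, assuming a window of N = 2 frames and edge padding by repeating the first/last frame; the abstract does not state these settings, the MFCCs are assumed to be already computed (one vector per frame), and the function names `deltas` and `stack_static_dynamic` are hypothetical.

```python
from typing import List

Frames = List[List[float]]


def deltas(frames: Frames, n: int = 2) -> Frames:
    """Regression-based deltas over a (num_frames x num_dims) feature list.
    Assumption: frames outside the utterance are replaced by the first/last frame."""
    if not frames:
        return []
    num_frames, dims = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dims):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, num_frames - 1)][d]  # clamp at the end
                prv = frames[max(t - k, 0)][d]               # clamp at the start
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_static_dynamic(mfcc: Frames, n: int = 2) -> Frames:
    """Concatenate static MFCCs with their first- and second-order deltas per frame."""
    d1 = deltas(mfcc, n)   # first-order (dynamic) features
    d2 = deltas(d1, n)     # second-order features: deltas of the deltas
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]


# Toy input: dimension 0 rises linearly, dimension 1 is constant.
mfcc = [[float(t), 1.0] for t in range(6)]
out = stack_static_dynamic(mfcc)
print(out[2][:4])  # → [2.0, 1.0, 1.0, 0.0]
```

With the common choice of 13 MFCCs per frame, the stacked vector has 39 dimensions, which is the kind of redundancy the PCA step is meant to reduce.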
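The PCA dimensionality-reduction step can be sketched with NumPy's singular value decomposition of the mean-centred feature matrix. This is an illustrative sketch, not the authors' code: the abstract does not say how many principal components are kept, so the 95% retained-variance threshold in the hypothetical helper `n_components_for` is a placeholder assumption, as are the names `fit_pca` and `pca_transform`.

```python
import numpy as np


def fit_pca(X, k):
    """Return (mean, components): the top-k principal directions (one per row)
    of X, shaped (n_frames, n_features)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by decreasing
    # singular value, i.e. by decreasing variance explained.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def pca_transform(X, mean, components):
    """Project features onto the retained principal directions."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


def n_components_for(X, ratio=0.95):
    """Hypothetical helper: smallest k whose components explain >= ratio of the variance."""
    X = np.asarray(X, dtype=float)
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(min(np.searchsorted(explained, ratio) + 1, len(s)))


# Toy data: 3-D features that vary only along the (1, 1, 0) direction.
X = [[t, t, 0.0] for t in range(5)]
k = n_components_for(X)
mean, comps = fit_pca(X, k)
Z = pca_transform(X, mean, comps)
print(k, Z.shape)  # → 1 (5, 1)
```

Fitting the mean and components on training data only and reusing them for test utterances keeps both sets in the same reduced space.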
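The abstract does not detail the directed acyclic graph topology. A common reading, assumed here, is the decision-DAG scheme known from DAGSVM. For K accent classes it trains K(K−1)/2 pairwise binary classifiers, but each test sample passes through only K−1 of them, which would account for the shorter test time. In the paper each pairwise node would be a binary DBN. In this Python sketch the nodes are replaced by a hypothetical nearest-prototype stand-in (`make_nearest_prototype_pairwise`) so the example runs. The accent labels "A" to "D" are also placeholders.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise classifier returns True if x belongs to the first class of its pair.
Pairwise = Dict[Tuple[str, str], Callable[[Sequence[float]], bool]]


def dag_predict(x: Sequence[float], classes: Sequence[str], pairwise: Pairwise) -> Tuple[str, int]:
    """Evaluate a decision DAG: compare the first and last remaining candidates,
    drop the loser, repeat. Returns (label, number of binary evaluations)."""
    candidates: List[str] = list(classes)
    evaluations = 0
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]  # a precedes b in the original order
        evaluations += 1
        if pairwise[(a, b)](x):
            candidates.pop()       # x judged "not b": eliminate b
        else:
            candidates.pop(0)      # x judged "not a": eliminate a
    return candidates[0], evaluations


def make_nearest_prototype_pairwise(prototypes: Dict[str, float], classes: Sequence[str]) -> Pairwise:
    """Hypothetical stand-in for trained binary DBNs: decide each pair by the
    distance of x[0] to a 1-D class prototype."""
    table: Pairwise = {}
    for a, b in combinations(classes, 2):  # keys keep the original class order
        table[(a, b)] = (lambda x, a=a, b=b:
                         abs(x[0] - prototypes[a]) <= abs(x[0] - prototypes[b]))
    return table


classes = ["A", "B", "C", "D"]  # hypothetical accent labels
pairwise = make_nearest_prototype_pairwise({"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}, classes)
print(dag_predict([2.1], classes, pairwise))  # → ('C', 3)
```

For these four classes the DAG needs 3 binary decisions per sample, whereas one-versus-one majority voting would evaluate all 6 pairwise classifiers. The gap grows quadratically with the number of accents.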
Authors
XIAO Meng-meng (肖萌萌)
XU Zhi-jing (徐志京)
School of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Chinese Core Journals (北大核心)
2019, No. 12, pp. 2545-2549 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61673259)
Keywords
accent classification
Mel-frequency cepstral coefficient
principal component analysis
directed acyclic graph
deep belief network