Abstract
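To address the low recognition accuracy of existing phoneme recognition systems, the limited representational power of their modeling methods, and their tendency to become trapped in local optima, this paper proposes a new phoneme recognition method based on a hierarchical deep belief network (DBN). The method has two parts: bottleneck features derived from a hierarchical DBN, and a DBN-based phoneme classifier. The bottleneck features exploit the DBN's ability to process long spans of speech and to extract features in a supervised manner, while the DBN-based classifier offers stronger modeling and representational capacity. Combining the two therefore yields low-dimensional, supervised features and uses the DBN to estimate phoneme posterior probabilities more effectively. Experiments on the TIMIT database show that the proposed method substantially improves recognition accuracy over previous phoneme recognition systems.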
To overcome the poor recognition performance of existing phoneme recognition systems and their tendency to become trapped in local optima, this paper proposes a hierarchical phoneme classification method based on the deep belief network (DBN). The system consists of two DBN-based parts, a bottleneck feature extractor and a phoneme classifier, which are combined to form a phoneme recognition system. The system extracts low-dimensional, supervised features and improves classification accuracy. Experiments on the TIMIT corpus suggest that the proposed system achieves a phoneme error rate of 18.5%, an improvement over existing systems.
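The two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch of the data flow only. Everything specific here is an assumption made for illustration and is not taken from the paper: the 39-dimensional input frames, the 5-frame context on each side, the layer sizes, the 40-unit sigmoid bottleneck, and the 39 output classes. The weights are random and untrained, so the RBM pretraining and supervised fine-tuning that a real DBN requires are omitted.

```python
import numpy as np

def splice(frames, context):
    """Stack each frame with `context` neighbours on both sides (edge frames repeated)."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    n = frames.shape[0]
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_layers(sizes, rng):
    """Random (untrained) weights; a real DBN would be pretrained layer by layer."""
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, layers, stop_at=None):
    """Sigmoid hidden layers and a softmax output.

    If `stop_at` is given, return the activations of that layer (1-based)
    instead of running through to the output.
    """
    h = x
    for i, (w, b) in enumerate(layers):
        a = h @ w + b
        h = softmax(a) if i == len(layers) - 1 else sigmoid(a)
        if stop_at is not None and i + 1 == stop_at:
            return h
    return h

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 39))           # hypothetical acoustic features

# Stage 1: network with a narrow middle layer; its activations are the bottleneck features.
stage1_in = splice(frames, context=5)          # shape (100, 429)
bottleneck_net = init_layers([429, 256, 40, 256, 39], rng)
bottleneck = forward(stage1_in, bottleneck_net, stop_at=2)   # shape (100, 40)

# Stage 2: classifier over spliced bottleneck features, producing per-frame posteriors.
stage2_in = splice(bottleneck, context=5)      # shape (100, 440)
classifier = init_layers([440, 256, 39], rng)
posteriors = forward(stage2_in, classifier)    # shape (100, 39), each row sums to 1
```

Splicing the bottleneck features again before the second stage is one way to give the classifier a longer effective time span than either network sees on its own; whether the paper does exactly this is not stated in the abstract.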
Source
Journal of Applied Sciences (《应用科学学报》)
CAS
CSCD
PKU Core (北大核心)
2014, No. 5, pp. 515-522 (8 pages)
Funding
National Natural Science Foundation of China (No. 61272333)
Anhui Provincial Natural Science Foundation (No. 1208085MF94, No. 1308085QF99)
Keywords
phoneme recognition, hierarchical structure, deep belief network, bottleneck feature