Abstract
Existing utterance representations for automatic spoken language identification (LID) perform poorly on easily confused languages and on short-duration utterances. To meet the recognition requirements of utterances of different durations and of dialects, an utterance representation method based on different layers of a deep neural network (DNN) is proposed. A deep bottleneck network (DBN), that is, a DNN with an internal bottleneck layer, is employed as the front-end feature extractor. Different utterance representations are derived from the outputs of the DBN output layer and of its middle bottleneck layer, and are then fused for LID. Experiments on the NIST Language Recognition Evaluation (LRE) 2009 dataset and the NIST LRE 2011 Arabic dialect dataset verify the effectiveness of the proposed method.
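The abstract does not state the network sizes, the input features, or how frame-level outputs are turned into utterance-level representations. The following is therefore only a minimal sketch of the general idea, not the authors' implementation. It assumes an untrained network with random weights, sigmoid hidden layers, a softmax output layer, and simple averaging over frames as the utterance-level pooling step. All names (`build_network`, `forward`, `utterance_representations`) and all layer sizes are hypothetical.

```python
import math
import random


def build_network(sizes, rng):
    """Create fully connected layers with random weights (untrained, illustration only)."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, bottleneck_index, frame):
    """Return (bottleneck activations, output-layer posteriors) for one frame."""
    hidden = frame
    bottleneck = None
    for i, layer in enumerate(layers[:-1]):
        hidden = sigmoid(affine(layer, hidden))
        if i == bottleneck_index:
            bottleneck = hidden  # activations of the narrow middle layer
    posteriors = softmax(affine(layers[-1], hidden))
    return bottleneck, posteriors


def mean_pool(vectors):
    """Average frame-level vectors into one utterance-level vector (assumed pooling)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def utterance_representations(layers, bottleneck_index, frames):
    """Derive two utterance-level vectors: one per network layer of interest."""
    outputs = [forward(layers, bottleneck_index, frame) for frame in frames]
    bottleneck_vector = mean_pool([bn for bn, _ in outputs])
    posterior_vector = mean_pool([post for _, post in outputs])
    return bottleneck_vector, posterior_vector
```

With hypothetical sizes such as `[39, 64, 8, 64, 10]` and `bottleneck_index=1`, an utterance of any number of 39-dimensional frames maps to one 8-dimensional bottleneck-based vector and one 10-dimensional output-layer vector. The two vectors could then be scored separately and fused, as the abstract describes at a high level.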
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2015, No. 12, pp. 1093-1099 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 61172158)
Keywords
Language Identification, Deep Neural Network, Utterance Representation, Deep Bottleneck Feature