Abstract
This paper proposes a method for recognizing multi-dimensional speech information, that is, for estimating speaker identity, gender, and emotion simultaneously. The identity feature I-vector is used to represent utterance-level features. First, a gender-dependent multi-dimensional speaker information recognition baseline system is built on a deep belief network (DBN). Then, building on this baseline, a multi-dimensional speaker information recognition method based on progressive neural networks (ProgNets) is proposed: still on a gender-dependent basis, knowledge learned by an auxiliary recognition model is transferred to the main recognition model to improve recognition performance. Experimental results show that the baseline system improves on the average recognition rate of single-dimensional DBN models, which recognize each attribute separately, by 4.73%, and that the ProgNets-based multi-dimensional system is a further 1.8% more accurate than the baseline.
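The knowledge transfer from an auxiliary model to a main model can be illustrated with a toy forward pass in the style of a progressive neural network. This is a minimal sketch under stated assumptions, not the authors' system: the abstract gives no layer sizes, class counts, or training procedure, so the `Column` class, the dimensions, the choice of gender as auxiliary task and emotion as main task, and the random stand-in for an I-vector are all hypothetical. DBN pre-training and training of the main column are omitted.

```python
import math
import random


def dense(x, W, b):
    """Affine map W @ x + b on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(v):
    """Numerically stable softmax."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Column:
    """Hypothetical one-hidden-layer classifier (one ProgNets 'column').

    If lateral_dim > 0, the output layer also receives the hidden
    activations of a previously trained column through adapter weights U.
    """

    def __init__(self, in_dim, hid_dim, out_dim, rng, lateral_dim=0):
        self.W1 = rand_matrix(hid_dim, in_dim, rng)
        self.b1 = [0.0] * hid_dim
        self.W2 = rand_matrix(out_dim, hid_dim, rng)
        self.b2 = [0.0] * out_dim
        self.U = rand_matrix(out_dim, lateral_dim, rng) if lateral_dim else None

    def hidden(self, x):
        return [math.tanh(a) for a in dense(x, self.W1, self.b1)]

    def forward(self, x, lateral_hidden=None):
        logits = dense(self.hidden(x), self.W2, self.b2)
        if self.U is not None and lateral_hidden is not None:
            # Lateral connection: add the auxiliary column's features.
            lateral = dense(lateral_hidden, self.U, [0.0] * len(logits))
            logits = [a + b for a, b in zip(logits, lateral)]
        return softmax(logits)


rng = random.Random(0)
ivector = [rng.gauss(0.0, 1.0) for _ in range(8)]  # toy stand-in for an I-vector

# Auxiliary column (assumed: gender, 2 classes); its weights are only read, never updated.
aux = Column(in_dim=8, hid_dim=6, out_dim=2, rng=rng)
# Main column (assumed: emotion, 4 classes) with a lateral input from aux.
main = Column(in_dim=8, hid_dim=6, out_dim=4, rng=rng, lateral_dim=6)

probs = main.forward(ivector, lateral_hidden=aux.hidden(ivector))
print(probs)
```

In a trained system only the main column's weights and the adapter `U` would be updated while the auxiliary column stays fixed, which is what lets the main task reuse the auxiliary features without overwriting them.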
Authors
陈海霞
徐珑婷
杨震
CHEN Haixia; XU Longting; YANG Zhen (Key Lab of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; School of Information Science and Technology, Donghua University, Shanghai 201620, China; National Local Joint Engineering Research Center for Communication and Network Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
《南京邮电大学学报(自然科学版)》
Peking University Core Journal (北大核心)
2019, No. 1, pp. 45-51 (7 pages)
Journal of Nanjing University of Posts and Telecommunications:Natural Science Edition
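Funding
National Natural Science Foundation of China (61271335)
Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA010102)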