Emotional speaker recognition based on prosody transformation (cited by: 1)

Chinese title: 基于韵律变换的情感说话人识别 (article in English)

Abstract: A novel emotional speaker recognition system (ESRS) is proposed to compensate for emotion variability. First, emotion recognition is used as a pre-processing stage to separate neutral speech from emotional speech. The speech recognized as emotional is then adjusted by prosody modification. Several methods, including Gaussian normalization, the Gaussian mixture model (GMM), and support vector regression (SVR), are used to define the F0 mapping rules between emotional and neutral speech, and the average linear ratio is used for duration modification. Finally, the modified emotional speech is used for speaker recognition. The experimental results show that the proposed ESRS significantly improves emotional speaker recognition performance: its identification rate (IR) is higher than that of the traditional recognition system. Emotional speech with F0 and duration modifications is also closer to neutral speech.
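
As a rough illustration of the simplest of the F0 mapping methods named in the abstract, the sketch below applies a Gaussian normalization that shifts and scales emotional F0 values so that their mean and standard deviation match those of neutral speech. The abstract gives no formulas, so everything here is an assumption: the use of the log-F0 domain, the convention that unvoiced frames are marked by 0, the reading of the "average linear ratio" as the mean of per-unit neutral-to-emotional duration ratios, and all function names, which are hypothetical. The GMM and SVR mappings are not shown.

```python
import math
from statistics import mean, pstdev


def fit_log_f0_stats(f0_values):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), pstdev(logs)


def gaussian_normalize_f0(f0_emotional, emo_stats, neu_stats):
    """Map emotional F0 toward the neutral distribution (log domain assumed)."""
    mu_e, sd_e = emo_stats
    mu_n, sd_n = neu_stats
    mapped = []
    for f in f0_emotional:
        if f <= 0:
            # Unvoiced frame (assumed to be coded as 0): leave untouched.
            mapped.append(0.0)
        elif sd_e == 0:
            # Degenerate case: no spread in the emotional data.
            mapped.append(math.exp(mu_n))
        else:
            z = (math.log(f) - mu_e) / sd_e
            mapped.append(math.exp(mu_n + sd_n * z))
    return mapped


def average_duration_ratio(neutral_durations, emotional_durations):
    """Mean of per-unit neutral/emotional duration ratios (assumed definition)."""
    ratios = [n / e for n, e in zip(neutral_durations, emotional_durations)]
    return mean(ratios)
```

In this sketch, the statistics would be estimated from training data for each emotion and for neutral speech, and the returned duration ratio would then scale the length of each emotional segment before the modified speech is passed to the speaker recognizer.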

Authors: Song Peng (宋鹏), Zhao Li (赵力), Zou Cairong (邹采荣)

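Affiliations: Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University; Foshan University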

Source: Journal of Southeast University (English Edition), 2011, No. 4, pp. 357-360 (4 pages). Indexed in EI and CAS.

Funding: The National Natural Science Foundation of China (No. 60872073, 60975017, 51075068); the Natural Science Foundation of Guangdong Province (No. 10252800001000001); the Natural Science Foundation of Jiangsu Province (No. BK2010546).

Keywords: emotion recognition; speaker recognition; F0 transformation; duration modification

Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

References (10)

1. Scherer K L, Johnstone T, Klasmeyer G, et al. Can automatic speaker verification be improved by training the algorithms on emotional speech? International Conference on Spoken Language Processing, 2000.
2. Wu Z H, Li D D, Yang Y C. Rules based feature modification for affective speaker recognition. International Conference on Acoustics, Speech, and Signal Processing, 2006.
3. Campbell W M, Sturim D E, Reynolds D A, et al. SVM based speaker verification using a GMM supervector kernel and NAP variability compensation. International Conference on Acoustics, Speech, and Signal Processing, 2006.
4. Hu H, Xu M M, Wu W. GMM supervector based SVM with spectral features for speech emotion recognition. International Conference on Acoustics, Speech, and Signal Processing, 2007.
5. Sinha R, Ghai S. On the use of pitch normalization for improving children's speech recognition. 10th Annual Conference of the International Speech Communication Association, 2009.
6. Wu Z Z, Kinnunen T, Chng E S, et al. Text-independent F0 transformation with non-parallel data for voice conversion. 11th Annual Conference of the International Speech Communication Association, 2010.
7. Shan Zhenyu, Yang Yingchun, Ye Ruizhi. Natural-emotion GMM transformation algorithm for emotional speaker recognition. Proc of Interspeech 2007, 2007.
8. Debasish B, Pal S, Patranabis D C. Support vector regression. Neural Information Processing, 2007.
9. Reynolds D A, Quatieri T F, Dunn R B. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 2000.
10. Tao J, Kang Y, Li A. Prosody conversion from neutral speech to emotional speech. IEEE Transactions on Audio, Speech and Language Processing, 2006.
