
F0 Transformation for Emotional Speech Synthesis Using Target Approximation Features and Bidirectional Associative Memories (Cited by: 3)
Abstract  This paper proposes an F0 transformation method for emotional speech synthesis. Quantitative target approximation (qTA) features are used to represent the F0 contour at the syllable level, and a Gaussian bidirectional associative memory (GBAM) is used to transform the syllable-level qTA parameters of synthesized neutral speech into those of the target emotional speech. In the training stage, an HMM-based statistical parametric speech synthesis system is first built on a neutral corpus. Then, using a small amount of emotional recordings, a GBAM transformation model is trained: the qTA parameters extracted from neutral speech synthesized from the texts of the emotional recordings serve as source features, and the qTA parameters extracted from the emotional recordings themselves serve as target features. At synthesis time, the trained GBAM model transforms the syllable-level F0 features of synthesized neutral speech toward the target emotion. Experimental results show that, when only a small amount of emotional data is available, the proposed method achieves better emotional expressivity than model adaptation using maximum likelihood linear regression (MLLR).
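For context, the qTA model (reference 8 below) represents each syllable's F0 contour as a third-order critically damped system approaching a linear pitch target, so a syllable is summarized by three parameters: target slope m, target height b, and approach rate λ. These are the syllable-level features the GBAM transforms. As defined in that reference,

$$
f_0(t) = \big(mt + b\big) + \big(c_1 + c_2 t + c_3 t^2\big)\,e^{-\lambda t}, \qquad 0 \le t \le T,
$$

where T is the syllable duration and the transient coefficients are fixed by continuity of F0 and its first two derivatives $(y_0, y_0', y_0'')$ at the syllable onset:

$$
c_1 = y_0 - b, \qquad
c_2 = y_0' + c_1\lambda - m, \qquad
c_3 = \tfrac{1}{2}\big(y_0'' + 2c_2\lambda - c_1\lambda^2\big).
$$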
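The training recipe described in the abstract, parallel syllable-level qTA parameters from synthesized neutral speech and from emotional recordings, mapped by a GBAM, can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the authors' implementation (references 9 and 10 give the actual GBAM formulation): it assumes unit-variance Gaussian units on both layers, affine conditional means, and a CD-1-style mean-field update; the class name, learning rate, and toy data are hypothetical.

```python
import numpy as np

class GaussianBAM:
    """Toy Gaussian bidirectional associative memory.

    Two real-valued visible layers (source x = neutral qTA
    parameters, target y = emotional qTA parameters) coupled by a
    weight matrix W; unit variance assumed on both layers.
    """

    def __init__(self, dim_x, dim_y, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((dim_x, dim_y))
        self.bx = np.zeros(dim_x)  # source-side bias
        self.by = np.zeros(dim_y)  # target-side bias
        self.lr = lr

    def mean_y(self, x):
        """E[y | x]: affine conditional mean of the target layer."""
        return x @ self.W + self.by

    def mean_x(self, y):
        """E[x | y]: affine conditional mean of the source layer."""
        return y @ self.W.T + self.bx

    def cd1_step(self, x0, y0):
        """One CD-1-style update on a batch of paired rows (x0, y0)."""
        y1 = self.mean_y(x0)   # reconstruct target from source
        x1 = self.mean_x(y1)   # reconstruct source from that target
        n = x0.shape[0]
        self.W += self.lr * (x0.T @ y0 - x1.T @ y1) / n
        self.bx += self.lr * (x0 - x1).mean(axis=0)
        self.by += self.lr * (y0 - y1).mean(axis=0)

    def convert(self, x):
        """Map neutral qTA parameters toward the target emotion."""
        return self.mean_y(x)

# Hypothetical usage with fake parallel data: one 3-dim qTA vector
# (m, b, lambda) per syllable on each side, aligned by syllable
# since the neutral speech is synthesized from the emotional texts.
rng = np.random.default_rng(1)
x_neutral = rng.standard_normal((500, 3))
y_emotion = x_neutral @ rng.standard_normal((3, 3))  # stand-in targets
bam = GaussianBAM(dim_x=3, dim_y=3)
for _ in range(200):
    bam.cd1_step(x_neutral, y_emotion)
y_converted = bam.convert(x_neutral)
```

At synthesis time, per the abstract, the converted qTA parameters replace the neutral ones and the F0 contour is regenerated from them for the target emotion.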
Source  Journal of Tianjin University (Science and Technology), 2015(8): 670-674 (5 pages). Indexed by EI, CAS, CSCD, and the Peking University Chinese Core Journals list.
Funding  National Natural Science Foundation of China (Grant No. 61273032)
Keywords  emotional speech synthesis; quantitative target approximation (qTA); Gaussian bidirectional associative memory (GBAM); F0 transformation

References (11)

  • 1 Yamagishi J, Onishi K, Masuko T, et al. Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis [J]. IEICE Transactions on Information and Systems, 2005, 88(3): 502-509.
  • 2 Masuko T, Tokuda K, Kobayashi T, et al. Voice characteristics conversion for HMM-based speech synthesis system [C]// Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 1997: 1611-1614.
  • 3 Tamura M, Masuko T, Tokuda K, et al. Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR [C]// Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 2001: 805-808.
  • 4 Yamagishi J, Tachibana M, Masuko T, et al. Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis [C]// Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 2004: 5-8.
  • 5 Veaux C, Rodet X. Intonation conversion from neutral to expressive speech [C]// INTERSPEECH. Florence, Italy, 2011: 2765-2768.
  • 6 Tao J, Kang Y, Li A. Prosody conversion from neutral speech to emotional speech [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1145-1154.
  • 7 Xu Y, Wang Q E. Pitch targets and their realization: Evidence from Mandarin Chinese [J]. Speech Communication, 2001, 33: 319-337.
  • 8 Prom-On S, Xu Y, Thipakorn B. Modeling tone and intonation in Mandarin and English as a process of target approximation [J]. The Journal of the Acoustical Society of America, 2009, 125(1): 405-424.
  • 9 Gao L, Ling Z H, Chen L H, et al. Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis [C]// ISCSLP. Singapore, 2014: 275-279.
  • 10 Liu L J, Chen L H, Ling Z H, et al. Using bidirectional associative memories for joint spectral envelope modeling in voice conversion [C]// IEEE International Conference on Acoustics, Speech, and Signal Processing. Florence, Italy, 2014: 7884-7888.


