Affective speech synthesis of Chinese children
Abstract: Emotional speech synthesis is important for human-computer interaction. To address the scarcity of Chinese speech data needed for children's emotional speech synthesis and the long training time of such models, this paper proposes a transfer-learning method for Chinese children's emotional speech synthesis. A deep learning model is first trained on a Chinese speech database to obtain an end-to-end Chinese speech synthesis model. A high-quality, large-sample Chinese emotional corpus is then used to build the emotional speech synthesis model. Finally, a small, self-collected corpus of Chinese children's emotional speech is used to adapt the model by transfer learning, achieving low-resource speech synthesis. In the objective evaluation the mel-cepstral distortion is 4.91, and the two subjective listening-test scores are 3.61 and 4.17. Experimental comparisons show that the method performs well for emotional speech synthesis and outperforms existing state-of-the-art low-resource emotional speech synthesis methods.
Authors: HU Hangye (胡航烨); WANG Wei (王蔚) (School of Educational Science, Nanjing Normal University, Nanjing 210097, China)
Source: Journal of Applied Acoustics (《应用声学》), indexed in CSCD and the Peking University core journal list, 2023, No. 1, pp. 76-83 (8 pages)
Funding: National Social Science Fund of China (BCA150054).
Keywords: children; emotional speech synthesis; transfer learning; low resource
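
Note on the objective metric: the abstract reports a mel-cepstral distortion of 4.91 without stating how it was computed. The sketch below shows one commonly used definition (mean per-frame distortion in dB, skipping coefficient 0), and is only an illustration. The function name `mel_cepstral_distortion`, the exclusion of the 0th coefficient, and the assumption that the two frame sequences are already time-aligned are all assumptions made here, not details taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. Coefficient 0
    (overall energy) is skipped, which is a common but not universal choice.
    Alignment (for example by dynamic time warping) is assumed to be done
    beforehand.
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("expected two non-empty, equally long frame sequences")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)
```

Under this definition, lower values mean the synthesized spectrum is closer to the reference, and identical inputs give a distortion of exactly zero.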