Emotional Voice Conversion Based on Multiple Domain Conditional Generation
Abstract: Emotional voice conversion converts speech from one emotion to another without changing the speaker's timbre or the semantic content; in essence, it is style transfer applied to speech. Mainstream style transfer techniques include generative adversarial networks (such as CycleGAN and StarGAN) and instance normalization techniques (such as IN and CIN). Compared with IN, CIN adds a module that selects the mean and variance according to the target style, which gives it stronger style transfer ability. This paper proposes CIN-StarGAN, an emotional voice conversion model that combines StarGAN and CIN by embedding the CIN module into the StarGAN generator. Experimental results on the ESD dataset show that CIN-StarGAN converges 28% faster than a CycleGAN-based emotion conversion model and has good style conversion ability. The approach has potential research value for multi-domain emotion conversion.
Authors: YAO Wenhan; KE Dengfeng; HUANG Liangjie; HU Ruixin; XIANG Minte; ZHANG Jinsong (Department of Information Science, Beijing Language and Culture University, Beijing 100089, China)
Source: Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》), 2023, Issue 5, pp. 67-72 (6 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding: Chinese Testing International (汉考国际) Research Fund Project (HT-202011-374).
Keywords: emotional voice conversion; domain transfer; conditional instance normalization; generative adversarial network
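
The abstract describes CIN as instance normalization extended with a module that selects the mean and variance according to the target style. A minimal sketch of that idea is given below, assuming the common formulation in which each target emotion has its own per-channel scale and shift. It is an illustration only, not the authors' implementation: the function name, the feature layout (a list of channels, each a list of frame values), and the parameter tables GAMMA and BETA are hypothetical, and in a real model the scale and shift values would be learned.

```python
import math


def conditional_instance_norm(x, gamma, beta, domain, eps=1e-5):
    """Normalize each channel of x over time, then apply the scale and
    shift that belong to the requested target domain (emotion).

    x:      list of channels, each a list of frame values
    gamma:  dict mapping domain name -> per-channel scale (hypothetical)
    beta:   dict mapping domain name -> per-channel shift (hypothetical)
    """
    g, b = gamma[domain], beta[domain]  # select parameters by condition
    out = []
    for c, channel in enumerate(x):
        n = len(channel)
        mean = sum(channel) / n
        var = sum((v - mean) ** 2 for v in channel) / n
        scale = g[c] / math.sqrt(var + eps)
        # Plain instance normalization would stop at (v - mean) / std;
        # the conditional variant re-scales and re-shifts per domain.
        out.append([scale * (v - mean) + b[c] for v in channel])
    return out


# Hypothetical per-emotion parameters; these would be learned in practice.
GAMMA = {"neutral": [1.0, 1.0], "happy": [2.0, 0.5]}
BETA = {"neutral": [0.0, 0.0], "happy": [0.3, -0.1]}

# Toy feature map: 2 channels x 4 frames.
features = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.5, 1.5]]
happy = conditional_instance_norm(features, GAMMA, BETA, "happy")
```

Switching the `domain` argument changes only which scale and shift are applied, so a single generator can serve several target emotions. This is consistent with the multi-domain setting the abstract attributes to StarGAN.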