Abstract
An affective speech generation technique based on a conditional generative adversarial network (GAN) is proposed. By introducing emotion conditions and learning the emotional information contained in a speech corpus, the method can autonomously generate new speech carrying a specified emotion. A GAN consists of a discriminator network and a generator. With TensorFlow as the learning framework, a conditional GAN model is trained on a large amount of emotional speech: the generator network G and the discriminator network D form a dynamic “game process” that better learns the conditional distribution of the observed emotional speech data. The generated samples are close to the natural speech signals of the original training material, exhibit diversity, and approximate speech data consistent with real emotions. The proposed solution is evaluated on the interactive emotional dyadic motion capture (IEMOCAP) corpus and a self-built emotional corpus, and it produces more accurate results than existing affective speech generation algorithms.
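To make the architecture described in the abstract concrete, the following is a minimal TensorFlow sketch of a conditional GAN in which an emotion label conditions both the generator G and the discriminator D. It is an illustrative reconstruction, not the authors' implementation: the feature dimensionality, network sizes, emotion set, and hyperparameters are assumptions.

```python
# Minimal conditional GAN sketch (assumed, not the paper's code): the emotion
# label is concatenated with G's noise input and with D's feature input, so the
# adversarial "game" between G and D is played conditioned on the emotion.
import tensorflow as tf

NOISE_DIM = 100       # assumed latent size
NUM_EMOTIONS = 4      # assumed emotion set, e.g. angry / happy / neutral / sad
FEATURE_DIM = 128     # assumed dimensionality of one acoustic feature frame

def build_generator():
    noise = tf.keras.Input(shape=(NOISE_DIM,))
    emotion = tf.keras.Input(shape=(NUM_EMOTIONS,))   # one-hot emotion condition
    x = tf.keras.layers.Concatenate()([noise, emotion])
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    out = tf.keras.layers.Dense(FEATURE_DIM, activation="tanh")(x)
    return tf.keras.Model([noise, emotion], out, name="G")

def build_discriminator():
    feat = tf.keras.Input(shape=(FEATURE_DIM,))
    emotion = tf.keras.Input(shape=(NUM_EMOTIONS,))
    x = tf.keras.layers.Concatenate()([feat, emotion])
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    out = tf.keras.layers.Dense(1)(x)                 # real/fake logit
    return tf.keras.Model([feat, emotion], out, name="D")

G, D = build_generator(), build_discriminator()
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(2e-4)
d_opt = tf.keras.optimizers.Adam(2e-4)

@tf.function
def train_step(real_feat, emotion):
    noise = tf.random.normal([tf.shape(real_feat)[0], NOISE_DIM])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_feat = G([noise, emotion], training=True)
        real_logit = D([real_feat, emotion], training=True)
        fake_logit = D([fake_feat, emotion], training=True)
        # D learns to separate real conditioned samples from generated ones;
        # G learns to make D label its samples as real: the dynamic game.
        d_loss = bce(tf.ones_like(real_logit), real_logit) + \
                 bce(tf.zeros_like(fake_logit), fake_logit)
        g_loss = bce(tf.ones_like(fake_logit), fake_logit)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
    return d_loss, g_loss

# Example usage with random placeholder data; features extracted from an
# emotional speech corpus would take the place of `real`.
real = tf.random.normal([32, FEATURE_DIM])
labels = tf.one_hot(tf.random.uniform([32], maxval=NUM_EMOTIONS, dtype=tf.int32),
                    NUM_EMOTIONS)
d_loss, g_loss = train_step(real, labels)
```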
Authors
崔新明
贾宁
周洁美慧
CUI Xin-Ming; JIA Ning; ZHOU Jie-Mei-Hui (School of Computer and Software, Dalian Neusoft Institute of Information, Dalian 116023, China)
Source
《计算机系统应用》
2022, No. 1, pp. 322-326 (5 pages)
Computer Systems & Applications
Funding
Inter-university Cooperation Project of the Education Department of Liaoning Province (86896244);
Dalian Science and Technology Plan Project (2019RQ120).