Affective speech synthesis of Chinese children (汉语儿童情感语声合成)

Abstract: Emotional speech synthesis is important for human-computer interaction. To address the scarcity of Chinese speech data for children's emotional speech synthesis and the long training time of such models, this paper proposes a transfer-learning method for synthesizing the emotional speech of Chinese-speaking children. First, a deep learning model is trained on a Chinese speech database to obtain an end-to-end Chinese speech synthesis model. Next, a large, high-quality Chinese emotional corpus is used to build an emotional speech synthesis model. Finally, the model is transferred to a small, self-collected corpus of Chinese children's emotional speech to achieve low-resource speech synthesis. In the objective evaluation, the Mel cepstral distortion is 4.91; in the subjective listening tests, the two scores are 3.61 and 4.17, respectively. Comparative experiments indicate that the method performs well for emotional speech synthesis and outperforms existing state-of-the-art low-resource emotional speech synthesis methods.
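The abstract reports a Mel cepstral distortion (MCD) value but does not state how it was computed. The following is a minimal sketch, assuming the commonly used definition in which the per-frame distortion is (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²) in dB, averaged over time-aligned frames. The function name `mel_cepstral_distortion`, the list-of-lists input format, and the choice to skip the 0th (energy) coefficient are illustrative assumptions, not details taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    Each argument is a list of frames; each frame is a list of cepstral
    coefficients. Coefficient 0 (energy) is skipped, which is a common
    convention and an assumption here.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and time-aligned")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)

    # Average the per-frame distortion over all frames.
    return total / len(ref_frames)
```

In practice the reference and synthesized utterances usually differ in length, so the frames would first need to be aligned (for example with dynamic time warping) before calling a function like this; that step is omitted here.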

Authors: HU Hangye (胡航烨); WANG Wei (王蔚) (School of Educational Science, Nanjing Normal University, Nanjing 210097, China)

Affiliation: Machine Learning and Cognition Laboratory, School of Educational Science, Nanjing Normal University

Source: Journal of Applied Acoustics (《应用声学》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 1, pp. 76-83 (8 pages)

Funding: National Social Science Fund of China (BCA150054).

Keywords: children; emotional speech synthesis; transfer learning; low resource

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
