F0 Transformation for Emotional Speech Synthesis Using Target Approximation Features and Bidirectional Associative Memories

Cited by: 3

Abstract: This paper proposes an F0 transformation method for emotional speech synthesis. Quantitative target approximation (qTA) features describe the F0 contour at the syllable level, and a Gaussian bidirectional associative memory (GBAM) converts the syllable-level qTA parameters of synthesized neutral speech into those of the target emotional speech. In the training stage, a neutral speech synthesis system is first built from a neutral corpus using HMM-based statistical parametric speech synthesis. Then, given a small amount of emotional recordings, the GBAM transformation model is trained with the qTA parameters extracted from the neutral speech synthesized for the emotional texts as source data and the qTA parameters extracted from the emotional recordings as target data. In the emotional synthesis stage, the trained GBAM model transforms the F0 features of the synthesized neutral speech toward the target emotion. Experimental results indicate that, when little target-emotion data is available, the proposed method achieves better emotional expressiveness than model adaptation based on maximum likelihood linear regression (MLLR).
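The abstract names qTA parameters as the syllable-level F0 description but does not give the model equations. As a rough illustration only, the sketch below follows the commonly published qTA formulation (Prom-on, Xu and Thipakorn, 2009), in which F0 approaches a linear pitch target x(t) = m·t + b through a third-order critically damped response with rate λ. The function name `qta_syllable`, the parameter values, and the semitone scale are assumptions made for this example and are not taken from the paper.

```python
import math


def qta_syllable(m, b, lam, duration, f0_init=(0.0, 0.0, 0.0), step=0.005):
    """Generate one syllable's F0 contour under the qTA model (illustrative sketch).

    m, b     : slope and height of the linear pitch target x(t) = m*t + b
    lam      : rate of target approximation (lambda > 0)
    duration : syllable duration in seconds
    f0_init  : (F0, F0', F0'') at syllable onset, inherited from the previous syllable
    Returns (contour, final_state); final_state seeds the next syllable.
    """
    f0_0, d1_0, d2_0 = f0_init
    # Transient coefficients are fixed by continuity with the onset state.
    c1 = f0_0 - b
    c2 = d1_0 + c1 * lam - m
    c3 = (d2_0 + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0

    def state(t):
        # f0(t) = x(t) + p(t) * exp(-lam * t), with p(t) = c1 + c2*t + c3*t^2
        poly = c1 + c2 * t + c3 * t * t
        dpoly = c2 + 2.0 * c3 * t
        ddpoly = 2.0 * c3
        decay = math.exp(-lam * t)
        f0 = m * t + b + poly * decay
        d1 = m + (dpoly - lam * poly) * decay
        d2 = (ddpoly - 2.0 * lam * dpoly + lam * lam * poly) * decay
        return f0, d1, d2

    n = int(round(duration / step))
    contour = [state(i * step)[0] for i in range(n + 1)]
    return contour, state(duration)


# Two consecutive syllables with hypothetical targets (semitones, assumed scale).
contour1, final1 = qta_syllable(m=0.0, b=5.0, lam=40.0, duration=0.2)
contour2, final2 = qta_syllable(m=-10.0, b=2.0, lam=40.0, duration=0.2, f0_init=final1)
```

In a conversion setting such as the one the abstract describes, each syllable would be summarized by a small parameter vector like (m, b, λ), and the mapping model would predict the emotional vector from the neutral one; exactly which parameters the authors convert is not stated in the abstract.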

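Authors: Ling Zhenhua, Gao Li, Dai Lirong

Affiliation: School of Information Science and Technology, University of Science and Technology of China

Source: Journal of Tianjin University (Science and Technology) (《天津大学学报（自然科学与工程技术版）》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2015, No. 8, pp. 670-674 (5 pages)

Funding: National Natural Science Foundation of China (61273032)

Keywords: emotional speech synthesis; quantitative target approximation (qTA); Gaussian bidirectional associative memory (GBAM); F0 transformation

Classification number: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]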
