摘要
提出了一种用于情感语音合成的基频转换方法.该方法使用定量目标逼近(q TA)特征作为语音音节层的基频描述,并用高斯双向联想贮存器(GBAM)实现中性合成语音音节层q TA参数向目标情感语音音节层q TA参数的转换.在模型训练阶段,首先基于中性语料库和统计参数语音合成方法构建中性语音合成系统;然后利用少量情感录音数据,将从情感语音文本对应的中性合成语音中提取的q TA参数作为源数据,将情感录音中提取的q TA参数作为目标数据,进行GBAM转换模型的训练.在情感语音合成阶段,利用训练得到的GABM模型,实现中性合成语音基频特征向目标情感的转换.实验结果表明,该方法在目标情感数据较少的情况下可以取得比最大似然线性回归(MLLR)模型自适应方法更好的情感表现力.
In this paper,an F0 transformation method for emotional speech synthesis was proposed.Quantitative target approximation(qTA)features were used to represent F0 contour in syllable level.And Gaussian directional as-sociative memories(GBAM)was used to complete the transformation of syllable-level qTA parameters from synthe-sized neutral speech to target emotional recordings.In the training stage,firstly HMM-based statistical parametric speech synthesis was used to construct a neutral speech synthesis system with neutral corpus as training set.And then,with a small amount of emotional recording data,GBAM-based transformation model was trained by using the qTA parameters extracted from synthesized neutral speech corresponding to the emotional text as the source feature and the qTA parameters extracted from target emotional recordings as the target patterns of GBAM,respectively.In the generation of emotional speech,the trained GBAM model was utilized to complete the transformation of syllable-level F0 features from synthesized neutral speech to target emotional recordings.The experiment results indicate that,in the case of little emotional recording data,the proposed method performs better in emotional expressivity than the adaptation method using maximum likelihood linear regression(MLLR).
出处
《天津大学学报(自然科学与工程技术版)》
EI
CAS
CSCD
北大核心
2015年第8期670-674,共5页
Journal of Tianjin University:Science and Technology
基金
国家自然科学基金资助项目(61273032)