
Adaptive speech synthesis based on transfer learning
Original title: 基于迁移学习的自适应语音合成 (cited by: 2)
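Abstract: To build an adaptive speech synthesis system from a small amount of target-speaker data, this paper proposes an adaptive speech synthesis method based on transfer learning. First, on a multi-speaker corpus, a 256-dimensional speaker embedding is used to represent the different speakers inside the model. The FastSpeech2 acoustic model is then modified to serve as the acoustic feature extractor: a reference encoder is used to attempt to "disentangle" speaker style from the speech, so that the speaker's timbre features are extracted more precisely, and a multi-speaker pre-trained model is trained on this basis. Given only a small amount of speech from the target speaker (a dozen or so utterances), fine-tuning the neural network parameters is enough to obtain good adaptive synthesis results. Finally, the target speaker's original audio and the adaptively generated speech are mapped to vectors and compared; in the experiments the average similarity exceeds 70%.
Authors: Sun Zhihong (孙志宏); Ye Yan (叶焱); Liu Taijun (刘太君); Xu Gaoming (许高明)
Source: 《数据通信》, 2021, No. 5, pp. 47-51 (5 pages)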
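
The evaluation step in the abstract (mapping the original and the synthesized speech to vectors and comparing them) can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the model that produces the vectors, so cosine similarity is an assumption here. The function names `cosine_similarity` and `average_similarity` and the toy 256-dimensional vectors are hypothetical, not taken from the paper.

```python
import math
import random
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def average_similarity(reference_embeddings, synthesized_embeddings):
    """Mean cosine similarity over paired (original, synthesized) utterances."""
    if len(reference_embeddings) != len(synthesized_embeddings):
        raise ValueError("need one synthesized embedding per reference")
    return mean(
        cosine_similarity(ref, syn)
        for ref, syn in zip(reference_embeddings, synthesized_embeddings)
    )


# Toy data only: random 256-dimensional vectors stand in for embeddings of
# the target speaker's recordings, and noisy copies stand in for embeddings
# of the adapted model's output. Real embeddings would come from a speaker
# encoder, which the abstract does not specify.
rng = random.Random(0)
originals = [[rng.gauss(0.0, 1.0) for _ in range(256)] for _ in range(12)]
synthesized = [[x + rng.gauss(0.0, 0.5) for x in vec] for vec in originals]

print(f"average similarity: {average_similarity(originals, synthesized):.3f}")
```

Under this reading, the "above 70%" figure would correspond to an average score greater than 0.70 across the paired utterances. That interpretation is an assumption as well.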
