End-to-End Speech Synthesis with Embedding Units of Different Granularity (cited by: 1)

An End to End Speech Synthesis System Based on Multilevel Embedded Units in Uyghur Texts
Abstract Speech synthesis is a core technology in human-computer interaction and artificial intelligence. Conventional speech synthesis methods are relatively complex: they require extensive domain expertise and a heavy workload, and they have high barriers to design and implementation. Deep learning methods such as WaveNet and Tacotron have lowered these barriers. To address the complex front end in Uyghur speech synthesis, this work applies the Tacotron end-to-end model, which learns directly from text and the corresponding audio data. This simplifies the synthesis process and successfully realizes speech synthesis for Uyghur, a low-resource language. To select the best granular unit and further improve synthesis quality, text-to-word and text-to-morpheme conversion modules were added to the original Tacotron model, and synthesis experiments were performed on Uyghur text segmented into words, morphemes, and characters. Subjective and objective evaluation of the results shows that, in the Tacotron-based end-to-end model, character units give better synthesis quality than word or morpheme units. This conclusion can support the further development of Uyghur speech synthesis technology.
Authors Gulisitan Aobulikasimu; Palidan Tuerxun; Askar Hamdulla (Software College, Xinjiang University, Urumqi 830046; School of Information Science and Engineering, Xinjiang University, Urumqi 830046)
Source Modern Computer (《现代计算机》), 2021, No. 24, pp. 14-20 (7 pages)
Funding National Key Research and Development Program of China (2017YFC0820602).
Keywords speech synthesis; deep learning; end-to-end; Tacotron; multi-granularity; Uyghur language
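
The abstract compares word, morpheme, and character units as input to a Tacotron-style model. The following is a minimal sketch of how one text could be turned into the three unit sequences and then into integer IDs for an embedding layer. It is not the authors' implementation: the function names, the `<pad>`/`<unk>` tokens, and the toy morpheme lookup table are all assumptions for illustration, and a real system would need a proper Uyghur morphological segmenter.

```python
from typing import Callable, Dict, Iterable, List, Optional


def to_units(
    text: str,
    granularity: str,
    morph_segmenter: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Split text into word, morpheme, or character units."""
    words = text.split()
    if granularity == "word":
        return words
    if granularity == "morpheme":
        if morph_segmenter is None:
            raise ValueError("morpheme granularity needs a segmenter")
        units: List[str] = []
        for word in words:
            units.extend(morph_segmenter(word))
        return units
    if granularity == "character":
        # Keep a single space between words as its own unit.
        return list(" ".join(words))
    raise ValueError(f"unknown granularity: {granularity}")


def build_vocab(sequences: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer ID to every unit, in order of first appearance."""
    vocab = {"<pad>": 0, "<unk>": 1}  # hypothetical special tokens
    for seq in sequences:
        for unit in seq:
            vocab.setdefault(unit, len(vocab))
    return vocab


def encode(units: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map units to IDs; unseen units fall back to <unk>."""
    return [vocab.get(u, vocab["<unk>"]) for u in units]


# Toy lookup table standing in for a real morphological analyzer (assumption).
TOY_MORPHEMES = {"kitablarda": ["kitab", "lar", "da"]}


def toy_segmenter(word: str) -> List[str]:
    return TOY_MORPHEMES.get(word, [word])
```

Coarser units give shorter input sequences but a larger vocabulary with more out-of-vocabulary items, while character units give the smallest vocabulary. That trade-off is one plausible reason, not one stated in the abstract, for character units performing well on a low-resource language.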