Mandarin Text-to-Speech System with Prosody Enhancement

Abstract: End-to-end text-to-speech (TTS) systems generate speech directly from a given sequence of graphemes or phonemes. Current mainstream end-to-end TTS systems can produce English speech that is close to a human voice. Chinese text, however, differs from that of languages written in the Roman alphabet, such as English. When an end-to-end TTS framework is applied directly to Mandarin, the synthesized audio shows fairly serious prosodic problems, such as inappropriate phrase breaks or pauses and poor naturalness. To address this, a neural end-to-end Mandarin TTS system with prosody enhancement is proposed that takes the linguistic and prosodic characteristics of Chinese into account. The system uses multi-level context features extracted from a pre-trained BERT model to enhance the input of the end-to-end Mandarin TTS system. Experiments on a public Chinese speech synthesis dataset show that, compared with current mainstream end-to-end TTS systems, the proposed system generates more natural and more expressive speech.
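The idea of enhancing the TTS input with multi-level context features can be illustrated with a minimal sketch. This record does not describe the paper's actual architecture, so everything below is an assumption made for illustration: the per-layer character features stand in for hidden states taken from a pre-trained BERT model, the layers are combined by a weighted average, and each character's context vector is repeated over that character's phonemes and concatenated to the phoneme embeddings. The function names `mix_layers` and `enhance_input` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mix_layers(layer_feats: Sequence[Sequence[Vector]],
               weights: Sequence[float]) -> List[Vector]:
    """Combine per-layer character features into one vector per character.

    layer_feats[l][c] is the feature vector of character c at layer l
    (assumed here to come from a pre-trained language model).
    The layers are merged with a normalized weighted average.
    """
    total = sum(weights)
    n_chars = len(layer_feats[0])
    dim = len(layer_feats[0][0])
    mixed = []
    for c in range(n_chars):
        vec = [0.0] * dim
        for w, layer in zip(weights, layer_feats):
            for d in range(dim):
                vec[d] += (w / total) * layer[c][d]
        mixed.append(vec)
    return mixed


def enhance_input(phoneme_embs: Sequence[Vector],
                  phones_per_char: Sequence[int],
                  char_context: Sequence[Vector]) -> List[Vector]:
    """Append each character's context vector to all of its phonemes.

    phones_per_char[c] is the number of phonemes belonging to character c,
    so the character-level context is upsampled to the phoneme level.
    """
    if len(phones_per_char) != len(char_context):
        raise ValueError("one context vector is needed per character")
    if sum(phones_per_char) != len(phoneme_embs):
        raise ValueError("phoneme counts do not match the phoneme sequence")
    enhanced = []
    idx = 0
    for count, ctx in zip(phones_per_char, char_context):
        for _ in range(count):
            enhanced.append(list(phoneme_embs[idx]) + list(ctx))
            idx += 1
    return enhanced


# Toy example: two characters, two phonemes each, two feature layers.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],  # lower layer (illustrative values)
    [[3.0, 2.0], [2.0, 3.0]],  # upper layer (illustrative values)
]
context = mix_layers(layers, weights=[1.0, 1.0])
encoder_input = enhance_input([[0.1], [0.2], [0.3], [0.4]], [2, 2], context)
print(encoder_input)
```

In a real system the toy lists would be replaced by tensors from the language model and the TTS front end, and the layer weights would more likely be learned than fixed.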

Authors: NIU Fang; Wushour Silamu (College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; Multi-language Information Technology Laboratory of Xinjiang, Urumqi 830046, China; Multi-language Information Technology Research Center of Xinjiang, Urumqi 830046, China)

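Affiliations: College of Information Science and Engineering, Xinjiang University; Multi-language Information Technology Laboratory of Xinjiang; Multi-language Information Technology Research Center of Xinjiang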

Source: Modern Electronics Technique (《现代电子技术》), 2022, No. 13, pp. 87-92 (6 pages)

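Funding: National Natural Science Foundation of China project "Research on Key Technologies of a Uyghur-Chinese Speech Translation System" (U1603262)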

Keywords: text-to-speech; speech synthesis; Mandarin; prosody enhancement; BERT model; TTS

Classification: TN912.33-34 [Electronics and Telecommunications: Communication and Information Systems]
