A Real-time Robust Speech Synthesis Method Based on Improved Attention Mechanism (cited by: 1)
Abstract To address three problems in the existing speech synthesis system Tacotron 2 (an attention model that learns slowly, synthesized speech that is not robust enough, and slow synthesis), three improvements are proposed: (1) phoneme embeddings are used as input to reduce mispronunciations; (2) an attention loss is introduced to guide the attention model so that it learns quickly and accurately; (3) the WaveGlow model is used as the vocoder to speed up speech generation. Experiments on the LJSpeech dataset show that the improved network learns attention faster and more accurately, and that the error rate of its synthesized speech is 33.4% lower than that of the baseline. By comparison, the synthesis speed of the whole network is increased by approximately 523 times, with a real-time factor (RTF) of 0.96, which meets the real-time requirement. In terms of speech quality, the mean opinion score (MOS) of the synthesized speech reaches 3.88.
Authors TANG Jun; ZHANG Lianhai; LI Jiaxin (School of Information System Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou, Henan 450001, China)
Source Journal of Signal Processing (CSCD; Peking University Core Journal), 2022, No. 3, pp. 527-535 (9 pages)
Funding National Natural Science Foundation of China (61673395).
Keywords speech synthesis; attention loss mechanism; Tacotron 2; WaveGlow; sequence to sequence
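
The real-time factor quoted in the abstract is conventionally defined as synthesis time divided by the duration of the audio produced, so a value below 1 means speech is generated faster than it plays back. The sketch below only illustrates that arithmetic. The function names and the timing figures are hypothetical and are not taken from the paper; the 22050 Hz default is the sampling rate of the LJSpeech recordings.

```python
def audio_duration_seconds(num_samples: int, sample_rate: int = 22050) -> float:
    """Duration of a waveform in seconds (22050 Hz is the LJSpeech rate)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """An RTF below 1 means audio is generated faster than it plays back."""
    return rtf < 1.0


# Hypothetical measurement, chosen only to illustrate the calculation:
# 220500 samples at 22050 Hz is 10 s of audio, synthesized in 9.6 s.
duration = audio_duration_seconds(num_samples=220500)
rtf = real_time_factor(synthesis_seconds=9.6, audio_seconds=duration)
print(f"duration={duration:.1f}s rtf={rtf:.2f} real_time={is_real_time(rtf)}")
```

In practice the synthesis time would be measured around the full text-to-waveform pipeline (acoustic model plus vocoder) and averaged over many utterances, since a single timing is noisy.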