Research and Implementation of Chinese Speech Synthesis System Based on Non-autoregressive Model

Cited by: 1
Abstract To address the poor quality and low naturalness of traditional speech synthesis, and the long training time and low efficiency of autoregressive models such as Tacotron, this paper proposes a Chinese speech synthesis method based on a non-autoregressive model. The method trains far more efficiently than autoregressive models, and its vocoder uses a generative adversarial network, which gives markedly better audio quality than traditional speech synthesis methods. Input Chinese characters are first converted to phonemes by front-end processing. The phonemes are one-hot encoded and mapped to a phoneme embedding layer, and positional encoding supplies the position of each phoneme in the sequence. The feed-forward blocks in the encoder convert the phoneme sequence into a hidden sequence, to which the audio features predicted by the variance adaptor are added. Finally, the decoder outputs a mel-spectrogram, which the vocoder converts into an audio waveform. The experimental dataset consists of 10,000 utterances recorded by a professional female Chinese speaker. The experiments give a mean opinion score (MOS) of 3.76, a clear improvement in synthesis quality over traditional parametric speech synthesis, and training takes only 15% of the time required by an autoregressive model.
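The first stages described in the abstract (one-hot encoding, phoneme embedding, positional encoding) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The toy phoneme inventory, the embedding size, the random embedding table, and the sinusoidal form of the positional encoding are all assumptions; the abstract does not specify any of them.

```python
import math
import random

# Hypothetical toy phoneme inventory (pinyin-style units with tone digits).
PHONEMES = ["n", "i3", "h", "ao3"]
PHONEME_TO_ID = {p: i for i, p in enumerate(PHONEMES)}


def one_hot(index, size):
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed(one_hot_vec, table):
    """Multiply a one-hot vector by the embedding table (vocab x dim).

    Because the vector is one-hot, this simply selects one row of the table.
    """
    dim = len(table[0])
    return [sum(one_hot_vec[v] * table[v][d] for v in range(len(table)))
            for d in range(dim)]


def positional_encoding(position, dim):
    """Sinusoidal encoding (assumed form): sin on even dims, cos on odd dims."""
    enc = []
    for d in range(dim):
        angle = position / (10000 ** ((d - d % 2) / dim))
        enc.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
    return enc


def encoder_input(phonemes, table):
    """Add the positional encoding to the embedding of each phoneme."""
    dim = len(table[0])
    rows = []
    for pos, ph in enumerate(phonemes):
        emb = embed(one_hot(PHONEME_TO_ID[ph], len(table)), table)
        pe = positional_encoding(pos, dim)
        rows.append([e + p for e, p in zip(emb, pe)])
    return rows


random.seed(0)
EMBED_DIM = 8  # illustrative size only, not taken from the paper
table = [[random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for _ in PHONEMES]
x = encoder_input(["n", "i3", "h", "ao3"], table)
print(len(x), len(x[0]))  # 4 positions, 8 dimensions each
```

In a trained model the embedding table is learned rather than random, and the resulting sequence is what the encoder's feed-forward blocks would turn into the hidden sequence.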
Authors WANG Zhichao (王志超); WU Hao (吴浩); LI Dong (李栋); LIU Yicen (刘益岑) (School of Automation and Information Engineering, Sichuan University of Science & Engineering, Zigong 643000; Artificial Intelligence Key Laboratory of Sichuan Province, Zigong 643000; Electric Power Research Institute of State Grid Sichuan Electric Power Company, Chengdu 610000)
Source Computer & Digital Engineering (《计算机与数字工程》), 2023, No. 2, pp. 325-330, 335 (7 pages in total)
Keywords Chinese speech synthesis; non-autoregressive model; self-attention; variance adaptor; vocoder