Low-resource Non-autoregressive Zhuang Speech Synthesis (低资源非自回归壮语语音合成)
Abstract This paper proposes Zhuang-TTS, a non-autoregressive Zhuang text-to-speech model based on FastSpeech2. To improve the prosody of synthesized Zhuang speech, a new Zhuang phonological system (tones; initials or consonants; finals or vowels) is proposed, based on the characteristics of the language and on field investigation. The model is further adapted to the acoustic characteristics of Zhuang in three ways: (1) Zhuang phoneme sequences are used to represent pronunciation information; (2) a phoneme-level acoustic regulator, similar to that of FastPitch, is used to make the synthesis results more stable; (3) the Transformer in FastSpeech2 is replaced with a Conformer. A Zhuang speech synthesis corpus is also constructed. Experimental results show that Zhuang-TTS achieves a Mean Opinion Score (MOS) of 3.90 for prosody and a synthesis real-time factor of 8.65×10^(-2). The model substantially improves both the quality and the speed of Zhuang speech synthesis and outperforms the Tacotron2 and FastSpeech2 baselines, contributing to the advancement of Zhuang speech synthesis.
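The abstract does not spell out how the phoneme-level acoustic regulator or the real-time factor is computed. The sketch below is one plausible reading, not the authors' implementation. It assumes a FastPitch-style regulator that averages a frame-level contour (such as F0) over the frames aligned to each phoneme, and the conventional definition of real-time factor as synthesis time divided by the duration of the generated audio. The function names, the convention that unvoiced frames are marked 0.0, and all numbers are hypothetical illustrations.

```python
from typing import List, Sequence


def phoneme_level_average(frame_values: Sequence[float],
                          durations: Sequence[int]) -> List[float]:
    """Average a frame-level contour (e.g. F0) over each phoneme's frames.

    durations[i] is the number of frames aligned to phoneme i. Frames with
    value 0.0 are treated as unvoiced (an assumption) and are skipped.
    """
    if sum(durations) != len(frame_values):
        raise ValueError("durations must sum to the number of frames")
    averaged = []
    start = 0
    for d in durations:
        voiced = [v for v in frame_values[start:start + d] if v > 0.0]
        averaged.append(sum(voiced) / len(voiced) if voiced else 0.0)
        start += d
    return averaged


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Conventional RTF: time spent synthesizing / duration of audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


# Toy contour: 3 phonemes covering 2, 3, and 1 frames (made-up values).
f0 = [100.0, 120.0, 0.0, 200.0, 220.0, 0.0]
print(phoneme_level_average(f0, [2, 3, 1]))  # [110.0, 210.0, 0.0]

# Made-up timing: 0.5 s of compute for 10 s of audio.
print(real_time_factor(0.5, 10.0))
```

Under this definition a real-time factor below 1 means synthesis runs faster than playback, so the reported value of 8.65×10^(-2) would correspond to roughly 0.09 s of computation per second of audio.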
Authors WANG Jie (王杰); QIN Donghong (秦董洪) (School of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China)
Source Journal of Minzu University of China (Natural Sciences Edition) (《中央民族大学学报(自然科学版)》), 2024, No. 2, pp. 40-47 (8 pages)
Funding Guangxi Science and Technology Base and Talent Special Project (桂科AD23026054); Guangxi Minzu University Horizontal Research Project (2022450016000429).
Keywords Zhuang language speech synthesis; non-autoregressive acoustic model; non-autoregressive vocoder; Conformer