Abstract
Speech synthesis is one of the key technologies for voice interaction in intelligent home appliances, and the quality of the synthesized speech directly affects the user's interactive experience. To address the problem that speech synthesized by the mainstream Glow TTS model has a fixed (deterministic) duration and lacks prosody, the model is improved with a stochastic duration predictor based on normalizing flows, and experiments are conducted with Japanese as the research object. The results indicate that the improved model performs considerably better in both speech duration prediction and synthesized speech quality, and that the approach can guide subsequent optimization of speech synthesis for intelligent home appliances.
Authors
李鹏
胡蒙
苏忠城
LI Peng; HU Meng; SU Zhongcheng (Wuxi Little Swan Electric Co., Ltd., Wuxi 214111)
Source
Journal of Appliance Science & Technology (《家电科技》)
2022, Issue S01, pp. 616-618 (3 pages)
Keywords
Intelligent home appliances
Speech synthesis
Glow TTS model optimization
Normalizing flow
Stochastic duration predictor