Abstract
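Most current mainstream parametric speech synthesis systems use a mixed-excitation source-filter model, in which aperiodicity is an important parameter affecting the quality of the synthesized speech. This paper examines how to model aperiodicity more effectively and how to compress the aperiodicity model as far as possible for embedded speech synthesis systems. Analysis shows that, within a Chinese syllable, the aperiodicity of adjacent frames is strongly correlated. Therefore, within one syllable and over a fixed frequency band, the continuous trajectory formed by the aperiodicity can be fitted with a discrete cosine transformation (DCT). Experiments show that the method compresses the band aperiodicity (BAP) model to 6.64% of its original size while keeping the quality of the synthesized speech comparable to that of the baseline system.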
The mixed excitation source-filter model is used in most statistical parametric speech synthesis systems, so voice aperiodicity is a crucial factor in the perceived quality of the synthesized speech. One problem is to improve the precision of the aperiodicity model, while another is that the aperiodicity model must be compressed for embedded speech synthesis systems. The voice aperiodicity of one frame is shown to be related to that of other frames on the time scale of one syllable. The band voice aperiodicity contours for one syllable are therefore fitted by a discrete cosine transformation (DCT). Tests show that the band aperiodicity (BAP) model can be compressed to 6.64% of the size of the baseline model while providing nearly the same perceived quality of the synthesized speech.
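The contour-fitting idea in the abstract can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II, a single frequency band, and a hypothetical choice of five retained coefficients per syllable; the abstract does not state the DCT variant, the number of coefficients, or the band layout, so the function names and values below are illustrative only and are not the authors' implementation.

```python
import math


def dct_basis(n_frames, n_coefs):
    """Return the first n_coefs rows of an orthonormal DCT-II basis.

    Each row has n_frames entries. Requires n_coefs <= n_frames.
    """
    basis = []
    for k in range(n_coefs):
        # The k = 0 row is constant, so it needs a different normalisation.
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_frames)
        row = [scale * math.cos(math.pi * (t + 0.5) * k / n_frames)
               for t in range(n_frames)]
        basis.append(row)
    return basis


def fit_contour(contour, n_coefs):
    """Project one band's per-frame aperiodicity contour onto n_coefs DCT terms."""
    basis = dct_basis(len(contour), n_coefs)
    return [sum(b * x for b, x in zip(row, contour)) for row in basis]


def reconstruct_contour(coefs, n_frames):
    """Rebuild an n_frames-long contour from truncated DCT coefficients."""
    basis = dct_basis(n_frames, len(coefs))
    return [sum(c * row[t] for c, row in zip(coefs, basis))
            for t in range(n_frames)]


if __name__ == "__main__":
    # Synthetic, smoothly varying contour for one band of a 40-frame syllable
    # (made-up values, not data from the paper).
    n_frames = 40
    contour = [-10.0 + 3.0 * math.cos(math.pi * (t + 0.5) / n_frames)
               for t in range(n_frames)]
    coefs = fit_contour(contour, n_coefs=5)  # 5 is a hypothetical choice
    approx = reconstruct_contour(coefs, n_frames)
    max_err = max(abs(a - b) for a, b in zip(approx, contour))
    print("coefficients kept:", len(coefs), "of", n_frames)
    print("max abs reconstruction error:", max_err)
```

Because strongly correlated frames yield a smooth trajectory, most of its energy falls in the low-order coefficients, so a syllable can be described by a handful of numbers per band instead of one value per frame. This is the intuition behind the reported reduction in model size; the actual compression ratio depends on modeling details not given in the abstract.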
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2013, No. 6, pp. 767-770, 780 (5 pages)
Journal of Tsinghua University (Science and Technology)
Funding
Supported by the National Natural Science Foundation of China (11161140319)
Keywords
aperiodicity (aperiodicity contour)
syllable-level modeling
speech synthesis