
Syllable-level modeling of voice aperiodicity contours for embedded Mandarin speech synthesis systems
Abstract: Most current statistical parametric speech synthesis systems use a mixed-excitation source-filter model, in which voice aperiodicity is an important parameter affecting the quality of the synthesized speech. This paper examines how to model aperiodicity more effectively and how to compress the aperiodicity model as far as possible for embedded speech synthesis systems. Analysis shows that, within a Mandarin syllable, the aperiodicity of successive frames is strongly correlated. Therefore, within one syllable and a fixed frequency band, the aperiodicity values form a continuous contour that can be fitted with a discrete cosine transform (DCT). Experiments show that this method compresses the band aperiodicity (BAP) model to 6.64% of its original size while keeping synthesized speech quality comparable to that of the baseline system.
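The contour-fitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the orthonormal DCT-II basis, the function names `dct_basis`, `fit_contour`, and `reconstruct_contour`, the number of retained coefficients, and the toy data are all assumptions made for illustration. Each band's per-frame aperiodicity values within one syllable are treated as a contour and represented by a few low-order DCT coefficients.

```python
import numpy as np


def dct_basis(n_frames, n_coefs):
    """Orthonormal DCT-II basis with shape (n_coefs, n_frames)."""
    t = np.arange(n_frames)
    k = np.arange(n_coefs)[:, None]
    basis = np.cos(np.pi * (t + 0.5) * k / n_frames) * np.sqrt(2.0 / n_frames)
    basis[0] /= np.sqrt(2.0)  # scale the constant row so all rows have unit norm
    return basis


def fit_contour(contour, n_coefs):
    """Project one band's contour onto the first n_coefs DCT basis vectors.

    Because the basis rows are orthonormal, this projection is also the
    least-squares fit of the contour with those basis vectors.
    """
    contour = np.asarray(contour, dtype=float)
    n_coefs = min(n_coefs, len(contour))
    return dct_basis(len(contour), n_coefs) @ contour


def reconstruct_contour(coefs, n_frames):
    """Rebuild an n_frames-long contour from low-order DCT coefficients."""
    coefs = np.asarray(coefs, dtype=float)
    return dct_basis(n_frames, len(coefs)).T @ coefs


if __name__ == "__main__":
    # Toy data (hypothetical): one syllable of 40 frames with 5 aperiodicity bands.
    rng = np.random.default_rng(0)
    frames, bands, kept = 40, 5, 4
    t = np.linspace(0.0, 1.0, frames)[:, None]
    bap = -10.0 + 4.0 * np.sin(np.pi * t) * np.linspace(0.5, 1.5, bands)
    bap += 0.1 * rng.standard_normal(bap.shape)

    # Fit each band's contour independently, then rebuild it.
    coefs = np.stack([fit_contour(bap[:, b], kept) for b in range(bands)], axis=1)
    approx = np.stack(
        [reconstruct_contour(coefs[:, b], frames) for b in range(bands)], axis=1
    )
    rmse = np.sqrt(np.mean((bap - approx) ** 2))
    print(f"stored values: {coefs.size} instead of {bap.size}, RMSE = {rmse:.3f}")
```

The storage ratio printed by this toy run concerns the values of a single syllable only and is unrelated to the 6.64% model-size figure reported in the abstract, which refers to the trained BAP model.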
Source: Journal of Tsinghua University (Science and Technology), 2013, No. 6, pp. 767-770, 780 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (11161140319)
Keywords: aperiodicity contour; syllable-level modeling; speech synthesis