
The Application of Recurrence Trend Analysis in I/F Segmentation for Mandarin Speech
Abstract: Hidden Markov model (HMM)-based automatic segmentation of continuous speech is widely used because of its relatively high accuracy, but its output is still not accurate enough to be used directly in corpus-based concatenative speech synthesis; the phoneme boundaries need further adjustment. This paper analyzes how the nonlinear dynamical models of different Mandarin phonemes appear in recurrence plots (RPs): periodicity for the oscillation of voiced sounds, homogeneity for the turbulent source of unvoiced sounds, and abrupt changes for stop consonants. Recurrence trend (RT), a quantitative measure of the nonstationarity of the speech dynamics, explicitly reveals these phenomena. Time-dependent recurrence trend (TDRT) is then proposed to locate the point of dynamical change, which is taken as the Initial/Final (I/F) boundary for Mandarin speech. Starting from the initial HMM-based segmentation of a continuous Mandarin speech database, the TDRT correction improves segmentation accuracy by 13.88%, 4.19%, and 3.19% within tolerances of 10 ms, 20 ms, and 30 ms, respectively.
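The abstract does not give the paper's formulas, so the following is only a minimal sketch of the quantities it names, using the textbook definitions from recurrence quantification analysis: a thresholded recurrence plot built from a time-delay embedding, and the recurrence trend taken as the slope of a least-squares line through the recurrence rate of each diagonal versus its distance from the main diagonal. The embedding dimension, delay, threshold `eps`, window length and hop are illustrative assumptions, and `sliding_trend` is a hypothetical stand-in for the time-dependent variant, not the authors' TDRT procedure.

```python
import numpy as np


def embed(x, dim=3, delay=2):
    """Time-delay embedding: row i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("signal too short for this embedding")
    return np.stack([x[i * delay:i * delay + n] for i in range(dim)], axis=1)


def recurrence_plot(x, dim=3, delay=2, eps=0.1):
    """Binary matrix R[i, j] = 1 when embedded states i and j are within eps."""
    v = embed(x, dim, delay)
    dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    return (dist <= eps).astype(np.uint8)


def recurrence_trend(rp, max_lag=None):
    """Slope of the per-diagonal recurrence rate versus lag.

    A value near zero suggests stationary dynamics; a clearly negative
    value means recurrences thin out away from the main diagonal (drift).
    """
    n = rp.shape[0]
    if max_lag is None:
        max_lag = n // 2  # assumption: ignore the short, noisy far diagonals
    lags = np.arange(1, max_lag + 1)
    rate = np.array([np.diagonal(rp, offset=int(k)).mean() for k in lags])
    return float(np.polyfit(lags, rate, 1)[0])


def sliding_trend(x, win=100, hop=50, **rp_kwargs):
    """Hypothetical time-dependent variant: one trend value per window."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - win + 1, hop)
    return [recurrence_trend(recurrence_plot(x[s:s + win], **rp_kwargs))
            for s in starts]


if __name__ == "__main__":
    ramp = np.linspace(0.0, 1.0, 200)   # steadily drifting signal
    flat = np.zeros(200)                # perfectly stationary signal
    print("ramp trend:", recurrence_trend(recurrence_plot(ramp)))
    print("flat trend:", recurrence_trend(recurrence_plot(flat)))
```

In a boundary-refinement setting, such a windowed trend curve would be evaluated only in a neighbourhood of each initial HMM boundary. How the paper selects the change point from that curve is not stated in the abstract, so no selection rule is sketched here.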
Source: Journal of Signal Processing (《信号处理》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 4, pp. 521-525 (5 pages)
Funding: National Basic Research Program of China (973 Program) (No. 2005CB724303)
Keywords: hidden Markov model; corpus-based concatenative speech synthesis; speech dynamics; recurrence plot; recurrence trend analysis

References (17)

  • 1. Chou F, Tseng C and Lee L. A Set of Corpus-Based Text-to-Speech Synthesis Technologies for Mandarin Chinese. IEEE Transactions on Speech and Audio Processing, 2002, vol. 10, pp. 481-494.
  • 2. Husson J L. Evaluation of a Segmentation System Based on Multi-level Lattices. EUROSPEECH, 1999, pp. 471-474.
  • 3. Malfrere F, Deroo O, Dutoit T, et al. Phonetic Alignment: Speech Synthesis-Based vs. Viterbi-Based. Speech Communication, 2003, vol. 40, pp. 503-515.
  • 4. Talkin D and Wightman C W. The Aligner: Text-to-Speech Alignment Using Markov Models and a Pronunciation Dictionary. Proceedings of Second ESCA/IEEE Workshop on Speech Synthesis, 1996, pp. 89-92.
  • 5. Hosom J P. Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information. Oregon Graduate Institute of Science and Technology, 2000.
  • 6. van Santen J P H and Sproat R W. High-Accuracy Automatic Segmentation. EUROSPEECH, 1999, pp. 2809-2812.
  • 7. Wu Y, Kawai H, Ni J, et al. Discriminative Training and Explicit Duration Modeling for HMM-based Automatic Segmentation. Speech Communication, 2005, vol. 47, pp. 397-410.
  • 8. Faundez-Zanuy M, Kubin G, Kleijn W B, et al. Nonlinear Speech Processing: Overview and Applications. Control and Intelligent Systems, 2002, vol. 30, pp. 1-10.
  • 9. Kleijn W B and Paliwal K K. Speech Coding and Synthesis. Elsevier Science B.V., 1995.
  • 10. Kantz H and Schreiber T. Nonlinear Time Series Analysis. Cambridge University Press, 1997.
