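Abstract
Automatic segmentation of continuous speech based on hidden Markov models (HMMs) is widely used because of its high segmentation accuracy. However, its output cannot yet be used directly in corpus-based concatenative speech synthesis systems; the phoneme boundaries need further adjustment. This paper analyzes how the nonlinear dynamical physical models of different Mandarin Chinese phonemes appear in their recurrence plots (RPs), and uses the recurrence trend (RT), a quantitative parameter that measures the degree of a system's stationarity, to reveal the nonstationarity of the speech production process. Starting from the initial HMM-based segmentation of continuous speech, the Initial/Final boundaries are adjusted by locating the points where the speech dynamics change abruptly. Within tolerances of 10, 20, and 30 ms, segmentation accuracy improves by 13.88%, 4.19%, and 3.19%, respectively.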
Although the standard HMM-based method for automatic speech segmentation outperforms other approaches, its segmentation results are not accurate enough for corpus-based concatenative speech synthesis. In this paper, we use recurrence plots (RPs) to describe the different topological structures that arise from different physical models of speech production, e.g., periodicity for the oscillation of voiced sounds, homogeneity for the turbulent source of unvoiced sounds, and abrupt changes for stop consonants. As a quantification parameter that measures the nonstationarity of speech dynamics, the recurrence trend (RT) explicitly reveals such phenomena. A time-dependent recurrence trend (TDRT) is then proposed to identify the dynamical change point as a suitable Initial/Final (I/F) boundary for Mandarin speech. Experimental results on a continuous Mandarin speech database show that the TDRT correction process improves the accuracy of the HMM-based approach by 13.88%, 4.19%, and 3.19% within 10 ms, 20 ms, and 30 ms, respectively.
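As a rough illustration of the two quantities named in the abstract, the following Python sketch builds a recurrence plot from a time-delay embedding and estimates a recurrence trend as the slope of recurrence density against distance from the main diagonal, which is the usual definition in recurrence quantification analysis. It is not the authors' implementation: the embedding dimension, delay, threshold `eps`, the helper names, and the two toy signals are all assumptions chosen for illustration, since the abstract does not give the actual settings.

```python
import numpy as np


def embed(x, dim=3, delay=2):
    """Time-delay embedding: row t is (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i * delay : i * delay + n] for i in range(dim)], axis=1)


def recurrence_plot(x, dim=3, delay=2, eps=0.5):
    """Boolean matrix R[i, j], True when embedded states i and j lie within eps.

    dim, delay and eps are illustrative values, not taken from the paper.
    """
    v = embed(x, dim, delay)
    dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    return dist <= eps


def recurrence_trend(rp, max_lag=None):
    """Least-squares slope of recurrence density versus diagonal offset (lag)."""
    n = rp.shape[0]
    max_lag = n // 2 if max_lag is None else max_lag
    lags = np.arange(1, max_lag)
    # Fraction of recurrent points on each diagonal parallel to the main one.
    density = np.array([np.diagonal(rp, offset=k).mean() for k in lags])
    slope, _intercept = np.polyfit(lags, density, 1)
    return slope


# Toy signals (assumptions): a steady sinusoid as a crude stand-in for a
# sustained periodic sound, and a ramp as a stand-in for drifting dynamics.
t = np.arange(400)
steady = np.sin(2 * np.pi * t / 20)
drifting = 0.01 * t

rt_steady = recurrence_trend(recurrence_plot(steady))
rt_drifting = recurrence_trend(recurrence_plot(drifting))
print(rt_steady, rt_drifting)
```

Under these assumptions the steady sinusoid yields a trend close to zero, while the ramp yields a clearly more negative one, because its recurrences thin out away from the main diagonal. A time-dependent variant would presumably evaluate the same quantity over a sliding window around each initial HMM boundary, but the abstract does not describe that procedure in detail.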
Source
《信号处理》
CSCD
Peking University Core Journals (北大核心)
2007, No. 4, pp. 521-525 (5 pages)
Journal of Signal Processing
Funding
National Key Basic Research Program of China (973 Program) (No. 2005CB724303)