Abstract
Hidden Markov models (HMMs), widely used in automatic speech recognition, are applied in forced-alignment mode to segment unit boundaries for text-to-speech (TTS) systems. To improve segmentation accuracy, this paper optimizes the acoustic feature selection, the model parameters, and the model clustering of the HMMs. Experimental results show that the 12-dimensional static Mel-frequency cepstral coefficient (MFCC) feature is the optimal acoustic feature; that the optimal number of Gaussian mixture components per state is 1 (a single Gaussian); and that, for speaker-dependent tri-phone HMMs, the optimal number of tied states produced by classification and regression tree (CART) clustering is about 3,000. In phoneme boundary segmentation experiments on an English speech corpus, the optimized models raise segmentation accuracy from 77.3% to 85.4%.
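The forced alignment named in the abstract can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: it assumes one single-Gaussian state per unit with diagonal covariance, fixed and equal transition probabilities, and a known left-to-right state order, whereas the paper works with tri-phone HMMs and CART-tied states. The function names (`log_gauss`, `forced_align`, `boundaries`) and the toy data are hypothetical.

```python
import numpy as np


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def forced_align(features, means, variances,
                 log_self=np.log(0.5), log_next=np.log(0.5)):
    """Viterbi alignment of frames to a known left-to-right state sequence.

    features: array (T, D) of frame vectors (e.g. 12-D static MFCCs).
    means, variances: arrays (S, D), one single-Gaussian state per unit
    (an assumption made for this sketch).
    Returns a list of length T giving the state index of each frame.
    """
    T, S = len(features), len(means)
    if T < S:
        raise ValueError("need at least one frame per state")

    # Emission log-likelihood of every frame under every state.
    emit = np.array([[log_gauss(features[t], means[s], variances[s])
                      for s in range(S)] for t in range(T)])

    score = np.full((T, S), -np.inf)
    moved = np.zeros((T, S), dtype=bool)  # True if best path came from s-1
    score[0, 0] = emit[0, 0]              # the path must start in state 0

    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s] + log_self
            move = score[t - 1, s - 1] + log_next if s > 0 else -np.inf
            moved[t, s] = move > stay
            score[t, s] = max(stay, move) + emit[t, s]

    # Backtrace from the final state, which the path is forced to end in.
    s = S - 1
    path = [s]
    for t in range(T - 1, 0, -1):
        if moved[t, s]:
            s -= 1
        path.append(s)
    path.reverse()
    return path


def boundaries(path):
    """Frame indices at which the aligned state changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


# Toy 1-D example: three units centred at 0, 5 and 10.
toy_features = np.array([[0.0], [0.0], [0.0], [5.0], [5.0],
                         [10.0], [10.0], [10.0]])
toy_means = np.array([[0.0], [5.0], [10.0]])
toy_vars = np.ones((3, 1))
toy_path = forced_align(toy_features, toy_means, toy_vars)
print(boundaries(toy_path))  # unit boundaries at frames 3 and 5
```

Because the unit sequence is fixed in advance, the search only decides where each unit starts and ends, which is what makes the method usable for boundary segmentation rather than recognition.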
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2005, Issue 4, pp. 381-384 (4 pages)