
Automatic Phonetic Segmentation Using HMM Model

Chinese title: 基于HMM模型的语音单元边界的自动切分 (Automatic Segmentation of Speech Unit Boundaries Based on HMM Models). Cited by: 4
Abstract: Forced alignment based on hidden Markov models (HMMs), a technique widely used in automatic speech recognition, is applied to segmenting speech unit boundaries for text-to-speech (TTS) systems. To improve segmentation accuracy, this paper optimizes the acoustic feature selection, the model parameters, and the model clustering of the HMMs. Experimental results show that the static 12-dimensional Mel-frequency cepstral coefficient (MFCC) feature is the optimal acoustic feature; that the optimal number of Gaussian mixture components per state is 1 (a single Gaussian); and that, for speaker-dependent triphone HMMs, the optimal number of tied states produced by classification and regression tree (CART) clustering is about 3,000. In phoneme boundary segmentation experiments on an English speech corpus, the optimized parameters raise the segmentation accuracy from 77.3% to 85.4%.
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), CSCD, PKU Core, 2005, No. 4, pp. 381-384 (4 pages)
Keywords: acoustic unit boundary; automatic segmentation; hidden Markov model (HMM); text-to-speech (TTS) system
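
The abstract names HMM forced alignment with a single Gaussian per state but gives no implementation detail. The following is a minimal sketch of that idea under stated assumptions, not the authors' system: it uses toy 2-D features in place of 12-D MFCCs, one state per unit, diagonal covariances, and uniform transition scores. The function names `log_gauss_diag` and `forced_align` are hypothetical and do not come from the paper or from HTK.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def forced_align(frames, means, variances):
    """Viterbi-align T frames to S states visited in a fixed left-to-right order.

    Every state is visited at least once; at each frame the path either stays
    in the current state or advances by one. Transition probabilities are
    ignored (assumed uniform) to keep the sketch short.
    Returns the start frame index of each state.
    """
    T, S = len(frames), len(means)
    if T < S:
        raise ValueError("need at least one frame per state")

    # Emission log-likelihood of every frame under every state's Gaussian.
    emit = np.array([[log_gauss_diag(x, means[s], variances[s])
                      for s in range(S)] for x in frames])

    score = np.full((T, S), -np.inf)    # best log score ending in (t, s)
    back = np.zeros((T, S), dtype=int)  # predecessor state for backtracing
    score[0, 0] = emit[0, 0]            # the path must start in state 0

    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            if move > stay:
                score[t, s] = move + emit[t, s]
                back[t, s] = s - 1
            else:
                score[t, s] = stay + emit[t, s]
                back[t, s] = s

    # Backtrace from the last state at the last frame.
    path = np.empty(T, dtype=int)
    path[-1] = S - 1
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]

    # The first frame assigned to each state is that unit's left boundary.
    return [int(np.argmax(path == s)) for s in range(S)]

# Toy example: three well-separated states lasting 4, 3, and 5 frames.
means = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
variances = np.ones_like(means)
frames = np.repeat(means, [4, 3, 5], axis=0)
print(forced_align(frames, means, variances))  # [0, 4, 7]
```

Multiplying each returned frame index by the analysis frame shift would give boundary times; the abstract does not state the frame shift, so that conversion is left out here.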

