
An Improved Speech Segmentation Algorithm Based on Viterbi (cited by: 4)

Abstract: Text-prompted speaker recognition requires highly accurate speech segmentation. Building on Viterbi-based segmentation, this paper proposes a method that refines the segmentation by a backward, smoothed search for the minimum of multi-frame energy. The algorithm first builds a model for each digit from 0 to 9. It then applies the Viterbi algorithm to a random digit string to obtain the initial segmentation points. Finally, it updates those points by searching for the multi-frame energy minimum. Experiments show that, compared with the traditional segmentation algorithm, the improved algorithm raises segmentation accuracy from 82.1% to 88% when boundary errors within 20 ms are counted as correct.
Source: Communications Technology, 2015, No. 9, pp. 1027-1031 (5 pages)
Funding: ZTE Corporation industry-university-research cooperation project (No. CON1307160001)
Keywords: speech segmentation; Viterbi; minimum frame energy
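The two-stage procedure in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it rests on several assumptions. The per-digit models are replaced by a hypothetical precomputed matrix of frame log-likelihoods. The Viterbi stage uses a simplified left-to-right alignment with one state per digit. "Multi-frame energy" is taken to be a moving average of short-time log energy. "Backward" is read as scanning toward earlier frames from each Viterbi boundary. The function names and the `smooth` and `search` parameters are illustrative only.

```python
import math
from typing import List, Sequence


def viterbi_segment(loglik: Sequence[Sequence[float]]) -> List[int]:
    """Align T frames to the N prompted digits, in order, with Viterbi.

    loglik[t][k]: log-likelihood of frame t under the model of the k-th
    digit of the prompt (assumed precomputed; digit models not shown).
    Simplification: one state per digit, at least one frame per digit,
    and the path may only stay on a digit or advance to the next one.
    Returns the first frame of digits 1..N-1 (the initial boundaries).
    """
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per digit")
    score = [[-math.inf] * N for _ in range(T)]
    came_from_prev = [[False] * N for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start on the first digit
    for t in range(1, T):
        for k in range(N):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else -math.inf
            came_from_prev[t][k] = move > stay  # ties keep the same digit
            score[t][k] = max(stay, move) + loglik[t][k]
    # Backtrack from the last frame, which must belong to the last digit.
    bounds, k = [], N - 1
    for t in range(T - 1, 0, -1):
        if came_from_prev[t][k]:
            bounds.append(t)
            k -= 1
    return bounds[::-1]


def frame_log_energy(samples: Sequence[float], frame_len: int,
                     hop: int) -> List[float]:
    """Short-time log energy per frame (an assumed energy measure)."""
    return [
        math.log(sum(x * x for x in samples[s:s + frame_len]) + 1e-10)
        for s in range(0, len(samples) - frame_len + 1, hop)
    ]


def refine_boundaries(energy: Sequence[float], bounds: Sequence[int],
                      smooth: int = 3, search: int = 10) -> List[int]:
    """Move each boundary to the lowest smoothed energy found while
    scanning back from it by at most `search` frames.

    `smooth` (frames averaged) and `search` are hypothetical parameters.
    """
    n, half = len(energy), smooth // 2
    # Multi-frame energy: moving average over `smooth` frames.
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        smoothed.append(sum(energy[lo:hi]) / (hi - lo))
    refined, prev = [], 0
    for b in bounds:
        # Stay after the previous boundary so every digit keeps >= 1 frame.
        lo = max(prev + 1, b - search)
        # Scan b, b-1, ..., lo; min() keeps the first minimum it meets,
        # i.e. the candidate closest to the Viterbi boundary on ties.
        best = min(range(b, lo - 1, -1), key=lambda t: smoothed[t])
        refined.append(best)
        prev = best
    return refined


if __name__ == "__main__":
    # Toy prompt of 3 digits over 9 frames: frame t favours digit t // 3.
    ll = [[0.0 if k == t // 3 else -5.0 for k in range(3)] for t in range(9)]
    initial = viterbi_segment(ll)
    energy = [5, 5, 4, 1, 4, 2, 3, 5, 5]  # toy per-frame energies
    print(initial, refine_boundaries(energy, initial, smooth=1, search=2))
```

Running the script prints `[3, 6] [3, 5]`. Viterbi places the second boundary at frame 6, and the backward search moves it to frame 5, the lower-energy frame just before it. Because the search starts at each Viterbi boundary and prefers the nearest candidate on ties, the refinement only shifts a boundary when a real energy dip is found nearby.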