Speech Recognition Model Compression Algorithm Based on Bayesian Information Criterion (基于BIC的语音识别模型压缩算法)

Abstract: When the GMM (Gaussian Mixture Model) components of an HMM (Hidden Markov Model) speech model are increased during discriminative training, the recognition rate rises with the number of components, but so does the model size, which leaves the model bloated. Storing a model of several hundred megabytes is impractical when recognition runs locally on a mobile device. To address this, the paper proposes using the BIC (Bayesian Information Criterion) to compress a speech model with many GMM components down to a specified number of components, preserving the recognition rate as far as possible while keeping the model at a suitable size. Experimental results show that a model compressed with this method achieves a higher recognition rate than an uncompressed speech recognition model with the same number of components.
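The abstract does not spell out the merging procedure, so the following is only a minimal sketch of one common way to reduce a GMM to a target component count under a BIC-style score, not the authors' method. It assumes diagonal covariances, a moment-matched merge of two components, and a greedy search that always merges the pair with the smallest BIC change. The names `merge`, `delta_bic`, `compress`, `n_frames`, and `lam` are hypothetical and chosen for illustration.

```python
import math
from itertools import combinations

# A component is (weight, mean vector, diagonal variance vector).
# This representation is an assumption made for the sketch.

def merge(a, b):
    """Moment-matched merge of two diagonal Gaussians."""
    wa, ma, va = a
    wb, mb, vb = b
    w = wa + wb
    mean = [(wa * x + wb * y) / w for x, y in zip(ma, mb)]
    # Merged variance = weighted second moment minus squared merged mean.
    var = [
        (wa * (s + x * x) + wb * (t + y * y)) / w - m * m
        for s, t, x, y, m in zip(va, vb, ma, mb, mean)
    ]
    return (w, mean, var)

def delta_bic(a, b, n_frames, lam=1.0):
    """BIC-style cost of replacing a and b by their merge (lower is better).

    Uses the penalized log-likelihood form: expected log-likelihood lost
    by merging, minus the penalty saved by removing parameters.
    """
    m = merge(a, b)

    def n_log_det(c):
        # Expected frame count of the component times log|Sigma|.
        return n_frames * c[0] * sum(math.log(v) for v in c[2])

    loss = 0.5 * (n_log_det(m) - n_log_det(a) - n_log_det(b))
    removed = 2 * len(a[1]) + 1  # one mean, one variance vector, one weight
    return loss - 0.5 * lam * removed * math.log(n_frames)

def compress(components, target, n_frames):
    """Greedily merge the cheapest pair until `target` components remain."""
    comps = list(components)
    while len(comps) > target:
        i, j = min(
            combinations(range(len(comps)), 2),
            key=lambda p: delta_bic(comps[p[0]], comps[p[1]], n_frames),
        )
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps

gmm = [(0.25, [0.0], [1.0]), (0.25, [0.1], [1.0]), (0.5, [5.0], [1.0])]
small = compress(gmm, target=2, n_frames=1000)
```

With a fixed target count and a fixed `n_frames`, the penalty term is the same for every candidate pair, so the ranking is driven by the likelihood loss alone. The penalty matters only if the BIC is also used to decide when to stop merging.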

Authors: Zou Can (邹灿), Li Baiyan (李柏岩)

Affiliation: School of Computer Science, Donghua University

Source: Computer and Modernization (《计算机与现代化》), 2014, No. 6, pp. 71-73, 78 (4 pages)

Keywords: speech recognition; model compression; BIC (Bayesian information criterion)

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]
