Phone Recognition Based on Speaking Rate Adaptation and Phonological Attribute Posterior Probabilities (cited by: 5)

A Speaking Rate Adaptation Technique and Phonological Attribute Posterior for Phone Recognition

Abstract: Automatic speech recognition (ASR) based on speech event detection is an active research topic. To address the poor model adaptability caused by variation in speaking rate across speakers, this paper proposes a speaking rate adaptation algorithm. Working utterance by utterance, the algorithm normalizes each utterance with a continuously variable frame length and frame shift so that the adjusted rate matches the average rate of the speech corpus, which reduces the influence of speaking rate on model training. In addition, the speaking rate of the test set is obtained by computing the angle between posterior probability vectors of the phonological attributes, which places less load on the system than rate detection with trained models. The adaptation algorithm is applied to phonological attribute extraction, the attribute features are passed through a nonlinear transformation, and hidden Markov models (HMMs) are then used for modeling. Experiments show that after rate adaptation the average number of frames per phone is more nearly constant and its dynamic range is smaller, which improves the phone recognition rate by 1.3%.
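The two steps named in the abstract (rescaling the frame length and frame shift per utterance, and reading a rate cue from the angle between posterior vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the linear scaling rule, the 25 ms / 10 ms base frame settings, the rate unit (phones per second), and the use of the mean angle between adjacent frames as a rate proxy are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def adapted_frame_params(utt_rate, corpus_rate, base_len_ms=25.0, base_shift_ms=10.0):
    """Scale frame length and shift so frames-per-phone stays roughly constant.

    Assumption: rates are in phones per second and the scaling is linear.
    A faster-than-average utterance gets shorter frames and a smaller shift.
    """
    if utt_rate <= 0 or corpus_rate <= 0:
        raise ValueError("rates must be positive")
    scale = corpus_rate / utt_rate
    return base_len_ms * scale, base_shift_ms * scale


def mean_posterior_angle(posteriors):
    """Mean angle (radians) between posterior vectors of adjacent frames.

    Assumption: `posteriors` is a (frames, attributes) array of phonological
    attribute posteriors; a larger mean angle is taken here as a cue that
    the attributes change faster, i.e. a higher speaking rate.
    """
    p = np.asarray(posteriors, dtype=float)
    if p.ndim != 2 or p.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two frames")
    # Guard against all-zero rows before normalizing.
    norms = np.maximum(np.linalg.norm(p, axis=1), 1e-12)
    cos = np.sum(p[:-1] * p[1:], axis=1) / (norms[:-1] * norms[1:])
    # Clip to the valid arccos domain to absorb rounding error.
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))


if __name__ == "__main__":
    # An utterance spoken twice as fast as the corpus average.
    print(adapted_frame_params(utt_rate=20.0, corpus_rate=10.0))
    # Two frames with orthogonal posteriors.
    print(mean_posterior_angle([[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch a test utterance would first be framed with the base settings to obtain posteriors, the angle statistic would be mapped to a rate estimate by some calibration (not shown, and not specified in the abstract), and the utterance would then be re-framed with the adapted parameters.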

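Authors: Xu Youliang (许友亮), Zhang Lianhai (张连海), Zhang Wenlin (张文林), Li Yongbin (李永彬)

Affiliation: Institute of Information Engineering, Information Engineering University (信息工程大学信息工程学院)

Source: Journal of Signal Processing (《信号处理》), CSCD, Peking University Core Journal, 2012, No. 2, pp. 295-300 (6 pages)

Funding: National Natural Science Foundation of China (61175017)

Keywords: speaking rate adaptation; phonological attribute detection; hidden Markov models; automatic speech recognition

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]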

References (13)

1. Chin-Hui Lee, Mark A. Clements, Sorin Dusan. An Overview on Automatic Speech Attribute Transcription (ASAT) [C]// Conference of the International Speech Communication Association (Interspeech). Antwerp, Belgium, 2007: 1825-1828.
2. S. King, P. Taylor. Detection of Phonological Features in Continuous Speech Recognition Using Neural Networks [J]. Computer Speech and Language, 2000, 14(4): 333-353.
3. M. A. Siegler, R. M. Stern. On the Effects of Speech Rate in Large Vocabulary Speech Recognition Systems [C]// International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Detroit, MI, 1995: 612-615.
4. V. R. Gadde, K. Sonmez, H. Franco. Multirate ASR Models for Phone-Class Dependent N-best List Rescoring [C]// IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). San Juan: IEEE Press, 2005: 157-161.
5. S. Dimopoulos, A. Potamianos, E. Fosler-Lussier, Chin-Hui Lee. Multiple Time Resolution Analysis of Speech Signal Using MCE Training with Application to Speech Recognition [C]// International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Taipei: IEEE Press, 2009: 3801-3804.
6. I-F Chen, Hsin-Min Wang. Articulatory Feature Asynchrony Analysis and Compensation in Detection-Based ASR [C]// Conference of the International Speech Communication Association (Interspeech). Brighton, United Kingdom, 2009: 3059-3062.
7. Zoltan Tuske, Christian Plahl, Ralf Schluter. A Study on Speaker Normalized MLP Features in LVCSR [C]// Conference of the International Speech Communication Association (Interspeech). Florence, Italy, 2011: 1089-1092.
8. N. Strom. The NICO Artificial Neural Network Toolkit. http://nico.nikkostrom.com.
9. Frantisek Grezl. TRAP-Based Probabilistic Features for Automatic Speech Recognition [D]. Brno, CZ: Brno University of Technology, 2007.
10. Afsaneh Asaei, Benjamin Picart, Herve Bourlard. Analysis of Phone Posterior Feature Space Exploiting Class-Specific Sparsity and MLP-Based Similarity Measure [C]// International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Dallas, TX, 2010: 4886-4889.
