
Experimental Research on Audio-Visual Fusion and Model Asynchrony for Raising the Speech Recognition Rate (Cited by: 3)
Abstract  This paper studies two audio-visual fusion methods, early integration and late integration, in a bimodal speech recognition system; in the late-integration method, a composite model that accounts for the synchrony and asynchrony of the audio and visual modalities is introduced. Simulation experiments show that, in noisy acoustic environments, late integration yields better recognition performance, and that modeling audio-visual synchrony and asynchrony effectively raises the recognition rate. Researchers have become increasingly interested in raising speech recognition rates in noisy acoustic environments. Human speech recognition far outperforms machine recognition, partly because human auditory perception is aided by visual perception, with a natural degree of asynchrony between the two modalities; this has motivated research, including ours, on audio-visual fusion and model asynchrony. This paper reports progress in this area. Section 1 uses Fig.1 to discuss two methods of fusing audio and visual sensors: early integration and late integration. Section 2 introduces four model topologies (Fig.2) that reflect the bimodal (audio-visual) fusion of human speech perception, and simulates asynchrony with Product HMMs (Hidden Markov Models). Fig.2(d) presents a simplified Product HMM topology that adopts a stream state tying scheme to obtain robust parameter estimates while restricting the asynchrony to a single state of the phoneme HMM. Section 3 reports detailed speech recognition experiments on the AVTC (Audio Visual Telephone Conversations) corpus under various simulated SNR (Signal-to-Noise Ratio) conditions for six systems, including early- and late-integration systems. The results in Fig.3, in terms of recognition rate, indicate that late integration brings better recognition performance than early integration in noisy acoustic conditions. Fig.3 also shows that the state-asynchronous system outperforms the state-synchronous system when SNR > 12 dB, and that the state-synchronous system outperforms the state-asynchronous system when SNR < 12 dB. We believe these findings are of some help in raising speech recognition rates in noisy acoustic environments.
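The late-integration idea described in the abstract — scoring the audio and visual streams with separate models and combining their log-likelihoods with a stream weight — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy one-dimensional Gaussian scorers, the word models, and the stream weight `lam = 0.7` are all assumptions made for demonstration.

```python
import math

def gaussian_loglik(x, mean, var):
    # Log-likelihood of a scalar feature under a 1-D Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def late_integration_decision(audio_feat, visual_feat, models, lam=0.7):
    """Score each word with separate audio and visual models, then
    combine the per-stream log-likelihoods with stream weight lam
    (lam weights the audio stream, 1 - lam the visual stream)."""
    best, best_score = None, float("-inf")
    for word, m in models.items():
        la = gaussian_loglik(audio_feat, m["a_mean"], m["a_var"])
        lv = gaussian_loglik(visual_feat, m["v_mean"], m["v_var"])
        score = lam * la + (1 - lam) * lv
        if score > best_score:
            best, best_score = word, score
    return best

# Hypothetical two-word vocabulary with per-stream Gaussian models.
models = {
    "yes": {"a_mean": 1.0, "a_var": 0.5, "v_mean": 2.0, "v_var": 0.5},
    "no":  {"a_mean": -1.0, "a_var": 0.5, "v_mean": -2.0, "v_var": 0.5},
}
decision = late_integration_decision(0.9, 1.8, models)  # features near "yes"
```

In early integration, by contrast, the audio and visual feature vectors would be concatenated before a single model scores them; late integration keeps the streams separate until the decision level, which is what lets the stream weight be adapted to the acoustic SNR.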
Source  Journal of Northwestern Polytechnical University (《西北工业大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core list, 2004, Issue 2, pp. 171-175 (5 pages)
Funding  Science and technology cooperation project between the Ministry of Science and Technology of China and the Flemish Region of Belgium (Grant No. 国科外字 19990209)
Keywords  speech recognition, bimodal speech recognition, audio-visual fusion, model asynchrony

References (8)

  • 1 Lippmann R P. Speech Recognition by Machines and Humans. Speech Communication, 1997, 22(1): 1-15
  • 2 Chibelushi C C, et al. A Review of Speech-Based Bimodal Recognition. IEEE Trans on Multimedia, 2002, 4(1): 23-37
  • 3 Hall D L. Mathematical Techniques in Multisensor Data Fusion. Norwood: Artech House, 1992: 18-22
  • 4 Bourlard H, et al. Multi-Stream Speech Recognition. Technical Report IDIAP-RR96-07, IDIAP, 1996
  • 5 Varga P, Moore R K. Hidden Markov Model Decomposition of Speech and Noise. Proc International Conference on Acoustics, Speech and Signal Processing, Albuquerque, USA, 1990: 845-848
  • 6 Young S J, et al. The HTK Book. http://htk.eng.cam.ac.uk/docs/docs.shtml, 2002
  • 7 Ravyse I, Reinders M, Cornelis J, Sahli H. Eye Gesture Estimation. Proc Signal Processing Symposium of IEEE Benelux Signal Processing Chapter, Hilvarenbeek, The Netherlands, 2000: 4-7
  • 8 Gravier G, Potamianos G, Neti C. Asynchrony Modeling for Audio-Visual Speech Recognition. Proc Human Language Technology Conference, San Diego, USA, 2002: 325-328
