A Lip Contour Feature Extraction Method Based on MASM and Audio-Visual Speech Recognition (cited by: 1)

A Lip Contour Extraction Method Based on Multiple Active Shape Model (MASM) for Audio Visual Speech Recognition

Abstract: This paper proposes a MASM-based lip contour extraction method for audio-visual speech recognition. The method needs only a small amount of training data to extract a large number of lip contours accurately. A contour smoothing and correction method is also introduced, which exploits the continuous variation of mouth shape to correct erroneous contours. Experiments show that contour extraction accuracy with this method is 20 percentage points higher than with the conventional ASM. The resulting lip contour feature is then introduced into audio-visual speech recognition.

In audio-visual speech recognition and lipreading, the widely used ASM (Active Shape Model) for lip contour extraction lacks robustness and cannot extract exact lip contours because the mouth shape varies so much during speech. We present a more robust model, the Multiple Active Shape Model (MASM). The model classifies mouth shapes into a closed-mouth set, a half-opened-mouth set, and a round-mouth set. An independent ASM is built for each set from a small amount of training data. The MASM contour extraction algorithm runs multiple shape-search procedures and automatically selects the most accurate lip contour among their results. Because the mouth changes continuously from frame to frame, a method for smoothing lip contours is also presented to correct contour extraction errors. Experimental results on the AVCONDIG database show that the extraction accuracy achieved by the MASM is 13% higher than that of the conventional ASM. Combining the MASM with the contour-smoothing method yields a further 7% accuracy improvement. When the exact lip contour feature is fused with the audio MFCC (Mel Frequency Cepstral Coefficients) feature, the average word recognition rates on the connected-digits speech recognition task considered increase considerably under noisy acoustic conditions.
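The selection and smoothing steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the "most accurate" contour is scored or how smoothing is performed, so the per-class fit cost, the temporal median filter, and the names `masm_extract` and `smooth_contours` are all hypothetical.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence, Tuple

# A lip contour is represented as a list of (x, y) landmark points.
Contour = List[Tuple[float, float]]

# Hypothetical interface: each per-class ASM search takes a frame and returns
# (contour, fit_cost), where a lower cost means a better fit. The scoring rule
# is an assumption; the abstract does not specify it.
SearchFn = Callable[[object], Tuple[Contour, float]]


def masm_extract(frame: object, searches: Dict[str, SearchFn]) -> Tuple[str, Contour]:
    """Run every per-class ASM search and keep the lowest-cost contour."""
    if not searches:
        raise ValueError("at least one ASM search is required")
    best_label, best_contour, best_cost = "", [], float("inf")
    for label, search in searches.items():
        contour, cost = search(frame)
        if cost < best_cost:
            best_label, best_contour, best_cost = label, contour, cost
    return best_label, best_contour


def smooth_contours(contours: Sequence[Contour], window: int = 3) -> List[Contour]:
    """Median-filter each landmark coordinate over neighbouring frames.

    Assumed smoothing rule: an isolated bad contour is replaced by the median
    of its temporal neighbours, relying on the mouth changing gradually.
    All contours must have the same number of landmarks.
    """
    half = window // 2
    smoothed: List[Contour] = []
    for t in range(len(contours)):
        lo, hi = max(0, t - half), min(len(contours), t + half + 1)
        neighbours = contours[lo:hi]
        smoothed.append([
            (median(c[i][0] for c in neighbours),
             median(c[i][1] for c in neighbours))
            for i in range(len(contours[t]))
        ])
    return smoothed
```

In use, `searches` would map the three mouth-shape classes named in the abstract (closed, half-opened, round) to their trained ASM search routines, and `smooth_contours` would be applied to the sequence of per-frame results.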

Authors: Xie Lei (谢磊), Feng Wei (冯伟), Zhao Rongchun (赵荣椿)

Affiliation: School of Computer Science, Northwestern Polytechnical University

Source: Journal of Northwestern Polytechnical University (西北工业大学学报), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2004, No. 5, pp. 674-678 (5 pages)

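Funding: Supported by the international science and technology cooperation project between the Ministry of Science and Technology of China and the Flemish Region of Belgium (No. 国科外 19990209)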

Keywords: speech recognition, audio-visual speech recognition, ASM (Active Shape Model), MASM (Multiple Active Shape Model), lip contour extraction

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
