
Lip Motion and Voice Consistency Algorithm Based on Fusing Spatiotemporal Correlation Degree (基于时空相关度融合的语音唇动一致性检测算法)

Cited by: 5
Abstract: This paper builds on the traditional spatial model of lip motion during pronunciation and constructs a spatiotemporal lip motion model. It proposes methods for measuring the correlation degree between speech and the temporal and spatial characteristics of lip motion. It also proposes a speech-lip motion consistency detection method that fuses the temporal and spatial measures. The temporal consistency score is derived from the relationship between lip shape (lip width and lip height) and changes in speech amplitude. Co-inertia analysis (CoIA) gives the initial correlation degree between speech and the spatial lip features. Both the spatial and temporal correlation degrees are then corrected for the natural delay between speech and lip motion. Finally, the temporal and spatial scores are fused to decide whether the speech and lip motion are consistent. Preliminary experiments covered four types of inconsistent audio-video data. Compared with the commonly used CoIA method, the proposed method reduces the EER (equal error rate) by about 8.2% on average.
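The abstract names the processing steps but gives no formulas. The following Python/NumPy sketch shows one way such a pipeline could look. It rests on explicitly hypothetical assumptions:

- The temporal score is the mean Pearson correlation of lip width and lip height with per-frame audio amplitude.
- The delay correction is approximated by searching frame lags up to `max_lag` and keeping the best one.
- The spatial score is the correlation between the projections onto the first co-inertia axes. These axes are the leading singular vectors of the cross-covariance of the centred lip and audio feature matrices.
- Fusion is a weighted sum with a hypothetical `weight`.
- The EER is found by a threshold sweep.

All function names, parameters and the synthetic data are illustrative and are not the authors' implementation.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation of two 1-D sequences (0.0 if either is constant)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def shift_pair(x, y, lag):
    """Pair x[t] with y[t + lag] (rows, for 2-D arrays); lag > 0 means the
    lip motion in x precedes the matching audio in y."""
    if lag >= 0:
        return x[: len(x) - lag], y[lag:]
    return x[-lag:], y[: len(y) + lag]


def temporal_score(lip_w, lip_h, amp, max_lag=5):
    """Assumed temporal score: mean correlation of lip width and lip height with
    per-frame audio amplitude, maximised over lags. Returns (score, lag)."""
    best = (-np.inf, 0)
    for lag in range(-max_lag, max_lag + 1):
        w, a = shift_pair(lip_w, amp, lag)
        h, _ = shift_pair(lip_h, amp, lag)
        best = max(best, (0.5 * (pearson(w, a) + pearson(h, a)), lag))
    return best


def coia_correlation(X, Y):
    """Co-inertia analysis: the first co-inertia axes u, v are the leading
    singular vectors of Xc^T Yc (maximal covariance of Xc u and Yc v);
    return the correlation of the two projected series."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    return pearson(Xc @ U[:, 0], Yc @ Vt[0])


def spatial_score(X, Y, max_lag=5):
    """Assumed spatial score: CoIA correlation maximised over lags.
    Returns (score, lag)."""
    return max((coia_correlation(*shift_pair(X, Y, lag)), lag)
               for lag in range(-max_lag, max_lag + 1))


def fused_score(t_score, s_score, weight=0.5):
    """Assumed linear fusion; 'weight' is a hypothetical tuning parameter."""
    return weight * t_score + (1.0 - weight) * s_score


def equal_error_rate(genuine, impostor):
    """Threshold sweep: EER where false-reject and false-accept rates meet."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best_gap, eer = np.inf, 1.0
    for thr in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < thr)    # consistent clips rejected
        far = np.mean(impostor >= thr)  # inconsistent clips accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return float(eer)


def synthetic_clip(seed, consistent, n=200, delay=2):
    """Toy data (not from the paper): a smoothed random 'lip opening' drives
    lip width and height; the audio amplitude follows it 'delay' frames later,
    or follows an unrelated signal when consistent=False."""
    rng = np.random.default_rng(seed)

    def smooth(z):
        return np.convolve(z, np.ones(5) / 5, mode="valid")  # length n

    opening = smooth(rng.standard_normal(n + 4))
    source = opening if consistent else smooth(rng.standard_normal(n + 4))
    amp = np.roll(source, delay) + 0.05 * rng.standard_normal(n)
    lip_w = 1.0 + 0.3 * opening + 0.02 * rng.standard_normal(n)
    lip_h = 0.5 + 0.6 * opening + 0.02 * rng.standard_normal(n)
    X = np.column_stack([lip_w, lip_h])  # spatial lip features
    Y = np.column_stack([amp, 0.5 * amp + 0.1 * rng.standard_normal(n)])  # audio features
    return lip_w, lip_h, amp, X, Y


def score_clip(seed, consistent):
    """Fused score and best temporal lag for one synthetic clip."""
    lip_w, lip_h, amp, X, Y = synthetic_clip(seed, consistent)
    t_score, t_lag = temporal_score(lip_w, lip_h, amp)
    s_score, _ = spatial_score(X, Y)
    return fused_score(t_score, s_score), t_lag


good, good_lag = score_clip(0, consistent=True)
bad, _ = score_clip(0, consistent=False)
print(f"consistent: {good:.3f} (best lag {good_lag}), inconsistent: {bad:.3f}")
```

On this toy data the consistent clip scores well above the inconsistent one, and the lag search recovers the 2-frame delay built into `synthetic_clip`. Taking the maximum over a lag window is only one simple way to realise a delay correction. The paper's own correction method may differ.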
Source: Acta Electronica Sinica (《电子学报》), 2014, No. 4, pp. 779-785 (7 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
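Funding: National Natural Science Foundation of China (No. 61301300, No. 60972132); China Postdoctoral Science Foundation (No. 2013M531850); Fundamental Research Funds for the Central Universities, South China University of Technology (No. 2013ZM0097).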
Keywords: spatiotemporal characteristic; consistency analysis; co-inertia analysis (CoIA); correlation degree fusion