An Approach of Speech-driven Facial Animation Based on Formants Analysis (cited by: 1)


Abstract: Fast, efficient automatic lip-shape synthesis from speech, and tight synchronization between speech and lip motion, are the key issues in speech-driven facial animation. This paper presents a speech-driven facial animation method based on formant analysis. The speech signal is split into partly overlapped frames; each frame is multiplied by a Hamming window and transformed with a DFT (discrete Fourier transform). The first and second formants are then extracted from the short-time spectrum and mapped to a control sequence, which is post-processed to remove singular points (outliers). Several basic dynamic mouth shapes are defined on a 3D face model, and the control sequence is fed to the model on a timer to drive the facial animation. Experimental results show that the method is simple and fast, synchronizes speech and lip shape effectively, and produces coherent, natural-looking animation. It can be used for dubbing many kinds of virtual characters and shortens their production cycle.
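The processing chain named in the abstract (overlapped framing, Hamming window, DFT, first and second formant analysis, mapping to a control sequence, outlier removal) can be illustrated with a minimal sketch. The abstract does not give the paper's actual formant estimator, mapping, or post-processing, so everything below is an assumption made for illustration: formants are approximated by the two lowest peaks of a lightly smoothed magnitude spectrum, the hypothetical `mouth_opening` function maps F1 linearly to a 0..1 weight, and a median filter stands in for the singular-point removal. Frame size, hop, and the F1 range are likewise placeholder values.

```python
import cmath
import math
from statistics import median


def windowed_frames(signal, size, hop):
    """Split the signal into partly overlapped frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (size - 1)) for i in range(size)]
    return [
        [s * w for s, w in zip(signal[start:start + size], window)]
        for start in range(0, len(signal) - size + 1, hop)
    ]


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT; returns magnitudes for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def first_two_formants(spectrum, sample_rate, frame_size, half_width=1, floor=0.25):
    """Crude F1/F2 estimate (assumption): the two lowest peaks of a smoothed spectrum."""
    smooth = []
    for k in range(len(spectrum)):
        neighbours = spectrum[max(0, k - half_width):k + half_width + 1]
        smooth.append(sum(neighbours) / len(neighbours))
    threshold = floor * max(smooth)
    peaks = [
        k for k in range(1, len(smooth) - 1)
        if smooth[k] > smooth[k - 1]
        and smooth[k] >= smooth[k + 1]
        and smooth[k] >= threshold
    ]
    if len(peaks) < 2:
        return None  # silence, or a frame without two clear resonances
    return tuple(k * sample_rate / frame_size for k in peaks[:2])


def mouth_opening(f1, f1_min=250.0, f1_max=900.0):
    """Hypothetical mapping: normalise F1 (Hz) to a 0..1 mouth-opening weight."""
    return min(1.0, max(0.0, (f1 - f1_min) / (f1_max - f1_min)))


def remove_outliers(values, radius=1):
    """Median filter, used here as a stand-in for singular-point removal."""
    return [
        median(values[max(0, i - radius):i + radius + 1])
        for i in range(len(values))
    ]


def control_sequence(signal, sample_rate, size=256, hop=128):
    """One control value per frame; frames without formants close the mouth."""
    raw = []
    for frame in windowed_frames(signal, size, hop):
        formants = first_two_formants(magnitude_spectrum(frame), sample_rate, size)
        raw.append(mouth_opening(formants[0]) if formants else 0.0)
    return remove_outliers(raw)
```

With these placeholder settings, each control value covers `hop / sample_rate` seconds of audio (128 / 8000 = 16 ms at an 8 kHz sampling rate), which is the interval a timer-driven player would use to apply successive values to the face model. A practical version would likely replace the naive DFT with an FFT and the peak picking with a proper spectral-envelope method.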

Authors: Pan Jin (潘晋), Yang Weiying (杨卫英)

Affiliation: School of Film and TV Arts and Technology, Shanghai University (上海大学影视艺术技术学院)

Source: Audio Engineering (电声技术), 2009, No. 5, pp. 62-65 (4 pages)

Keywords: speech driving; formant analysis; facial animation (lip animation); speech-lip synchronization

Classification code: TN912 [Electronics and Telecommunications: Communication and Information Systems]


