
CVSS1.0: A Database for Chinese Visual Speech Synthesis Research (cited by 3)

English title: CVSS1.0: A New Audio-Visual Database for Chinese Visual Speech Synthesis
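Abstract: Most existing bimodal (audio-visual) speech databases are in foreign languages, and the great majority of them serve speech recognition or identity authentication. In view of this, and based on the characteristics of Chinese speech, we built CVSS1.0, the first relatively complete Chinese visual speech synthesis database in China. It has the following features: it contains video and audio data for 136 monosyllables and 265 continuously spoken sentences, a corpus larger than those of comparable existing databases; the material was selected by Chinese character frequency on the basis of a classification of Chinese articulation manners, and the monologue sentences cover most prosodic structures, so the patterns it captures are representative; it records the three-dimensional motion of facial articulatory movements; some of the facial feature points defined by MPEG-4 are marked with green dots to make tracking easier; and it can serve a variety of visual speech synthesis studies, making it highly general-purpose.

Audio-visual bimodal speech processing has become one of the focuses of international research, and research on Chinese visual speech synthesis has also begun. Building a bimodal speech database is essential to this work. Several audio-visual speech databases already exist, but most of them are in foreign languages and were designed for audio-visual speech recognition or person authentication. We therefore designed and created the Chinese visual speech synthesis database CVSS1.0. It has the following advantages: it comprises two parts, 136 Chinese characters and 265 phonetically balanced sentences; its utterance material was selected based on a classification of Chinese pronunciation features; it records the 3D motion of pronunciation; some facial feature points defined by MPEG-4 are marked with green dots; and it can meet the requirements of most visual speech synthesis research.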
Source: Microcomputer Applications (《微计算机应用》), 2007, no. 3, pp. 260-265 (6 pages)
Keywords: visual speech synthesis; database; corpus; MPEG-4

References (6)

  • [1] Philippe Daubias and Paul Deleglise. The LIUM-AVS database: a corpus to test lip segmentation and speechreading systems in natural conditions. Proceedings of EUROSPEECH 2003, vol. 2, pp. 1569-1572.
  • [2] E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy. CUAVE: A new audio-visual database for multimodal human-computer interface research. ICASSP, Orlando, May 2002.
  • [3] 徐彦君, 杜利民, 李国强, 张欣, 周治. 汉语听觉视觉双模态数据库CAVSR1.0 [CAVSR1.0: a Chinese audio-visual bimodal speech database]. Acta Acustica (声学学报), 2000, 25(1): 42-49.
  • [4] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre. XM2VTSDB: The extended M2VTS database. Proc. 2nd AVBPA, Washington, DC, USA, Mar. 22-23, 1999, pp. 72-77.
  • [5] 王志明, 蔡莲红, 吴志勇, 陶建华. 汉语文本-可视语音转换的研究 [Research on Chinese text-to-visual speech conversion]. Journal of Chinese Computer Systems (小型微型计算机系统), 2002, 23(4): 474-477.
  • [6] 陈益强, 高文, 王兆其, 姜大龙, 左力. 基于数据挖掘的语音驱动三维人脸动画合成 [Speech-driven 3D facial animation synthesis based on data mining]. Journal of System Simulation (系统仿真学报), 2002, 14(4): 496-500.
