Abstract
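Most existing bimodal (audio-visual) speech databases are in languages other than Chinese, and the vast majority are built for speech recognition or identity authentication. To address this gap, we built CVSS1.0 around the characteristics of Chinese speech. It is the first relatively complete Chinese visual speech synthesis database in China. Its features are as follows. It contains video and audio data for 136 monosyllables and 265 continuously spoken sentences, a corpus larger than that of comparable existing databases. The corpus was first grouped by Chinese articulation manner, and items were then selected by how frequently each character occurs. Its monologue sentences cover most prosodic structures, so the patterns it captures are representative. It records the 3D motion of facial articulatory movements. Some of the facial feature points defined by MPEG-4 are marked with green dots to make tracking easier. It can serve many kinds of visual speech synthesis research, which makes it highly general-purpose.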
Audiovisual bimodal speech processing has become an international research focus, and research on Chinese visual speech synthesis has also begun. Building a bimodal speech database is essential to this work. Several audiovisual speech databases already exist, but most are in languages other than Chinese and are designed for audiovisual speech recognition or person authentication. We therefore designed and created the Chinese visual speech synthesis database CVSS1.0. Its main features are as follows. It comprises two parts: 136 Chinese characters and 265 phonetically balanced sentences. Its utterance material is selected on the basis of a classification of Chinese pronunciation features. It records the 3D motion of articulation. Some facial feature points defined by MPEG-4 are marked with green dots. It meets the requirements of most visual speech synthesis research.
Source
Microcomputer Applications (《微计算机应用》), 2007, Issue 3, pp. 260-265 (6 pages)