Research on a Tibetan-Driven Visual Speech Synthesis Algorithm Based on Audio Matching
Abstract: To address low lip contour detection accuracy and poor visual speech synthesis quality, a Tibetan-driven visual speech synthesis algorithm based on audio matching is proposed. The algorithm extracts the short-time energy and short-time zero-crossing rate from the Tibetan-driven visual speech signal and constructs the short-time autocorrelation function of the speech signal. First, feature information is extracted from the speech signal to obtain the pitch track of the Tibetan speech signal, which serves as the audio feature. Second, a spatiotemporal analysis model of the lips is established to analyze how the lip contour changes during pronunciation, and lip contour features are extracted by principal component analysis. Finally, the correlation between the audio features and the lip contour features is obtained through an input-output hidden Markov model, and Tibetan-driven visual speech is synthesized on the basis of audio matching. Experimental results show that the method achieves high lip contour detection accuracy and good visual speech synthesis quality.
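The audio-feature step described in the abstract (short-time energy, zero-crossing rate, and a short-time autocorrelation function leading to a pitch track) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no frame length, hop size, voicing thresholds, or pitch search range, so `frame_len`, `hop`, `energy_ratio`, `zcr_max`, `fmin`, and `fmax` below are assumed values, and all function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) from the peak of the short-time autocorrelation.

    The search range fmin..fmax is an assumption, not taken from the paper.
    """
    frame = frame - frame.mean()
    # Keep non-negative lags only: ac[k] = sum_n frame[n] * frame[n + k]
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sr / lag

def pitch_track(x, sr, frame_len=640, hop=160, energy_ratio=0.1, zcr_max=0.3):
    """Per-frame pitch in Hz; frames judged unvoiced are set to 0.

    A frame counts as voiced when its energy is high and its zero-crossing
    rate is low. Both thresholds are assumed for illustration.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    voiced = (energy > energy_ratio * energy.max()) & (zcr < zcr_max)
    f0 = np.zeros(len(frames))
    for i in np.flatnonzero(voiced):
        f0[i] = autocorr_pitch(frames[i], sr)
    return f0
```

Calling `pitch_track(x, sr)` on a mono signal `x` sampled at `sr` Hz returns one pitch value per frame. In this sketch the energy and zero-crossing measures only gate which frames are treated as voiced, and the autocorrelation peak supplies the pitch estimate; the paper may combine the three quantities differently.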
Authors: HAN Xi (韩西); LIANG Kai (梁凯); YUE Yu (岳宇) (Ganzi Prefecture Science and Technology Information Research Institute, Kangding 626000, China)
Source: Journal of Jilin University (Information Science Edition) (《吉林大学学报(信息科学版)》), CAS, 2024, No. 3, pp. 509-515 (7 pages)
Funding: Supported by the Sichuan Provincial Science and Technology Program (2021YFG0138).
Keywords: audio matching; short-time autocorrelation function; spatiotemporal analysis model; principal component analysis; visual speech synthesis