
A Speech-Driven Facial Motion Method Based on Temporal Loss
Abstract: Research on speech-driven 3D facial motion has mainly focused on expanding 3D facial motion data across multiple speakers and on extracting high-quality audio features. However, collecting 3D facial motion data typically involves high costs and a labor-intensive annotation process, and the small amount of data available for a single speaker makes it difficult for models to learn high-quality audio features because of data sparsity. To address this problem, this paper draws inspiration from time-series tasks and applies the smoothed formulation of Dynamic Time Warping (Soft-DTW) to the cross-modal alignment between speech features and facial mesh vertices. Experiments show that using Soft-DTW as the loss function improves lip synchronization in the generated facial animation compared with Mean Squared Error (MSE), enabling the synthesis of higher-quality facial animations.
Source: Computer Science and Application (《计算机科学与应用》), 2023, No. 12, pp. 2521-2527 (7 pages).
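To make the Soft-DTW loss mentioned in the abstract concrete, below is a minimal sketch of how such a temporal loss could be computed between a predicted and a ground-truth sequence of mesh-vertex frames. This is not the paper's implementation: the frame dimensions, the squared-Euclidean frame cost, the smoothing parameter `gamma`, and the function name `soft_dtw_loss` are illustrative assumptions, and the naive Python recursion is written for clarity rather than speed.

```python
import torch

def soft_min(a, b, c, gamma):
    # Differentiable soft-minimum: -gamma * log(exp(-a/gamma) + exp(-b/gamma) + exp(-c/gamma)).
    vals = torch.stack([a, b, c])
    return -gamma * torch.logsumexp(-vals / gamma, dim=0)

def soft_dtw_loss(pred, target, gamma=0.1):
    """Soft-DTW discrepancy between two frame sequences (illustrative sketch).

    pred:   (T1, D) predicted mesh-vertex coordinates per frame (flattened)
    target: (T2, D) ground-truth mesh-vertex coordinates per frame
    Returns a scalar that is differentiable with respect to pred.
    """
    T1, T2 = pred.size(0), target.size(0)
    # Pairwise squared-Euclidean cost between every predicted and target frame.
    cost = torch.cdist(pred, target, p=2) ** 2  # shape (T1, T2)

    inf = torch.tensor(float("inf"), dtype=pred.dtype, device=pred.device)
    zero = torch.zeros((), dtype=pred.dtype, device=pred.device)
    # r[i][j] = soft-DTW value of aligning the first i predicted frames
    # with the first j target frames (dynamic-programming table).
    r = [[inf] * (T2 + 1) for _ in range(T1 + 1)]
    r[0][0] = zero
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            r[i][j] = cost[i - 1, j - 1] + soft_min(
                r[i - 1][j], r[i][j - 1], r[i - 1][j - 1], gamma
            )
    return r[T1][T2]

# Example: 60 predicted frames vs. 55 ground-truth frames of a mesh with
# 5023 vertices (the vertex count of FLAME-topology face meshes), used here
# purely as an illustrative shape.
pred = torch.randn(60, 5023 * 3, requires_grad=True)
target = torch.randn(55, 5023 * 3)
loss = soft_dtw_loss(pred, target, gamma=0.1)
loss.backward()  # gradients flow back to the predicted frames
```

In practice the quadratic Python loop would be replaced by a batched or GPU implementation, and `gamma` trades off the smoothness of the loss surface against fidelity to the hard DTW alignment (as `gamma` approaches 0, Soft-DTW approaches standard DTW).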