While much progress has been made in capturing high-quality facial performances using motion capture markers and shape-from-shading, high-end systems typically also rely on rotoscope curves hand-drawn on the image. These curves are subjective and difficult to draw consistently; moreover, ad-hoc procedural methods are required to generate matching rotoscope curves on the synthetic renders embedded in the optimization used to determine three-dimensional (3D) facial pose and expression. We propose an alternative approach whereby these curves and other keypoints are detected automatically on both the image and the synthetic renders using trained neural networks, eliminating both artist subjectivity and the ad-hoc procedures meant to mimic it. More generally, we propose using machine learning networks to implicitly define deep energies that, when minimized using classical optimization techniques, yield 3D facial pose and expression estimates.
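The deep-energy idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the template, the `render` function, the `detect` stand-in for a trained network, and the choice of Gauss-Newton with finite-difference Jacobians are all assumptions made for illustration. The same detector is applied to the observed data and to the synthetic render, and the squared difference of its outputs is minimized over the pose parameters.

```python
import numpy as np

# Hypothetical canonical 2D keypoints standing in for a 3D face model.
TEMPLATE = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

def render(theta):
    """Toy 'renderer' (assumption): scale and translate the template.
    theta = (scale, tx, ty)."""
    s, tx, ty = theta
    return s * TEMPLATE + np.array([tx, ty])

def detect(points):
    """Stand-in for a trained keypoint network (assumption): a fixed
    smooth nonlinear map applied to the input points."""
    return np.tanh(0.5 * points).ravel()

def residual(theta, target):
    """Detector output on the render minus detector output on the image."""
    return detect(render(theta)) - target

def energy(theta, target):
    """Deep energy: half the squared norm of the residual."""
    return 0.5 * np.sum(residual(theta, target) ** 2)

def numerical_jacobian(f, theta, eps=1e-6):
    """Central finite-difference Jacobian of f at theta."""
    n_out = f(theta).size
    J = np.zeros((n_out, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        J[:, j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return J

def gauss_newton(target, theta0, iters=20):
    """Minimize the energy with plain Gauss-Newton steps."""
    theta = np.asarray(theta0, dtype=float)
    f = lambda t: residual(t, target)
    for _ in range(iters):
        r = f(theta)
        J = numerical_jacobian(f, theta)
        # Solve the linearized least-squares problem J * delta = -r.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        theta = theta + delta
    return theta

if __name__ == "__main__":
    # Synthesize the 'observed' detections from a known pose.
    true_theta = np.array([1.2, 0.3, -0.2])
    target = detect(render(true_theta))
    est = gauss_newton(target, [1.0, 0.0, 0.0])
    print("estimated pose:", est, "energy:", energy(est, target))
```

In a real pipeline the detector would be a trained network applied to rendered images, and its derivatives would more likely come from backpropagation than from finite differences; the toy version only shows how a network-defined energy plugs into a classical least-squares solver.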
Funding: Supported in part by the Office of Naval Research (ONR) N00014-13-1-0346, ONR N00014-17-1-2174, ARL AHPCRC W911NF-07-0027, and generous gifts from Amazon and Toyota; supported in part by the VMWare Fellowship in Honor of Ole Agesen; supported in part by the Stanford School of Engineering Fellowship.