Funding: This work was supported in part by grants from the National Key R&D Program of China (2021YFC3300403), the National Natural Science Foundation of China (62072382), the Yango Charitable Foundation, and the National Science Foundation (OAC-2007661).
Abstract: Synthesizing a vivid and realistic singing face driven by music remains an interesting and challenging problem. In this paper, we present a method for this task that produces natural motions for the lips, facial expression, head pose, and eyes. Because common music audio signals mix the human voice with backing music, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a backing music stream. Because the correlation between the two-stream input signals and the dynamics of the facial expressions, head motions, and eye states is implicit and complicated, we model their relationship with an attention scheme in which the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we decompose head movement generation into speed and direction, and eye state generation into short-term blinking and long-term eye closing, and model each component separately. We have also built a novel dataset, SingingFace, to support training and evaluation of models for this task, including future work on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing vivid singing faces, qualitatively and quantitatively better than the prior state-of-the-art.
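The abstract does not specify the form of its attention scheme. As a rough illustration only, the sketch below fuses one voice feature vector and one backing-music feature vector with generic scaled dot-product attention. The function names `softmax` and `fuse_streams`, the query vector, and the feature dimensions are all assumptions made for this example, not the authors' architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(query, voice_feat, music_feat):
    """Fuse two stream features with scaled dot-product attention.

    query      -- hypothetical query vector (e.g. a motion-decoder state)
    voice_feat -- hypothetical feature vector of the voice stream
    music_feat -- hypothetical feature vector of the backing-music stream
    Returns (attention weights, fused feature vector).
    """
    d = len(query)
    streams = [voice_feat, music_feat]
    # One score per stream: similarity between the query and that stream.
    scores = [sum(q * k for q, k in zip(query, s)) / math.sqrt(d)
              for s in streams]
    weights = softmax(scores)
    # The fused feature is the attention-weighted sum of the two streams.
    fused = [sum(w * s[i] for w, s in zip(weights, streams))
             for i in range(d)]
    return weights, fused

weights, fused = fuse_streams([1.0, 0.0], [2.0, 0.0], [0.0, 2.0])
print(weights, fused)
```

In this toy call the query is aligned with the voice feature, so the voice stream receives the larger weight. A real model would learn the query, key, and value projections rather than use raw vectors.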
Funding: Supported by the National Numerical Wind Tunnel project (NNW2019ZT6-B18) and the Guangdong Introducing Innovative & Entrepreneurial Teams program under Grant No. 2016ZT06D211.
Abstract: In the unstructured finite volume method, loops over different mesh components such as cells, faces, and nodes are widely used to traverse data. A mesh loop results in direct or indirect data access, which significantly affects data locality. When looping over the mesh, many threads accessing the same data leads to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under different loops over cells, faces, and nodes is evaluated on the Nvidia Tesla V100 and K80. Numerical tests at different mesh scales show that the mesh loop modes affect data locality and data dependence differently. Specifically, the face loop gives the best data locality as long as kernels access face data. The cell loop incurs the smallest overhead from non-coalesced data access when both cell and node data, but no face data, are used in the computation. The cell loop also performs best when kernels contain only indirect access to cell data. Atomic operations substantially reduce kernel performance on the K80, whereas the effect is much less pronounced on the V100. With a suitable mesh loop mode in all kernels, the overall performance of GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a maximum speedup of 21.7 and an average speedup of 14.1 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.
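The difference between a face loop and a cell loop can be sketched on a toy mesh. The example below is a serial Python illustration, not the paper's GPU code: the mesh, the flux values, and the function names are hypothetical. It shows why a face loop scatters to shared cells (which would require atomic adds with one GPU thread per face), while a cell loop gathers face data through indirect indexing and writes only its own entry.

```python
# Toy mesh (hypothetical): 4 cells in a row joined by 3 interior faces.
# Each face has an owner cell and a neighbour cell.
face_owner = [0, 1, 2]
face_neigh = [1, 2, 3]
face_flux = [1.0, -2.0, 0.5]  # made-up flux, positive from owner to neighbour
n_cells = 4

def residual_face_loop(n_cells, owner, neigh, flux):
    """Loop over faces and scatter each flux to its two adjacent cells.

    Face data is read contiguously, but several faces write to the same
    cell, so a one-thread-per-face GPU kernel would need atomic adds.
    """
    res = [0.0] * n_cells
    for f in range(len(flux)):
        res[owner[f]] -= flux[f]
        res[neigh[f]] += flux[f]
    return res

def build_cell_faces(n_cells, owner, neigh):
    """Build cell-to-face connectivity with an orientation sign per face."""
    cell_faces = [[] for _ in range(n_cells)]
    for f in range(len(owner)):
        cell_faces[owner[f]].append((f, -1.0))
        cell_faces[neigh[f]].append((f, +1.0))
    return cell_faces

def residual_cell_loop(n_cells, cell_faces, flux):
    """Loop over cells and gather the fluxes of each cell's faces.

    Each cell writes only its own entry, so no atomics are needed, but
    face data is reached through indirect (index-based) access.
    """
    res = [0.0] * n_cells
    for c in range(n_cells):
        acc = 0.0
        for f, sign in cell_faces[c]:
            acc += sign * flux[f]
        res[c] = acc
    return res

cell_faces = build_cell_faces(n_cells, face_owner, face_neigh)
print(residual_face_loop(n_cells, face_owner, face_neigh, face_flux))
print(residual_cell_loop(n_cells, cell_faces, face_flux))
```

Both loops produce the same per-cell residual; they differ only in memory access pattern and write conflicts, which is the trade-off the abstract evaluates.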
Abstract: In this paper we present a novel feature-based RGB-D camera pose optimization algorithm for real-time 3D reconstruction systems. During camera pose estimation, current methods in online systems struggle with fast-scanned RGB-D data or generate inaccurate relative transformations between consecutive frames. Our approach improves on current methods by utilizing matched features across all frames, and it is robust to RGB-D data with large shifts between consecutive frames. We directly estimate the camera pose for each frame by efficiently solving a quadratic minimization problem that maximizes the consistency of 3D points in global space across frames corresponding to matched feature points. We have implemented our method within two state-of-the-art online 3D reconstruction platforms. Experimental results show that our method is efficient and reliable in estimating camera poses for RGB-D data with large shifts.
Funding: Supported by the National Science Foundation under Grant No. 1012975.
Abstract: In this paper, we present a framework that allows users to interact with geometrically complex 3D deformable objects using (multiple) haptic devices, based on an extended shape matching approach. There are two major challenges for haptic-enabled interaction using the shape matching method. The first is how to obtain rapid deformation propagation when a large number of shape matching clusters exist. The second is how to robustly handle the collision response when the haptic interaction point hits the particle-sampled deformable volume. Our framework extends existing multi-resolution shape matching methods, providing an improved energy convergence rate. This is achieved by using adaptive integration strategies to avoid insignificant shape matching iterations during the simulation. Furthermore, we present a new mechanism called stable constraint particle coupling, which ensures consistent deformable behavior during haptic interaction. As demonstrated in our experimental results, the proposed method provides natural and smooth haptic rendering as well as efficient yet stable deformable simulation of complex models in real time.
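For readers unfamiliar with shape matching, the sketch below shows only the basic single-cluster step in 2D: find the rotation and translation that best fit the rest shape to the current particle positions, then pull each particle toward its goal position. It is not the extended multi-resolution method described in the abstract. Equal particle masses, the stiffness parameter `alpha`, and all function names are assumptions made for this example.

```python
import math

def centroid(points):
    """Centroid of a list of 2D points (equal masses assumed)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def shape_match_goals(rest, current):
    """Goal positions: the rest shape rigidly fitted to the current points."""
    c0 = centroid(rest)
    c = centroid(current)
    s_sin = 0.0
    s_cos = 0.0
    for (rx, ry), (px, py) in zip(rest, current):
        qx, qy = rx - c0[0], ry - c0[1]   # rest position relative to centroid
        dx, dy = px - c[0], py - c[1]     # current position relative to centroid
        s_sin += qx * dy - qy * dx        # accumulated 2D cross products
        s_cos += qx * dx + qy * dy        # accumulated dot products
    # In 2D the least-squares optimal rotation angle has this closed form.
    theta = math.atan2(s_sin, s_cos)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    goals = []
    for rx, ry in rest:
        qx, qy = rx - c0[0], ry - c0[1]
        goals.append((cos_t * qx - sin_t * qy + c[0],
                      sin_t * qx + cos_t * qy + c[1]))
    return goals

def shape_match_step(rest, current, alpha=0.5):
    """Move each particle a fraction alpha of the way toward its goal."""
    goals = shape_match_goals(rest, current)
    return [(px + alpha * (gx - px), py + alpha * (gy - py))
            for (px, py), (gx, gy) in zip(current, goals)]

rest = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
deformed = [(0.0, 0.0), (1.2, 0.1), (1.0, 1.0), (-0.1, 0.9)]
print(shape_match_step(rest, deformed, alpha=0.5))
```

If the current configuration is an exact rotation and translation of the rest shape, the goal positions coincide with the current positions and the step leaves the particles unchanged. In 3D the rotation is usually extracted with a polar decomposition rather than a single angle.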