We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine crossview correspondences for the 2 D joints in the presence of noise. We propose a 3 ...We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine crossview correspondences for the 2 D joints in the presence of noise. We propose a 3 D hypothesis clustering technique to solve this problem. The core idea is to transform joint matching in 2 D space into a clustering problem in a 3 D hypothesis space. In this way, evidence from photometric appearance, multiview geometry, and bone length can be integrated to solve the clustering problem efficiently and robustly. Each cluster encodes a set of matched 2 D joints for the same person across different views, from which the 3 D joints can be effectively inferred. We then assemble the inferred 3 D joints to form full-body skeletons for all persons in a bottom–up way. Our experiments demonstrate the robustness of our approach even in challenging cases with heavy occlusion,closely interacting people, and few cameras. We have evaluated our method on many datasets, and our results show that it has significantly lower estimation errors than many state-of-the-art methods.展开更多
基金partially supported by National Natural Science Foundation of China(No.61872317)Face Unity Technology。
文摘We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine crossview correspondences for the 2 D joints in the presence of noise. We propose a 3 D hypothesis clustering technique to solve this problem. The core idea is to transform joint matching in 2 D space into a clustering problem in a 3 D hypothesis space. In this way, evidence from photometric appearance, multiview geometry, and bone length can be integrated to solve the clustering problem efficiently and robustly. Each cluster encodes a set of matched 2 D joints for the same person across different views, from which the 3 D joints can be effectively inferred. We then assemble the inferred 3 D joints to form full-body skeletons for all persons in a bottom–up way. Our experiments demonstrate the robustness of our approach even in challenging cases with heavy occlusion,closely interacting people, and few cameras. We have evaluated our method on many datasets, and our results show that it has significantly lower estimation errors than many state-of-the-art methods.