Abstract
This paper presents VideoInNet, a novel deep neural network for designated point tracking (DPT) in monocular RGB video. More concretely, the aim is to track four designated points, correlated by a local homography, on a textureless planar region in the scene. DPT can be applied to augmented reality and video editing, especially in video advertising. Existing methods predict the locations of the four designated points without appropriately considering the correlation among them. To solve this problem, VideoInNet predicts the motion of the four designated points, correlated by a local homography, within a heatmap prediction framework. Our network refines the heatmaps of the designated points in two stages. In the first stage, we introduce a context-aware and location-aware structure to learn a local homography for the designated plane in a supervised way. In the second stage, we introduce an iterative heatmap refinement module to improve tracking accuracy. We propose a dataset focusing on textureless planar regions, named ScanDPT, for training and evaluation. We show that the error rate of VideoInNet is about 29% lower than that of the state-of-the-art approach when tested on the first 120 frames of the test videos in ScanDPT.
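To make the phrase "four designated points correlated by a local homography" concrete, the following is a minimal sketch of the underlying geometry only, not of VideoInNet itself. It assumes the standard direct linear transform (DLT) for recovering a 3x3 homography from four point correspondences; the function names `estimate_homography` and `warp_points` and the sample coordinates are hypothetical and chosen for illustration.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src, via the direct linear transform.

    src, dst: sequences of four (x, y) points, no three of them collinear.
    Illustrative helper; the paper's network learns the homography instead.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Normalize so H[2, 2] == 1 (assumes H[2, 2] is not close to zero).
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # back to pixel coordinates

# Hypothetical example: four corners of a planar region in frame t and frame t+1.
corners_t = np.array([[100.0, 100.0], [300.0, 110.0], [310.0, 260.0], [90.0, 250.0]])
H_true = np.array([[1.02, 0.01, 5.0],
                   [-0.02, 0.98, -3.0],
                   [1e-5, 2e-5, 1.0]])
corners_t1 = warp_points(H_true, corners_t)
H_est = estimate_homography(corners_t, corners_t1)
```

Because a homography has eight degrees of freedom and each point pair supplies two constraints, four correspondences determine it exactly. This is why the four tracked points cannot move independently of one another on a planar region.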
Funding
The Key Research Projects of the Foundation Strengthening Program under Grant No. 2020JCJQZD01412.
The National Natural Science Foundation of China under Grant No. 61832016.