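Abstract
3D template tracking aims to precisely register a pre-built 3D CAD model with the corresponding target in an input image. It has important applications in augmented reality, robotics, and related fields, and is one of the key problems in computer vision. In recent years the accuracy and stability of 3D template tracking have improved steadily, but only a few works have addressed the construction of 3D template tracking datasets. With the spread of deep learning, the construction of large-scale datasets has received growing attention in many fields; such datasets lay the foundation for training, testing, and evaluating algorithms and have greatly advanced those fields. Previous 3D template tracking datasets mostly suffer from limited scale, images that are not sufficiently natural or realistic, and insufficient diversity. To address this, this paper builds a large-scale 3D template tracking dataset based on photorealistic rendering, the Render Dataset for Object Tracking (RDOT). It contains objects with a variety of structures and materials and complex motion patterns, and provides rich, fine-grained settings for scene, lighting, noise, motion blur, and occlusion. It is currently the largest dataset in the field of 3D template tracking and meets the various needs of evaluating 3D template tracking algorithms. Existing 3D template tracking algorithms have been evaluated on differing datasets, so reported results rarely reflect algorithm performance objectively and comprehensively. Using the constructed dataset, this paper evaluates current mainstream 3D template tracking algorithms with three metrics (average edge distance, average surface distance, and reinitialization rate), analyzes and discusses the results in depth, and gives a comprehensive analysis report and technical outlook. In addition, based on the constructed dataset, this paper proposes building an error analysis model for tracking results and correcting the results with it, which effectively improves the accuracy of 3D template tracking algorithms.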
3D template tracking aims to accurately align pre-constructed 3D CAD models with the corresponding targets in input images. It has important applications in augmented reality and robotics, and is one of the key problems in computer vision. In recent years, various approaches have been proposed to improve the accuracy and robustness of 3D template tracking, but only a small amount of work has contributed to the construction of 3D template tracking datasets. With the development and wide application of deep learning, the construction of large-scale datasets has received increasing attention in many fields. Such datasets lay the foundation for training, testing, and evaluating algorithms and have greatly promoted the development of the related fields. Previous datasets for 3D template tracking were acquired by either video capture or computer rendering. Video-captured datasets are realistic, but because the pose is computed from hand-crafted markers, the accuracy of the ground-truth pose is not guaranteed, and the size of these datasets is limited by the time-consuming labelling process. Computer-rendered datasets can be synthesized at scale, but the quality of the rendered image sequences is limited by the rendering techniques adopted. Altogether, previous datasets suffer from limited scale, inaccurate ground-truth poses, unrealistic images, and insufficient diversity of model settings, so constructing a high-quality, large-scale dataset for 3D template tracking is both meaningful and challenging. In this paper, we construct RDOT (Render Dataset for Object Tracking), a large-scale 3D template tracking dataset based on photorealistic rendering. The model set contains tens of objects with different physical structures and realistic materials, and the camera and objects can move in pre-defined complex motion modes. Moreover, compared with previous datasets, RDOT controls the settings of the rendering scenes more accurately: it offers detailed settings of lighting, noise, motion blur, and occlusion at different degrees of difficulty. To the best of our knowledge, RDOT is currently the largest 3D template tracking dataset that meets the demands of performance evaluation. Previous approaches have been evaluated on different datasets that suffer from the aforementioned problems. Based on RDOT, we evaluate previous 3D template tracking methods in an objective and fair way, using three precision metrics: ADE (Average Edge Distance), ASD (Average Surface Distance), and RR (Reinitialization Rate). We analyze the evaluation results from multiple aspects, considering the structures of the objects, their materials, and the different settings of the rendering scenes. In addition, since RGB-based 3D tracking methods usually produce significant errors in the depth direction because no depth constraint is available, we propose a statistical model of tracking errors that can be computed from the accurate ground-truth poses of RDOT. Applying the error model to compensate the resulting object pose parameters improves the tracking accuracy significantly. Finally, we discuss the disadvantages of different tracking approaches and give an overall conclusion and perspective for future 3D template tracking approaches.
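The error-compensation idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the statistical error model, so everything here is an assumption made for illustration: the depth error is modelled as a linear function of the estimated depth, fitted by ordinary least squares against ground-truth depths, and the function names `fit_depth_error_model` and `correct_depth` as well as the calibration values are hypothetical.

```python
from statistics import mean


def fit_depth_error_model(estimated_z, true_z):
    """Fit error = a * estimated_z + b by ordinary least squares.

    Assumption: a linear model in the estimated depth. The abstract only
    says a statistical error model is built; it does not give its form.
    """
    errors = [e - t for e, t in zip(estimated_z, true_z)]
    mean_z, mean_err = mean(estimated_z), mean(errors)
    var_z = sum((z - mean_z) ** 2 for z in estimated_z)
    if var_z == 0:
        # All depths identical: fall back to a constant bias.
        return 0.0, mean_err
    a = sum((z - mean_z) * (err - mean_err)
            for z, err in zip(estimated_z, errors)) / var_z
    b = mean_err - a * mean_z
    return a, b


def correct_depth(z, model):
    """Subtract the predicted error from an estimated depth."""
    a, b = model
    return z - (a * z + b)


# Hypothetical calibration data: a tracker that overestimates depth.
true_z = [0.5, 0.8, 1.0, 1.5]
estimated_z = [1.05 * z + 0.02 for z in true_z]

model = fit_depth_error_model(estimated_z, true_z)
corrected = [correct_depth(z, model) for z in estimated_z]
```

In this toy setup the synthetic error is exactly linear, so the corrected depths match the ground truth up to rounding. With real tracking output the model would only remove the systematic part of the error.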
Authors
何弦
李佳宸
金立
刘力
钟凡
秦学英
HE Xian; LI Jia-Chen; JIN Li; LIU Li; ZHONG Fan; QIN Xue-Ying (Department of Software, Shandong University, Jinan 250101; Engineering Research Center of Digital Media Technology, Ministry of Education, Shandong University, Jinan 250101; Shichen Information Technology (Shanghai) Co., Ltd, Shanghai 201203; Department of Computer Science and Technology, Shandong University, Qingdao, Shandong 266237)
Source
《计算机学报》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2022, Issue 3, pp. 585-600 (16 pages)
Chinese Journal of Computers
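Funding
National Natural Science Foundation of China (62172260, 61907026)
2019 Industrial Internet Innovation and Development Project of the Ministry of Industry and Information Technology
Zhejiang Lab project (2020NB0AB02)
Shandong Province Higher Education Science and Technology Program (J18KA392)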
Keywords
3D template tracking (三维模板跟踪)
dataset construction (数据集构建)
algorithm evaluation (算法测评)
augmented reality (增强现实)
photorealistic rendering (真实感渲染)