
利用时空特征编码的单目标跟踪网络

A spatio-temporal encoded network for single object tracking
Abstract

Objective  Visual tracking has developed rapidly with the advent of deep neural networks. Single object tracking aims to track an arbitrary object through a video stream given only its bounding box in the initial frame, and it underpins computer vision applications such as surveillance, robotics, and human-computer interaction. Resource-constrained deployment favors small, simple, fully feed-forward trackers, yet most existing methods pursue top accuracy with large models and online update procedures. We take a different route: we model the key temporal cues inside the network itself and dispense with both the model update process and large-scale backbones. The intrinsic video nature of the task, which deserves more attention in the research community, offers two exploitable properties. First, the spatial displacement constraint: the object's locations in adjacent frames do not differ widely unless dramatic camera motion occurs, so almost all visual trackers search each new frame within a region centered on the previous location. Second, temporal appearance consistency: the target's appearance changes smoothly across preceding frames, forming a temporal context that provides clear cues for subsequent predictions. This second property has not been fully explored in the literature. Existing methods leverage it in two ways. 1) Use the target information from the first frame only, casting tracking as matching between the given initial patch and each subsequent frame. Siamese-network-based methods, the most popular and effective in this category, apply a one-shot learning scheme: the object patch in the first frame is treated as an exemplar, and patches within the search regions of consecutive frames are treated as candidate instances, so the task reduces to finding the most similar instance in each frame. This paradigm ignores all other historical frames, processes each frame independently, and loses a great deal of information. 2) Use both the initial patch and historical target patches, taken from every frame or from selected frames, to predict the object location in a new frame; this category spans traditional and deep-neural-network-based methods. Traditional methods such as correlation filters (CF) learn a model or classifier from the first frame and update it in subsequent frames with a small learning rate. Deep-neural-network-based methods learn their models offline on large training sets and fine-tune them online at the initial frame and at later frames. However, balancing accuracy against latency remains an open problem, especially for the deep-neural-network-based methods, and network fine-tuning is forbidden in some practical settings, for example when the model is deployed on inference chips, which hinders the wide deployment of these methods.

Method  We propose a novel and straightforward tracker that reformulates visual tracking from the perspective of video analysis. The temporal-aware network (TAN) encodes target information from multiple frames, exploiting both temporal appearance consistency and the spatial displacement constraint in the forward path without any online model update. To exchange and fuse information from the historical frames fed to the network, we embed temporal aggregation modules (TAM) in TAN, which empower the tracker to learn spatio-temporal features; as a result, TAN adapts to appearance changes such as deformation and rotation without any update strategy. One plausible form of such cross-frame aggregation is sketched below.
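The abstract does not detail the TAM's internal design, so the following is only a minimal sketch, assuming a per-location attention across the temporal axis as one way to "exchange and fuse" features from several historical frames. The module name, tensor layout, and head count are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of temporal aggregation: fuse backbone features
# from T historical frames into a single spatio-temporal feature map.
# NOT the paper's TAM; one plausible reading of "exchange and fuse".
import torch
import torch.nn as nn

class TemporalAggregation(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Attention over the temporal axis lets every frame "see" the others.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) -- features of T historical frames.
        b, t, c, h, w = feats.shape
        # Treat each spatial location as a batch item and the T frames as a sequence.
        tokens = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        fused, _ = self.attn(tokens, tokens, tokens)  # cross-frame information exchange
        tokens = self.norm(tokens + fused)            # residual fusion
        out = tokens.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return out.mean(dim=1)                        # (B, C, H, W) fused map

# Usage: TemporalAggregation(256)(torch.randn(2, 3, 256, 16, 16)) -> (2, 256, 16, 16)
```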
To balance the computational burden of multi-frame inputs against tracking accuracy, we adopt the shallow ResNet-18 as the feature extraction backbone and reach a speed above 70 frame/s. The tracker runs fully feed-forward: whereas prior methods maintain temporal appearance consistency through the first frame or through historical frames at the cost of expensive online fine-tuning, our offline-trained, temporally encoded TAN adapts to the target's appearance changes on its own. To complete a simple tracking pipeline, we further design a novel anchor-free and proposal-free target estimation method: a corner detection head in TAN detects the four corners of the target, i.e., top-left, top-right, bottom-left, and bottom-right. Since a target box can be determined either by the top-left and bottom-right corners or by the top-right and bottom-left corners, we use a center score map to indicate the confidence of these two candidate boxes instead of complicated embedding constraints, which makes the target easy to locate. Thanks to this corner-based target estimation mechanism and the box selection strategy, our tracker handles challenging scenarios with significant appearance changes and occlusion.

Result  Without bells and whistles, our method reaches leading performance among small-network models on several public datasets: online object tracking: a benchmark 50 (OTB50), OTB100, TrackingNet, a high-quality benchmark for large-scale single object tracking (LaSOT), and a benchmark and simulator for UAV tracking (UAV123), while maintaining a high processing speed of 70 frame/s. Compared with several state-of-the-art trackers, TAN strikes a good balance between performance and speed, and it remains competitive even against trackers that rely on complicated template update strategies or online update mechanisms. Ablation studies further verify the effectiveness of each proposed module. The real-time speed and simplified pipeline make TAN well suited to real applications, especially resource-limited platforms that support neither large models nor online model updates.

Conclusion  The proposed tracker is trained entirely offline, requires no online model update strategy at inference, adapts to the target's appearance changes, and outperforms other lightweight trackers. It offers a new perspective on single object tracking by mining the video nature of the task, especially its temporal appearance consistency.
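As a concrete reading of the corner-based estimation step, here is a minimal decoding sketch: take the argmax of each of the four corner heatmaps, build the two diagonal candidate boxes, and keep the one whose center scores higher on the center map. The function name, heatmap ordering, and stride handling are assumptions for illustration, not the paper's code.

```python
# Sketch of corner-based box decoding: four corner heatmaps (TL, TR, BL, BR)
# yield two diagonal candidate boxes; the center confidence map picks one.
import torch

def decode_box(corners: torch.Tensor, center: torch.Tensor):
    # corners: (4, H, W) heatmaps ordered TL, TR, BL, BR; center: (H, W).
    h, w = center.shape
    pts = []
    for m in corners:                              # argmax location of each corner
        idx = torch.argmax(m)
        pts.append((int(idx % w), int(idx // w)))  # (x, y) on the heatmap grid
    (tlx, tly), (trx, try_), (blx, bly), (brx, bry) = pts
    candidates = [
        (tlx, tly, brx, bry),                      # box from TL + BR
        (blx, try_, trx, bly),                     # box from TR + BL
    ]
    def center_score(box):
        x1, y1, x2, y2 = box
        cx = min(max((x1 + x2) // 2, 0), w - 1)    # clamp center to the map
        cy = min(max((y1 + y2) // 2, 0), h - 1)
        return center[cy, cx]
    return max(candidates, key=center_score)       # (x1, y1, x2, y2), grid units
```

Scoring box centers on a single confidence map avoids the pairwise corner-embedding matching used by keypoint detectors such as CornerNet, which is what the abstract means by dropping "complicated embedding constraints".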
Authors  Wang Mengmeng (王蒙蒙); Yang Xiaoqian (杨小倩); Liu Yong (刘勇) (College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China)
Source  Journal of Image and Graphics (《中国图象图形学报》; CSCD, Peking University Core Journals), 2022, No. 9: 2733-2748 (16 pages)
Funding  National Natural Science Foundation of China (61836015)
Keywords  computer vision; object tracking; spatial-temporal feature coding; arbitrary target tracking; corner tracking; temporal appearance consistency; high-speed tracking