SiamCross: Siamese Cross Object-Aware Networks for Visual Object Tracking
Abstract Visual object tracking is a fundamental task in computer vision. Siamese-based approaches have recently attracted extensive attention in the visual tracking community because they achieve remarkable performance while maintaining good speed. However, the branches of a Siamese network are usually independent and lack information interaction, which limits further improvement of model performance. To enhance collaboration between the Siamese branches, this paper proposes a cross-aware network model based on the Siamese architecture, called SiamCross (Siamese Cross Object-Aware Network). Feature extraction in the two Siamese branches is the primary operation for improving model performance, and distinguishing the object from the semantic background depends largely on the robustness of the features the model extracts. In SiamCross, we first design a new Siamese Cross-Aware Network (SCAN), based on mutual supervision between the Siamese branches, to extract robust features. SCAN lets the two branches cooperate efficiently and comprehensively: the template branch benefits from the rich contextual semantic information in the search features and produces a more discriminative representation of the object, while the search branch incorporates template features and actively learns the essential information of the object. In addition, because the anchor-free formulation maps tracking directly to per-pixel classification and regression, the features of the two branches can focus on the local and global spatial information of the object, respectively. These two kinds of features have good potential local-global complementarity. Specifically, the regression feature learns more about the global size of the object but inevitably introduces surrounding background information, whereas the classification branch focuses on local center localization. Combining the two helps suppress the background information expressed in the regression features. Moreover, the regression feature responds prominently around the object, revealing the region where the object is located and providing a useful reference for localization in the classification branch. To make full use of this different spatial feature information and obtain more accurate tracking results, we further propose the Object-Attention Interaction Network (OAIN) and integrate it into SiamCross. OAIN contains two modules: the Parallel Cross Attention Module (PCA) and the Adaptive Deformable Cross-Align Module (ADCA). The PCA module improves the accuracy of object state estimation by fusing the local and global information in the branches. To further align the regression features with the object region, and to mitigate the sharp drop in their reliability as a reference for the classification branch when feature alignment loses focus, the ADCA module includes an adaptive spatial transformation operation that makes the regression features better reflect the object region. The ADCA module thereby completes an efficient interaction mechanism for the anchor-free network. Finally, we extensively evaluate SiamCross on five challenging public benchmarks: OTB2015, VOT2018, VOT2019, GOT-10k, and LaSOT. The experimental results show that SiamCross achieves better overall performance than current state-of-the-art trackers such as SiamRPN++, ATOM, and DiMP, and runs in real time.
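The abstract states only that the anchor-free formulation treats tracking as classification and regression for each pixel. As a rough illustration of what such per-pixel targets can look like, the sketch below assumes an FCOS-style convention that is not taken from the paper: a score-map cell is positive when its image point lies strictly inside the ground-truth box, and its regression target is the distance to the left, top, right, and bottom box edges. The function name `anchor_free_targets` and the parameters `size`, `stride`, and `offset` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def anchor_free_targets(
    box: Box, size: int, stride: int, offset: int = 0
) -> Tuple[List[List[int]], List[List[Optional[Box]]]]:
    """Build per-pixel labels for a size x size score map.

    Assumed convention (FCOS-style, not taken from the paper): map cell (i, j)
    corresponds to image point (offset + j * stride, offset + i * stride).
    """
    x1, y1, x2, y2 = box
    cls = [[0] * size for _ in range(size)]
    reg: List[List[Optional[Box]]] = [[None] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            px = offset + j * stride
            py = offset + i * stride
            # Distances from this point to the four box edges.
            l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
            if min(l, t, r, b) > 0:  # point lies strictly inside the box
                cls[i][j] = 1        # classification: object vs. background
                reg[i][j] = (l, t, r, b)  # regression: box extent seen from here
    return cls, reg


# A 5 x 5 map with stride 8 samples image points 0, 8, 16, 24, 32 on each axis.
cls_map, reg_map = anchor_free_targets(box=(10, 10, 30, 26), size=5, stride=8)
```

Under this convention every positive cell encodes the full box extent (left plus right equals the width, top plus bottom equals the height), which is one way to read the abstract's remark that the regression feature carries global size information while the classification label only marks where the object is.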
Authors HUANG Wang-Hui (黄旺辉); FENG Yong (冯永); QIANG Bao-Hua (强保华); PEI Yu-Xuan (裴钰璇); LUO Yue (罗越) (College of Computer Science, Chongqing University, Chongqing 400044; Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing 400030; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004; Guangxi Key Laboratory of Optoelectronic Information Processing, Guilin University of Electronic Technology, Guilin, Guangxi 541004)
Source Chinese Journal of Computers (《计算机学报》), 2022, No. 10, pp. 2151-2166 (16 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
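Funding Supported by the Zhejiang Lab Open Project (2021KE0AB01), the Chongqing Key Project of Technology Innovation and Application Development (cstc2021jscx-gksbX0058), the National Natural Science Foundation of China (61762025), the Research Project of the Guangxi Key Laboratory of Trusted Software (kx202006), the Fund of the Guangxi Key Laboratory of Optoelectronic Information Processing (Cultivation Base) (GD18202), and the Key Program of the Guangxi Natural Science Foundation (2019GXNSFDA185007).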
Keywords visual object tracking; Siamese network; information interaction; cross attention