Abstract
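The STARK object tracking method uses ResNet as its backbone network, whose feature extraction capability is insufficient, which leads to poor tracking performance. To address this problem, this paper proposes an improved object tracking algorithm based on the Swin-Transformer network. First, the window attention mechanism inside Swin-Transformer is improved in a multi-scale manner: a multi-scale window module, MW-MSA, is designed to extract richer local detail information, which together with global context information forms multi-scale discriminative features. Next, the Transformer encoder-decoder structure is used as the feature fusion network, and an optimized multi-layer perceptron serves as the update-score judgment network; together they form the state-awareness module. Finally, to handle the challenge of target disappearance and reappearance, a multi-tracker fusion method is proposed. It fuses the multi-scale improved tracking algorithm with the SuperDiMP tracking algorithm and adds a disappearance-state judgment module that jointly considers the confidence scores of the two trackers and an estimate of the likelihood that the target lies near the predicted box. Experimental results show that, compared with the STARK tracking algorithm, the proposed algorithm improves the average overlap (AO) by 2.7% and the success rate SR_0.5 by 3.3% on the GOT-10K dataset. On the L-LaSOT dataset, compared with STARK, the success rate (AUC) improves by 0.8%, and the success rate under the target disappearance and reappearance challenge improves by 1%.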
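The abstract states only that the disappearance-state judgment combines the two trackers' confidence scores with a likelihood that the target is near the predicted box; it does not give the decision rule. The following is a minimal sketch of one way such a rule could look. The names `TrackerOutput`, `target_vanished`, `fuse`, the thresholds `conf_thresh` and `near_thresh`, and the "pick the more confident tracker" fallback are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class TrackerOutput:
    """One tracker's prediction for the current frame (hypothetical container)."""
    box: Box
    confidence: float  # assumed to lie in [0, 1]


def target_vanished(primary: TrackerOutput,
                    auxiliary: TrackerOutput,
                    nearby_likelihood: float,
                    conf_thresh: float = 0.5,
                    near_thresh: float = 0.3) -> bool:
    """Assumed rule: the target is declared vanished only when both trackers
    are unconfident AND the target is unlikely to be near the predicted box."""
    both_unconfident = (primary.confidence < conf_thresh
                        and auxiliary.confidence < conf_thresh)
    return both_unconfident and nearby_likelihood < near_thresh


def fuse(primary: TrackerOutput,
         auxiliary: TrackerOutput,
         nearby_likelihood: float) -> Optional[TrackerOutput]:
    """Return the selected prediction, or None when the target is judged vanished.
    Choosing the more confident tracker is an illustrative assumption."""
    if target_vanished(primary, auxiliary, nearby_likelihood):
        return None
    return primary if primary.confidence >= auxiliary.confidence else auxiliary
```

In this sketch `primary` would stand for the multi-scale Swin-Transformer tracker and `auxiliary` for SuperDiMP; a `None` result signals that the caller should switch to a re-detection mode rather than update the target state.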
An improved target tracking algorithm is proposed based on the Swin-Transformer network to address the problem of insufficient feature extraction capability and poor tracking effect often encountered when using convolutional neural networks in deep learning-based target tracking methods. Firstly, the window attention mechanism of the Swin-Transformer is enhanced across multiple scales, and a multi-scale window module termed MW-MSA is devised to extract more comprehensive local detail information. This augmentation, in conjunction with global contextual insights, engenders multi-scale discriminative features. Then, these features are integrated with the encoding-decoding structure of the Transformer, serving as the feature fusion network. An optimized multi-layer perceptron is employed as the update score judgment network to establish the state awareness module. Finally, a multi-tracker fusion method is proposed to address challenges like target occlusion and disappearance by integrating an improved tracking algorithm with the SuperDiMP tracking algorithm. Results from testing on L-LaSOT and GOT-10K datasets show significant improvements over the STARK tracking algorithm: a 2.7% increase in average overlap rate (AO) and a 3.3% increase in success rate (SR) on GOT-10K, and a 0.8% increase in success rate (AUC) on L-LaSOT. Moreover, under the target disappearance challenge, the success rate is improved by 1%.
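For readers unfamiliar with the reported metrics, the sketch below shows how average overlap (AO) and success rate at a 0.5 threshold (SR_0.5) are commonly computed from per-frame intersection-over-union values. It is a simplified single-sequence illustration, not the official GOT-10K evaluation code; the function names and the `(x, y, width, height)` box layout are assumptions made for this example.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(ious: Sequence[float]) -> float:
    """AO: mean IoU between predicted and ground-truth boxes over all frames."""
    return sum(ious) / len(ious)


def success_rate(ious: Sequence[float], threshold: float = 0.5) -> float:
    """SR_t: fraction of frames whose IoU exceeds the threshold t."""
    return sum(1 for v in ious if v > threshold) / len(ious)


if __name__ == "__main__":
    predictions = [(0, 0, 2, 2), (1, 1, 2, 2), (5, 5, 2, 2)]
    ground_truth = [(0, 0, 2, 2), (0, 0, 2, 2), (0, 0, 2, 2)]
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truth)]
    print("AO =", average_overlap(overlaps))
    print("SR_0.5 =", success_rate(overlaps))
```

Read this way, a 2.7% AO gain means the predicted boxes overlap the ground truth more on average, while a 3.3% SR_0.5 gain means more frames clear the 0.5 overlap bar.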
Authors
刘时
朱明
LIU Shi; ZHU Ming (Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China)
Source
《液晶与显示》
CAS
CSCD
PKU Core Journal (北大核心)
2024, No. 11, pp. 1569-1580 (12 pages)
Chinese Journal of Liquid Crystals and Displays