Funding: Supported in part by the National Natural Science Foundation of China (No. 62176041) and in part by the Excellent Science and Technique Talent Foundation of Dalian (No. 2022RY21).
Abstract: In recent years, visual tracking has advanced significantly by leveraging the Vision Transformer (ViT), mainly owing to its formidable modeling capabilities. However, the strong performance of such trackers relies heavily on ViT models pretrained for long periods, which limits more flexible model designs for tracking tasks. To address this issue, we propose TrackMAE, an efficient unsupervised ViT pretraining method for tracking based on masked autoencoders. During pretraining, we employ two ViTs with shared parameters, serving as the appearance encoder and the motion encoder, respectively. The appearance encoder encodes randomly masked images, while the motion encoder encodes randomly masked pairs of video frames. An appearance decoder and a motion decoder then reconstruct the original images and video frames, respectively, at the pixel level. In this way, the ViT learns to understand both the appearance of images and the motion between video frames simultaneously. Experimental results demonstrate that ViT-Base and ViT-Large models pretrained with TrackMAE and combined with a simple tracking head achieve state-of-the-art (SOTA) performance without additional design. Moreover, compared with the currently popular MAE pretraining methods, TrackMAE requires only 1/5 of the training time, which facilitates customizing diverse models for tracking. For instance, we additionally customize a lightweight ViT-XS that achieves SOTA efficient-tracking performance.
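The masking and reconstruction steps described above can be illustrated with a minimal, dependency-free sketch. It follows the generic MAE recipe (random shuffling of patch indices, a pixel-level loss computed only on masked patches). The 0.75 mask ratio, the patch representation, the choice to mask each frame of a pair independently, and the names `random_mask`, `mask_image`, `mask_frame_pair`, and `masked_mse` are illustrative assumptions, not details given in the TrackMAE abstract. The shared-parameter ViT encoder and the two decoders are omitted; in TrackMAE both token lists would be fed to the same encoder.

```python
import random
from typing import List, Sequence, Tuple

Patch = List[float]  # one flattened image patch (pixel values)


def random_mask(num_patches: int, mask_ratio: float,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """MAE-style random masking: shuffle patch indices, keep the first part visible."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_keep = max(1, int(num_patches * (1.0 - mask_ratio)))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def mask_image(patches: Sequence[Patch], mask_ratio: float, rng: random.Random):
    """Appearance-branch input: the visible patches of a single image."""
    visible, masked = random_mask(len(patches), mask_ratio, rng)
    return [patches[i] for i in visible], masked


def mask_frame_pair(frame_a: Sequence[Patch], frame_b: Sequence[Patch],
                    mask_ratio: float, rng: random.Random):
    """Motion-branch input (assumption: each frame is masked independently)."""
    tokens_a, masked_a = mask_image(frame_a, mask_ratio, rng)
    tokens_b, masked_b = mask_image(frame_b, mask_ratio, rng)
    return tokens_a + tokens_b, masked_a, masked_b


def masked_mse(pred: Sequence[Patch], target: Sequence[Patch], masked: Sequence[int]) -> float:
    """Pixel-level reconstruction loss averaged over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


rng = random.Random(0)
image = [[float(i)] * 4 for i in range(16)]  # 16 patches of 4 pixels each
tokens, masked = mask_image(image, 0.75, rng)
print(len(tokens), len(masked))  # 4 12
```

With a high mask ratio the encoder sees only a quarter of the tokens, which is the usual source of MAE's pretraining efficiency.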
Funding: Supported by the National Natural Science Foundation of China (61701029).
Abstract: Object tracking with abrupt motion is an important research topic that has attracted wide attention. To obtain accurate tracking results, this paper proposes an improved particle filter tracking algorithm based on sparse representation and nonlinear resampling. First, sparse representation is used to compute particle weights, exploiting the fact that the weights are sparse when the object moves abruptly, so that the potential object region can be predicted more precisely. Then, a nonlinear resampling process based on a nonlinear sorting strategy is proposed to solve the problem of particle diversity impoverishment caused by traditional resampling methods. Experimental results on videos containing objects with various abrupt motions demonstrate the effectiveness of the proposed algorithm.
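The diversity impoverishment attributed above to traditional resampling can be demonstrated with a short sketch of conventional systematic resampling. This is the standard baseline, not the paper's method: the sparse-representation weighting and the nonlinear sorting-based resampling are not specified in the abstract and are not reproduced here. The sparse weight vector in the example is a hypothetical stand-in for the situation after an abrupt motion.

```python
import random
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def effective_sample_size(weights: Sequence[float]) -> float:
    """ESS = 1 / sum(w_i^2); small values signal weight degeneracy."""
    w = normalize(weights)
    return 1.0 / sum(x * x for x in w)


def systematic_resample(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Conventional systematic resampling; returns the parent index of each new particle."""
    w = normalize(weights)
    n = len(w)
    start = rng.random() / n  # one random offset shared by all n positions
    indices: List[int] = []
    cumulative, j = w[0], 0
    for k in range(n):
        position = start + k / n
        # advance to the first particle whose cumulative weight covers this position
        while position > cumulative and j < n - 1:
            j += 1
            cumulative += w[j]
        indices.append(j)
    return indices


# Sparse weights: only two of 100 particles match the target after an abrupt move.
rng = random.Random(0)
weights = [0.0] * 100
weights[10] = weights[50] = 1.0
parents = systematic_resample(weights, rng)
print(effective_sample_size(weights), sorted(set(parents)))  # 2.0 [10, 50]
```

All 100 resampled particles descend from just two parents, which is the loss of diversity that a nonlinear resampling scheme such as the one proposed is meant to counter.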
Funding: Supported by the National Natural Science Foundation of China (60472072), the Specialized Research Foundation for the Doctoral Program of Higher Education (20040699034), the Aeronautical Science Foundation of China (04I50370), and the Natural Science Foundation of Shaan’xi Province (2004K05-G23).
Abstract: This paper presents an adaptive method for detecting objects and shadows in video streams. Background models are first set up and adaptively updated in the Hue-Saturation-Intensity (HSI) color space to detect motion regions. Detection errors are then corrected using motion continuity and velocity consistency. Finally, cast shadows are removed using the generic properties of luminance, chrominance, and gradient density. Experimental results and their evaluation are presented to verify the effectiveness of the new method.
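The per-pixel part of the HSI pipeline described above can be sketched as follows. The RGB-to-HSI conversion uses the standard geometric formulas. The running-average background update, the shadow rule (darker than the background but with similar hue and saturation), the combined foreground distance, and every threshold (`fg_thresh`, `ratio_lo`, `ratio_hi`, `s_tol`, `h_tol`) are illustrative assumptions rather than the paper's method; the gradient-density cue and the motion-continuity and velocity-consistency corrections are omitted.

```python
import math
from typing import Tuple

HSI = Tuple[float, float, float]  # (hue in radians, saturation, intensity); RGB inputs in [0, 1]


def rgb_to_hsi(r: float, g: float, b: float) -> HSI:
    """Standard geometric RGB -> HSI conversion."""
    total = r + g + b
    if total == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation undefined, use 0
    intensity = total / 3.0
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        return 0.0, saturation, intensity  # gray: hue undefined, use 0
    theta = math.acos(max(-1.0, min(1.0, num / den)))
    hue = theta if b <= g else 2.0 * math.pi - theta
    return hue, saturation, intensity


def hue_distance(h1: float, h2: float) -> float:
    """Angular distance between two hues on the color circle."""
    d = abs(h1 - h2) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def update_background(bg: HSI, px: HSI, alpha: float) -> HSI:
    """Hypothetical running-average update; hue is blended on the unit circle."""
    hb, sb, ib = bg
    hp, sp, ip = px
    x = (1 - alpha) * math.cos(hb) + alpha * math.cos(hp)
    y = (1 - alpha) * math.sin(hb) + alpha * math.sin(hp)
    hue = math.atan2(y, x) % (2.0 * math.pi)
    return hue, (1 - alpha) * sb + alpha * sp, (1 - alpha) * ib + alpha * ip


def classify(px: HSI, bg: HSI, fg_thresh: float = 0.1, ratio_lo: float = 0.4,
             ratio_hi: float = 0.9, s_tol: float = 0.1, h_tol: float = 0.2) -> str:
    """Label a pixel as 'shadow', 'foreground', or 'background' against its model."""
    hp, sp, ip = px
    hb, sb, ib = bg
    # cast shadow: noticeably darker than the background, chromaticity nearly unchanged
    if ib > 0 and ratio_lo <= ip / ib <= ratio_hi:
        if abs(sp - sb) <= s_tol and hue_distance(hp, hb) <= h_tol:
            return "shadow"
    diff = abs(ip - ib) + abs(sp - sb)  # hypothetical combined distance
    return "foreground" if diff > fg_thresh else "background"


bg = rgb_to_hsi(0.6, 0.3, 0.2)
print(classify(rgb_to_hsi(0.36, 0.18, 0.12), bg))  # shadow
```

Scaling an RGB pixel by a constant leaves its hue and saturation unchanged and only lowers intensity, which is why an intensity-ratio test with a chromaticity check is a common first approximation for cast shadows.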
Funding: Supported by the National Natural Science Foundation of China (No. 61379065) and the Natural Science Foundation of Hebei Province (No. F2014203119).
Abstract: Occlusion has long been one of the challenging problems in computer vision, and the occlusion of visual objects arises in many areas of vision research. Once occlusion occurs in a visual system, it degrades object recognition, tracking, observation, and operation, so detecting occlusion autonomously should be one of the abilities of an intelligent vision system. Research on occlusion detection methods for visual objects has attracted increasing attention from scholars. First, the definition and classification of the occlusion problem are presented. Then, the characteristics and shortcomings of occlusion detection methods based on intensity images and on depth images are analyzed respectively, and existing occlusion detection methods are compared. Finally, the problems of existing occlusion detection methods and possible research directions are pointed out.
Abstract: The paper examines a particular aspect of the way semiosis models complex anthroposemiotic activity, as exemplified by the "persuasion path" implicit in any source or origin of intentional influence in human communication. In theory, we should be able to account for every stage in the process of semiosis, and this ability bears on the way signs are to be classified according to the nature of their immediate objects. The topic is therefore a pretext for exploring the stages in semiosis, from the dynamic object to the sign via the immediate object, in selected pictorial examples of purpose and intentionality in semiosis, since any such persuasive or influential activity depends upon the formal organisation of its representation in order to be understood successfully, indeed to function at all. The paper thus presents one possible explanation of the role of the immediate object in cases of evident intentionality. However, because Peirce never developed a clear idea of semiosis, this explanation is necessarily speculative and abductive.
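Abstract: Current object segmentation models struggle to balance segmentation performance and inference efficiency. To address this, a self-distillation object segmentation method based on scale-attention knowledge transfer is proposed. First, an object segmentation network that uses only backbone features is constructed as the inference network, enabling an efficient forward inference process. Second, a self-distillation learning model based on scale-attention knowledge is proposed. On the one hand, a pyramid feature module with a scale attention mechanism is designed; the scale attention mechanism adaptively captures contextual information at different semantic levels to extract more discriminative self-distillation knowledge. On the other hand, a distillation loss combining cross-entropy, Kullback-Leibler (KL) divergence, and L2 distance is constructed to efficiently drive the transfer of the distilled knowledge to the segmentation network and improve generalization. The method was validated on five object segmentation datasets, including COD (Camouflaged Object Detection), DUT-O (Dalian University of Technology-OMRON), and SOC (Salient Objects in Clutter). With the proposed inference network as the baseline, the proposed self-distillation model improves segmentation performance by 3.01% on average in terms of the Fβ metric, 1.00% more than the teacher-free (TF) self-distillation model. Compared with the recent residual segmentation network (R2Net), the proposed network has 2.33×10^6 fewer parameters, a 2.53% higher inference frame rate, 40.50% fewer floating-point operations, and 0.51% better segmentation performance. Experimental results show that the proposed method effectively balances performance and efficiency and is suitable for application scenarios with limited computing and storage resources.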
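The fused distillation loss described above can be sketched in plain Python. The abstract does not give the exact form or weighting of the three terms, so this sketch assumes binary (foreground/background) segmentation with per-pixel probabilities flattened into lists, cross-entropy against the ground-truth mask, KL(teacher || student) between the output maps of the self-distillation branch and the inference network, a per-element squared L2 distance between their features, and equal weights `w_kl = w_l2 = 1.0`. All function names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-7  # keeps log() finite for probabilities at 0 or 1


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bce(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean per-pixel binary cross-entropy against the ground-truth mask."""
    total = 0.0
    for p, y in zip(pred, target):
        p = _clip(p)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(pred)


def bernoulli_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Mean per-pixel KL(teacher || student) between foreground probabilities."""
    total = 0.0
    for t, s in zip(teacher, student):
        t, s = _clip(t), _clip(s)
        total += t * math.log(t / s) + (1.0 - t) * math.log((1.0 - t) / (1.0 - s))
    return total / len(teacher)


def mean_squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance between feature vectors, averaged per element."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_prob: Sequence[float], teacher_prob: Sequence[float],
                      target: Sequence[float], student_feat: Sequence[float],
                      teacher_feat: Sequence[float],
                      w_kl: float = 1.0, w_l2: float = 1.0) -> float:
    """Hypothetical fusion: CE (supervision) + KL (output distillation) + L2 (feature distillation)."""
    return (bce(student_prob, target)
            + w_kl * bernoulli_kl(teacher_prob, student_prob)
            + w_l2 * mean_squared_l2(student_feat, teacher_feat))


loss = distillation_loss([0.9, 0.2], [0.8, 0.1], [1.0, 0.0], [0.5, 0.5], [0.4, 0.6])
print(round(loss, 4))
```

In an autodiff framework the teacher-side probabilities and features would often be detached from the computation graph so that only the inference network is pulled toward them; this plain-Python version only evaluates the loss value.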