In the past several years, various visual object tracking benchmarks have been proposed, and some of them have been widely used to evaluate numerous recently proposed trackers. However, most discussions focus on overall performance and cannot describe the strengths and weaknesses of the trackers in detail. Meanwhile, several benchmark measures that are often used in tests lack a convincing interpretation. In this paper, 12 frame-wise visual attributes that reflect different aspects of the characteristics of image sequences are collated, and for the first time each of them is given a normalized, quantitative formulaic definition. Based on these definitions, we propose two novel test methodologies, a correlation-based test and a weight-based test, which provide a more intuitive and easier demonstration of the trackers' performance for each aspect. These methods are then applied to the raw results of one of the most famous tracking challenges, the Visual Object Tracking (VOT) Challenge 2017. The tests show that most trackers did not perform well when the size of the target changed rapidly or drastically, and even advanced deep-learning-based trackers did not fully solve this problem. In addition, although the scale of the target is not considered in the calculation of the center location error, in practical tests the center location error is still sensitive to changes in target size.
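The abstract does not give the paper's attribute formulas or the exact procedure of its correlation-based test. As a rough illustration only, the Python sketch below assumes boxes in the common (x, y, w, h) form, the usual definition of center location error (CLE) as the pixel distance between predicted and ground-truth box centers, a hypothetical size-change attribute (`size_change`), and plain Pearson correlation as a stand-in for a correlation-based test. None of these helper names or formulas are taken from the paper.

```python
import math

# Boxes are assumed to be (x, y, w, h) with (x, y) the top-left corner.

def center(box):
    """Return the (cx, cy) center of an (x, y, w, h) box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def center_location_error(pred, gt):
    """Euclidean distance in pixels between box centers (no scale normalization)."""
    (px, py), (gx, gy) = center(pred), center(gt)
    return math.hypot(px - gx, py - gy)

def size_change(prev_gt, gt):
    """Hypothetical frame-wise attribute: relative change of sqrt(area)
    between consecutive ground-truth boxes (illustrative, not the paper's formula)."""
    s0 = math.sqrt(prev_gt[2] * prev_gt[3])
    s1 = math.sqrt(gt[2] * gt[3])
    return abs(s1 - s0) / s0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)

def attribute_error_correlation(preds, gts):
    """Correlate the size-change attribute of frame t with the CLE of frame t (t >= 1)."""
    frames = range(1, len(gts))
    attr = [size_change(gts[t - 1], gts[t]) for t in frames]
    cle = [center_location_error(preds[t], gts[t]) for t in frames]
    return pearson(attr, cle)

if __name__ == "__main__":
    # Same relative misalignment, ten times larger target -> ten times larger CLE.
    print(center_location_error((0, 0, 10, 10), (3, 4, 10, 10)))        # 5.0
    print(center_location_error((0, 0, 100, 100), (30, 40, 100, 100)))  # 50.0
    # Toy sequence: the target grows while the tracker keeps a fixed 10x10 box.
    gts = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 20, 20), (0, 0, 20, 20), (0, 0, 40, 40)]
    preds = [(0, 0, 10, 10)] * len(gts)
    print(attribute_error_correlation(preds, gts) > 0)  # True
```

The first two calls hint at one plausible reason why a pixel-based CLE can still react to target size even though no scale term appears in its definition: the same relative misalignment produces a proportionally larger pixel distance on a larger target. The toy sequence shows how a per-frame attribute and a per-frame error could be correlated; a real test would use the paper's own normalized attribute definitions.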
Funding: Project supported by the National Natural Science Foundation of China (No. 61501139) and the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology, Weihai (No. 2019KYCXJJYB06).