Abstract
Objective At present, feature-point-trajectory stabilization algorithms cannot simultaneously satisfy the demands on trajectory length, robustness, and trajectory utilization rate, so their stabilization results are prone to distortion or local instability. To address this problem, a feature-point-trajectory stabilization algorithm based on trifocal-tensor reprojection is proposed. Method Long virtual trajectories are constructed with the trifocal tensor, and stabilized views are defined by smoothing these virtual trajectories; the real feature points are then reprojected onto the stabilized views via the trifocal tensor, thereby smoothing the real feature-point trajectories; finally, stabilized frames are rendered by mesh warping. Result The algorithm was tested on a large number of videos of different types and compared with representative feature-point-trajectory stabilization algorithms and commercial software, namely an algorithm based on trajectory augmentation, an algorithm based on epipolar point transfer, and the commercial tool Warp Stabilizer. The proposed algorithm requires shorter trajectories, achieves a higher trajectory utilization rate, and is more robust: it outperforms the trajectory-augmentation algorithm on 92% of the severely shaking videos, and outperforms Warp Stabilizer on 93% of the videos lacking long trajectories and on 71.4% of the videos with rolling-shutter distortion; compared with the epipolar-point-transfer algorithm, it degenerates in fewer situations and avoids the failures caused by a temporarily stationary camera or by pure camera rotation. Conclusion The proposed algorithm places few restrictions on the camera motion pattern and the scene depth. It is suitable not only for common stabilization problems such as lack of parallax, nonplanar scene structure, and rolling-shutter distortion, but also performs well when long trajectories are unavailable because of camera panning, motion blur, or severe jitter; its running time, however, still needs improvement.
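The abstract above defines the stabilized views by low-pass smoothing the virtual trajectories; the Method below adds that the trajectories are first odd-extended to the first and last frames before FIR filtering. The following is a minimal sketch of that smoothing step, assuming a normalized Gaussian FIR kernel (the paper does not specify its filter coefficients); smooth_trajectory is a hypothetical helper name.

```python
import numpy as np

def smooth_trajectory(track, kernel_size=31):
    """Smooth one virtual trajectory with a low-pass FIR filter.

    track       : (N, 2) array of per-frame feature positions
    kernel_size : odd FIR length; the Gaussian shape and width are
                  assumptions, not taken from the paper
    """
    half = kernel_size // 2
    t = np.arange(-half, half + 1)
    kernel = np.exp(-t**2 / (2.0 * (kernel_size / 6.0) ** 2))
    kernel /= kernel.sum()
    # Odd extension: reflect the trajectory anti-symmetrically about
    # its endpoints, mirroring how the paper extends virtual
    # trajectories to the first and last frames before filtering.
    head = 2.0 * track[0] - track[half:0:-1]
    tail = 2.0 * track[-1] - track[-2:-half - 2:-1]
    ext = np.vstack([head, track, tail])
    # Filter x and y independently; 'valid' keeps the output length N.
    return np.stack(
        [np.convolve(ext[:, d], kernel, mode="valid") for d in (0, 1)],
        axis=1,
    )
```

Odd extension avoids the pull toward the boundary values that zero padding or simple replication would introduce at the ends of the video.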
Objective Video stabilization is one of the key research areas of computer vision. Current video stabilization algorithms fall into three major categories: 2D global motion, 2D local motion, and feature-trajectory stabilization. The 2D global- and local-motion algorithms usually cannot achieve satisfactory results in scenes with nonplanar depth variation. By contrast, feature-trajectory stabilization handles such scenes well and outperforms the other two categories. However, feature-trajectory algorithms commonly produce distorted or locally unstable output because of their limitations in trajectory length, robustness, and trajectory utilization rate. To solve this problem, this paper proposes a feature-trajectory stabilization algorithm based on the trifocal tensor. Method The algorithm extracts real feature-point trajectories from the video with the KLT tracker and eliminates tracking mismatches with RANSAC. According to the lengths of the real trajectories, it adaptively selects segments of them to initialize virtual trajectories, and each initial virtual trajectory is extended into a long virtual trajectory by repeatedly applying trifocal-tensor point transfer. The extension stops when the virtual trajectory exceeds half of the frame width or height, or when the difference between the mean and the median of the transferred points exceeds five pixels. Whenever fewer than 300 virtual trajectories pass through a frame, new initial virtual trajectories are added from the real trajectories on that frame. The acquired virtual trajectories are then odd-extended at their beginnings to the first frame and at their endings to the last frame, and the stabilized views are defined by the smoothed virtual trajectories output by an FIR filter. To smooth the real trajectories, the algorithm reprojects the real feature points onto the stabilized views by trifocal-tensor transfer and divides each original frame into a 16×32 uniform mesh grid. The final stabilized frames are rendered by mesh-grid warping of the original frames, where the input to the warping is the set of smoothing vectors between the real feature points and their smoothed counterparts. Smoothing vectors with non-negligible error are removed to protect the warping output, both by discarding smoothed trajectories that are at most five frames long and by RANSAC fitting of an affine model (see the sketch after the abstract). Because degeneration of the trifocal-tensor transfer degrades the precision of virtual-trajectory construction and feature-point reprojection, the algorithm adaptively enlarges the transfer window according to the severity of the degeneration, which guarantees that enough transferred points are acquired to preserve that precision. During virtual-trajectory construction, when the number of virtual trajectories through a frame drops by more than 25% relative to the previous frame, the previous frame is marked as a breakpoint and the video is processed in partitions. The proposed algorithm thereby achieves an enhanced stabilization result.
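The operation underlying both the virtual-trajectory extension and the reprojection onto the stabilized views is point transfer through a trifocal tensor. Below is a minimal sketch in the standard Hartley-Zisserman convention; the fallback choice of a vertical line through the second-view point is purely illustrative (a line perpendicular to the epipolar line is the usual numerically stable choice), and the transfer degenerates when the chosen line coincides with the epipolar line, which is the situation the adaptive transfer window described above is meant to mitigate.

```python
import numpy as np

def trifocal_transfer(T, x1, x2, l2=None):
    """Transfer a point into the third view with a trifocal tensor.

    T      : (3, 3, 3) trifocal tensor of views (1, 2, 3)
    x1, x2 : homogeneous image points in views 1 and 2
    l2     : a line through x2 in view 2; any line except the
             epipolar line of x1 works (point-transfer formula
             x3^k = x1^i * l2_j * T[i, j, k])
    """
    if l2 is None:
        # Illustrative fallback: the vertical line through x2.
        # A line perpendicular to the epipolar line of x1 is the
        # standard, more stable choice.
        l2 = np.array([x2[2], 0.0, -x2[0]])
    x3 = np.einsum("i,j,ijk->k", x1, l2, T)
    return x3 / x3[2]  # normalize; near-zero x3[2] signals degeneracy
```

In the construction step, this transfer extends a virtual trajectory frame by frame from already-known positions in two earlier views; the paper's stopping rule compares the mean and the median of the transferred points and halts when they differ by more than five pixels.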
Result Experiments on numerous videos of different types show that the proposed algorithm outperforms the traditional feature-trajectory stabilization algorithms based on trajectory augmentation or epipolar point transfer, as well as the commercial software Warp Stabilizer. For the comparison with the trajectory-augmentation algorithm, the test videos are classified into the categories “simple,” “running,” “rolling shutter,” “depth,” and “driving.” The “simple” videos have relatively slow camera motion and smooth depth variation. The “running” videos are captured while the user is running and are challenging because of excessive wobbling. The “rolling shutter” videos suffer from noticeable rolling-shutter distortion. The “depth” videos contain significant abrupt depth changes. The “driving” videos are captured from moving vehicles. For the comparison with Warp Stabilizer, the classification is slightly changed to “simple,” “lack of long trajectory,” “rolling shutter,” “depth,” and “driving,” where the “lack of long trajectory” videos lack long trajectories because of camera panning, motion blurring, or excessive jitter. A scoring system is used to evaluate the stabilization outputs of the three algorithms, and the scores of each category are analyzed statistically. The analysis indicates that the proposed algorithm requires shorter trajectories and achieves a high trajectory utilization rate and good robustness. Compared with the trajectory-augmentation algorithm, the proposed algorithm produces fewer distortions and better stability for 92% of the “running” videos; the two algorithms perform similarly on nearly 50% of the “rolling shutter” videos, and the proposed algorithm shows fewer distortions on another 38% of that category. Both algorithms show similar stability and no distinct distortion on 55% of the “simple” videos, and the proposed algorithm achieves better stability on the remaining 45%. On most of the “depth” and “driving” videos, the two algorithms show similar stability and extent of distortion, with the proposed algorithm slightly more stable on a few “depth” videos. Compared with Warp Stabilizer, the proposed algorithm has fewer distortions and a better overall effect for 93% of the “lack of long trajectory” videos and 71.4% of the “rolling shutter” videos. For the “simple” and “driving” videos, both achieve good stabilization results. The two achieve similar results for 75% of the “depth” videos; for the remaining 25%, the proposed algorithm has fewer distortions. Compared with the stabilization algorithm based on epipolar point transfer, the proposed algorithm encounters fewer degenerate situations and therefore avoids the distortions introduced by a temporarily stationary camera or by pure camera rotation. Conclusion The proposed algorithm places few restrictions on the camera motion pattern and scene depth and is suitable for common video stabilization situations, including scenes that lack parallax, have a nonplanar structure, or suffer from rolling-shutter distortion.
The proposed algorithm can still achieve satisfactory stabilization results in scenarios that lack long trajectories because of camera panning, motion blurring, or excessive jitter. The running time of the algorithm still needs improvement: it requires approximately 3-5 s per frame on a machine with a 2.1 GHz Intel Core i3 CPU and 3 GB of memory. In the future, parallel computing may be a potential way to increase the speed.
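The Method section above states that smoothing vectors with non-negligible error are removed before mesh-grid warping, in part by RANSAC with an affine model. A minimal sketch of that filtering follows; the iteration count and the 2-pixel residual threshold are assumptions, as the paper specifies only the affine model and the use of RANSAC, and ransac_affine_filter is a hypothetical helper name.

```python
import numpy as np

def ransac_affine_filter(src, dst, n_iters=200, thresh=2.0, rng=None):
    """Reject unreliable smoothing vectors with affine-model RANSAC.

    src : (N, 2) real feature points in a frame
    dst : (N, 2) their smoothed (reprojected) positions
    Returns a boolean inlier mask; vectors flagged False are dropped
    before mesh-grid warping. n_iters and thresh are assumed values.
    """
    rng = np.random.default_rng(rng)
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])  # homogeneous (N, 3)
    best_mask = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, size=3, replace=False)
        A = src_h[idx]                          # minimal 3-point sample
        if abs(np.linalg.det(A)) < 1e-8:        # degenerate (collinear)
            continue
        M = np.linalg.solve(A, dst[idx])        # (3, 2) affine parameters
        residual = np.linalg.norm(src_h @ M - dst, axis=1)
        mask = residual < thresh
        if mask.sum() > best_mask.sum():        # keep the largest consensus
            best_mask = mask
    return best_mask
```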
Source
《中国图象图形学报》
CSCD
Peking University Core Journals (北大核心)
2017, No. 7, pp. 935-945 (11 pages)
Journal of Image and Graphics
Funding
National Natural Science Foundation of China (U1531110)
Fundamental Research Funds for the Central Universities (NZ2015202)
Keywords
video stabilization
trifocal tensor
virtual trajectory
long trajectory
reprojection