Funding: This work was supported by the Sichuan Science and Technology Program (2023YFG0262).
Abstract: Transformer-based stereo image super-resolution (Stereo SR) methods have significantly improved image quality. However, existing methods pay insufficient attention to detailed features and do not consider the offset of pixels along the epipolar lines in complementary views when integrating stereo information. To address these challenges, this paper introduces a novel epipolar line window attention stereo image super-resolution network (EWASSR). For detail feature restoration, we design a feature extractor based on Transformer and convolutional neural network (CNN) components, consisting of (shifted) window-based multi-head self-attention ((S)W-MSA) and feature distillation and enhancement blocks (FDEB). This combination balances global image perception with attention to local features and captures more discriminative high-frequency features. Furthermore, to address the offset of complementary pixels in stereo images, we propose an epipolar line window attention (EWA) mechanism, which divides windows along the epipolar direction to promote efficient matching of shifted pixels, even in smooth regions, since neighboring pixels within a window serve as references for more accurate matching. Extensive experiments demonstrate that EWASSR reconstructs more realistic detailed features. Quantitative comparisons on the Middlebury and Flickr1024 datasets for 2× SR show that, relative to a recent competing network, the peak signal-to-noise ratio (PSNR) of EWASSR increases by 0.37 dB and 0.34 dB, respectively.
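As an illustrative sketch of the EWA idea above (not the authors' released code), the PyTorch snippet below partitions left- and right-view feature maps into 1-D windows laid out along the horizontal epipolar direction and attends across views within each window; the window size and tensor layout are assumptions.

```python
import torch

def epipolar_window_cross_attention(feat_l, feat_r, win=16):
    """Cross-view attention inside 1-D windows along the (horizontal)
    epipolar direction.  feat_l / feat_r: (B, C, H, W) feature maps."""
    B, C, H, W = feat_l.shape
    assert W % win == 0, "pad W to a multiple of the window size"

    def to_windows(x):
        # (B, C, H, W) -> (B*H*num_windows, win, C): every pixel row is
        # split into windows along the epipolar (width) axis.
        return x.permute(0, 2, 3, 1).reshape(B * H * (W // win), win, C)

    q = to_windows(feat_l)                        # queries from the left view
    k = v = to_windows(feat_r)                    # keys/values from the right view
    attn = torch.softmax(q @ k.transpose(-2, -1) / C ** 0.5, dim=-1)
    out = (attn @ v).reshape(B, H, W // win, win, C).reshape(B, H, W, C)
    return out.permute(0, 3, 1, 2)                # back to (B, C, H, W)
```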
Abstract: This paper proposes a simple geometrical ray-based approach to solve the stereo correspondence problem for a single-lens bi-prism stereovision system. Each image captured with this system can be divided into left and right sub-images, which are generated by the two virtual cameras produced by the bi-prism. The system is therefore equivalent to a conventional two-camera setup, and the disparities between the two sub-images can be used to reconstruct the three-dimensional (3D) scene. The stereo correspondence problem is solved geometrically by applying the epipolar geometry constraint to the generated virtual cameras instead of the real CCD camera. Experiments are conducted to validate the proposed method, and the results are compared with a calibration-based approach to confirm its accuracy and effectiveness.
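As a minimal sketch of the epipolar-geometry constraint applied to the two virtual (sub-image) views, the snippet below measures how far each candidate right-image point lies from the epipolar line of its left-image counterpart; the fundamental matrix F between the virtual cameras is assumed to be available (for example, derived from the bi-prism geometry), not taken from the paper.

```python
import numpy as np

def epipolar_residual(F, pts_left, pts_right):
    """Point-to-epipolar-line distance |x_r^T F x_l| / ||(F x_l)_{1:2}||
    for each candidate pair.  pts_*: (N, 2) pixel coordinates."""
    xl = np.column_stack([pts_left, np.ones(len(pts_left))])    # homogeneous
    xr = np.column_stack([pts_right, np.ones(len(pts_right))])
    lines = xl @ F.T                      # epipolar lines in the right sub-image
    num = np.abs(np.sum(xr * lines, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return num / den
```

Candidate matches whose residual exceeds a few pixels violate the constraint of the virtual-camera pair and can be rejected before disparity computation.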
Funding: Funded by the Center for Unmanned Aircraft Systems (C-UAS), a National Science Foundation Industry/University Cooperative Research Center (I/UCRC), under NSF award numbers IIP-1161036 and CNS-1650547, along with significant contributions from C-UAS industry members.
Abstract: This paper introduces a new algorithm for estimating the relative pose of a moving camera from consecutive frames of a video sequence. State-of-the-art algorithms for calculating the relative pose between two images use matched features to estimate the essential matrix, which is then decomposed into the relative rotation and normalized translation between frames. To be robust to noise and feature-match outliers, these methods generate a large number of essential matrix hypotheses from randomly selected minimal subsets of feature pairs and then score the hypotheses on all feature pairs. In contrast, the algorithm introduced in this paper generates relative pose hypotheses by directly optimizing the rotation and normalized translation between frames, rather than estimating the essential matrix and then performing the decomposition. The resulting algorithm improves computation time by an order of magnitude. If an inertial measurement unit (IMU) is available, it is used to seed the optimizer; in addition, the best hypothesis at each iteration is reused to seed the optimizer, reducing the number of relative pose hypotheses that must be generated and scored. These advantages greatly speed up performance and enable the algorithm to run in real time on low-cost embedded hardware. We show an application of the algorithm to visual multi-target tracking (MTT) in the presence of parallax and demonstrate its real-time performance on a 640 × 480 video sequence captured on a UAV. Video results are available at https://youtu.be/HhK-p2hXNnU.
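To make the hypothesis-scoring step concrete, the sketch below (an illustration under an assumed data layout, not the authors' implementation) forms E = [t]x R from a rotation/translation hypothesis and scores it by counting feature pairs whose Sampson epipolar error falls under a threshold.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def score_hypothesis(R, t, x1, x2, thresh=1e-3):
    """R: 3x3 rotation, t: unit translation, x1/x2: (N, 3) normalized
    homogeneous image coordinates of matched features."""
    E = skew(t) @ R                        # essential matrix of the hypothesis
    Ex1 = x1 @ E.T                         # epipolar lines in image 2
    Etx2 = x2 @ E                          # epipolar lines in image 1
    num = np.sum(x2 * Ex1, axis=1) ** 2    # algebraic epipolar error, squared
    den = Ex1[:, 0]**2 + Ex1[:, 1]**2 + Etx2[:, 0]**2 + Etx2[:, 1]**2
    sampson = num / den
    return np.count_nonzero(sampson < thresh)   # inlier count as the score
```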
Funding: Supported by the National Science Foundation (69275004) and the France-China Advanced Research Program.
Abstract: This paper combines the least-squares method with an iterative method to compute the fundamental matrix and develops a new evaluation function based on the epipolar geometry. During the iteration, with the evaluation function as a measure, points that introduce larger noise are deleted and points with smaller noise are retained, which increases the precision of the method. Experimental results indicate that the new method is computationally precise, stable in performance, and resistant to noise.
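A rough sketch of the fit-evaluate-prune loop described above: a linear least-squares fundamental matrix is refit after discarding the points with the largest epipolar residuals. Coordinate normalization is omitted and the drop fraction is an assumption, so this is not the paper's exact procedure.

```python
import numpy as np

def fit_F_linear(x1, x2):
    """Linear least-squares (8-point style) fundamental matrix.
    x1, x2: (N, 2) pixel coordinates of matched points, N >= 8."""
    A = np.column_stack([x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
                         x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
                         x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)            # enforce the rank-2 constraint
    return U @ np.diag([S[0], S[1], 0]) @ Vt

def iterative_F(x1, x2, n_iter=5, drop_frac=0.1):
    """Refit F repeatedly, dropping the noisiest points at each iteration."""
    keep = np.arange(len(x1))
    for _ in range(n_iter):
        F = fit_F_linear(x1[keep], x2[keep])
        h1 = np.column_stack([x1[keep], np.ones(len(keep))])
        h2 = np.column_stack([x2[keep], np.ones(len(keep))])
        r = np.abs(np.sum(h2 * (h1 @ F.T), axis=1))   # epipolar residuals
        order = np.argsort(r)                         # keep the quietest points
        keep = keep[order[: int(len(keep) * (1 - drop_frac))]]
    return F
```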
Abstract: Dense map estimation is an important goal of simultaneous localization and mapping (SLAM). To address the limited reconstruction accuracy of the classical depth-filter algorithm, an improved monocular dense point-cloud reconstruction method based on inverse-depth filtering is proposed: a threshold is set during the epipolar line search stage to improve efficiency, an inverse-depth Gaussian filter updates the posterior inverse-depth probability distribution, and intra-frame checks remove outliers. Experimental results verify that the improved dense reconstruction algorithm yields denser and more accurate reconstructions without requiring GPU acceleration.
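The inverse-depth Gaussian update mentioned above amounts to a product-of-Gaussians fusion; the sketch below is a generic illustration of that step (the observation variance model is an assumption), not the paper's code.

```python
def fuse_inverse_depth(mu, sigma2, obs, obs_sigma2):
    """Fuse the current inverse-depth estimate N(mu, sigma2) with a new
    observation N(obs, obs_sigma2) obtained from the epipolar search."""
    new_sigma2 = sigma2 * obs_sigma2 / (sigma2 + obs_sigma2)
    new_mu = (obs_sigma2 * mu + sigma2 * obs) / (sigma2 + obs_sigma2)
    return new_mu, new_sigma2

# Example: a seed at inverse depth 0.5 m^-1 (variance 0.04) fused with a
# triangulated observation of 0.55 m^-1 (variance 0.01) moves to 0.54.
mu, s2 = fuse_inverse_depth(0.5, 0.04, 0.55, 0.01)
```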
Abstract: Epipolar rectification is a projective transformation applied to the raw image pair of a binocular camera so that corresponding epipolar lines in the rectified images lie on the same horizontal line, which eliminates vertical disparity and reduces stereo matching to a one-dimensional search. To address the shortcomings of existing rectification methods, this paper proposes an epipolar rectification method based on the translation matrix of the binocular camera: first, singular value decomposition (SVD) is applied to the translation matrix to obtain the new rotation matrix after rectification; then a new camera intrinsic matrix is established from the relationship between the images before and after rectification, completing the rectification. The method is validated on multiple binocular image pairs of different scenes from the SYNTIM database. Experimental results show an average rectification error within 0.6 pixels, almost no image distortion, an average skew of about 2.4°, and an average running time of 0.2302 s. The method therefore has practical value, fully meets the requirements of epipolar rectification, and eliminates both the errors caused by mechanical deviations of the cameras during stereo matching and the otherwise cumbersome computation.
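As a hedged illustration of building a rectifying rotation from the decomposed translation (baseline) between the two cameras, the numpy sketch below aligns the new x-axis with the baseline so that epipolar lines become horizontal; it shows the general construction rather than the paper's exact SVD-based procedure.

```python
import numpy as np

def rectifying_rotation(t):
    """Rotation whose rows are the new camera axes: the x-axis follows the
    baseline t, so corresponding epipolar lines become horizontal."""
    e1 = t / np.linalg.norm(t)            # new x-axis: baseline direction
    e2 = np.cross([0.0, 0.0, 1.0], e1)    # new y-axis, orthogonal to e1
    e2 /= np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                 # new z-axis completes a right-handed frame
    return np.vstack([e1, e2, e3])

# Each view is then warped by a homography of the form K_new @ R @ inv(K_old),
# where R applies this rectifying rotation (composed with the second camera's
# relative rotation for the right view).
R_rect = rectifying_rotation(np.array([0.12, 0.001, 0.004]))
```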
Abstract: In visual simultaneous localization and mapping (SLAM), interference introduced by dynamic objects severely degrades localization accuracy. The SLAM problem in dynamic scenes is addressed by removing dynamic objects and inpainting the resulting hole regions. Mask-RCNN provides semantic information, which is combined with an epipolar geometry method to remove dynamic objects. Hole regions in the RGB and depth images are restored pixel by pixel using keyframe pixel-weighted mapping, and a region-growing algorithm completes the depth information based on the correlation of neighboring depth pixels. Experimental results on the TUM dataset show that pose estimation accuracy improves by 85.26% on average over ORB-SLAM2 and by 28.54% over DynaSLAM, and the system still performs well when tested in real scenes.
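A hedged sketch of the epipolar-geometry check used alongside the Mask-RCNN masks: matched features whose distance to their epipolar line exceeds a threshold are flagged as dynamic. The use of OpenCV's findFundamentalMat and the one-pixel threshold are assumptions for illustration.

```python
import cv2
import numpy as np

def flag_dynamic(pts_prev, pts_cur, thresh_px=1.0):
    """pts_prev / pts_cur: (N, 2) float32 matched keypoint coordinates.
    Returns a boolean mask that is True for points judged dynamic."""
    F, _ = cv2.findFundamentalMat(pts_prev, pts_cur, cv2.FM_RANSAC)
    lines = cv2.computeCorrespondEpilines(pts_prev.reshape(-1, 1, 2), 1, F)
    lines = lines.reshape(-1, 3)                       # line coefficients a, b, c
    d = np.abs(lines[:, 0] * pts_cur[:, 0] +
               lines[:, 1] * pts_cur[:, 1] + lines[:, 2])
    d /= np.linalg.norm(lines[:, :2], axis=1)          # point-to-line distance
    return d > thresh_px                               # large residual -> dynamic
```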
Abstract: Objective: When a mobile agent performs the complex task of simultaneous localization and mapping (SLAM), interference from dynamic objects weakens the associations between feature points and degrades localization accuracy. To address this, a visual SLAM algorithm for indoor dynamic scenes based on YOLOv5 and geometric constraints is proposed. Methods: First, based on YOLOv5s, the original CSPDarknet backbone is replaced with the lightweight MobileNetV3 network to reduce parameters and speed up inference; the detector is integrated with ORB-SLAM2 so that semantic information is obtained while ORB feature points are extracted, and a-priori dynamic feature points are removed. Then, optical flow and the epipolar geometry constraint are combined to further remove any residual dynamic feature points. Finally, only static feature points are used to estimate the camera pose. Results: Experiments on the TUM dataset show that, compared with ORB-SLAM2, both ATE and RPE are reduced by more than 90% on highly dynamic sequences; compared with systems of the same type such as DS-SLAM and DynaSLAM, the tracking thread processes a frame in only 28.26 ms on average while maintaining localization accuracy and robustness. Conclusion: The algorithm effectively reduces the interference of dynamic objects on real-time SLAM and makes more intelligent, automated packaging workflows possible.
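A hedged sketch of the second-stage geometric filter (optical flow plus the epipolar constraint) applied to points that survive the detector mask; the OpenCV calls are illustrative and the threshold is an assumption, not the paper's configuration.

```python
import cv2
import numpy as np

def residual_dynamic_filter(prev_gray, cur_gray, pts, thresh_px=1.0):
    """Track candidate static points with LK optical flow, fit F with RANSAC,
    and drop the points that still violate the epipolar constraint."""
    pts = pts.astype(np.float32).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    ok = status.ravel() == 1
    p0, p1 = pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
    F, mask = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, thresh_px)
    static = mask.ravel() == 1            # RANSAC inliers satisfy the constraint
    return p0[static], p1[static]         # keep only the static correspondences
```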
Abstract: Tree height is an important parameter for monitoring forest condition, and photogrammetry, being low-cost and flexible, is one of the main methods for collecting tree heights. As a passive remote sensing technique, traditional photogrammetry usually requires many images with high overlap, which is related to the sparsity of conventional image features. To improve the accuracy of tree-height extraction when the number of images is limited, a method is proposed that combines sparse feature matching with dense pixel matching and filters outliers using the epipolar constraint, yielding dense, high-accuracy matches; a forest-scene point cloud is then obtained through a 3D reconstruction algorithm. The method can reconstruct the forest scene fairly completely and extract tree heights from only a small number of images. Compared with airborne light detection and ranging (LiDAR) point clouds, the extracted tree heights achieve a correlation coefficient of 0.91 with a maximum error of 1.64 m. Since the algorithm requires only a small number of overlapping images, it shows potential for processing high-resolution satellite imagery.
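As a simplified, hedged illustration of extracting tree heights from the reconstructed forest point cloud, the sketch below computes a per-grid-cell canopy-minus-ground estimate; this is a generic approximation, not the paper's exact extraction procedure.

```python
import numpy as np

def tree_heights(points, cell=0.5):
    """points: (N, 3) forest point cloud in metres.  Grid the scene in x-y,
    take the lowest point per cell as local ground and the highest as canopy
    top; their difference is a simple per-cell height estimate."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    keys, inv = np.unique(ij, axis=0, return_inverse=True)
    ground = np.full(len(keys), np.inf)
    canopy = np.full(len(keys), -np.inf)
    np.minimum.at(ground, inv, points[:, 2])   # lowest z per cell
    np.maximum.at(canopy, inv, points[:, 2])   # highest z per cell
    return canopy - ground
```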