Abstract: This paper proposes a Visual-Inertial Odometry (VIO) algorithm that relies solely on monocular cameras and Inertial Measurement Units (IMUs) and is capable of real-time self-position estimation for robots during movement. By integrating the optical flow method, the algorithm tracks both point and line features in images simultaneously, significantly reducing computational complexity and the matching time for line feature descriptors. Additionally, this paper advances the triangulation method for line features, using depth information from line segment endpoints to determine their Plücker coordinates in three-dimensional space. Tests on the EuRoC datasets show that the proposed algorithm outperforms PL-VIO in terms of processing speed per frame, with an approximate 5% to 10% improvement in both relative pose error (RPE) and absolute trajectory error (ATE). These results demonstrate that the proposed VIO algorithm is an efficient solution suitable for low-computing platforms requiring real-time localization and navigation.
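The endpoint-based line triangulation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, assumes the endpoint depths are already known, and uses the common convention in which a 3D line is stored as a moment vector `n` and a direction vector `d`. The function names are illustrative only.

```python
def back_project(pixel, depth, fx, fy, cx, cy):
    """Lift a pixel with known depth to a 3D point in the camera frame (pinhole model)."""
    u, v = pixel
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def plucker_from_endpoints(p1, p2):
    """Return (n, d) for the line through p1 and p2.

    d = p2 - p1 is the line direction and n = p1 x p2 is the moment.
    The pair always satisfies the Pluecker constraint n . d = 0.
    """
    d = tuple(q - p for p, q in zip(p1, p2))
    n = cross(p1, p2)
    return n, d


# Hypothetical intrinsics and endpoint depths, chosen only for illustration.
FX, FY, CX, CY = 400.0, 400.0, 320.0, 240.0
p1 = back_project((320.0, 240.0), 2.0, FX, FY, CX, CY)
p2 = back_project((420.0, 240.0), 4.0, FX, FY, CX, CY)
n, d = plucker_from_endpoints(p1, p2)
```

Because the moment is perpendicular to the direction by construction, checking `dot(n, d)` against zero is a cheap sanity test on any line built this way.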
Funding: the National Key Research and Development Program of China (2016YFB1001501); NSF of China (61672457); the Fundamental Research Funds for the Central Universities (2018FZA5011); Zhejiang University-SenseTime Joint Lab of 3D Vision.
Abstract: Although VSLAM/VISLAM has achieved great success, it is still difficult to quantitatively evaluate the localization results of different kinds of SLAM systems from the perspective of augmented reality (AR), owing to the lack of an appropriate benchmark. In practical AR applications, a variety of challenging situations (e.g., fast motion, strong rotation, serious motion blur, dynamic interference) may easily be encountered, since a home user may not move the AR device carefully and the real environment may be quite complex. In addition, for a good AR experience, the frequency of camera tracking loss should be minimized, and recovery from the failure status should be fast and accurate. Existing SLAM datasets/benchmarks generally only evaluate pose accuracy, and their camera motions are somewhat simple and do not fit the common cases in mobile AR applications well. With this motivation, we build a new visual-inertial dataset as well as a series of evaluation criteria for AR. We also review the existing monocular VSLAM/VISLAM approaches with detailed analyses and comparisons. In particular, we select 8 representative monocular VSLAM/VISLAM approaches/systems and quantitatively evaluate them on our benchmark. Our dataset, sample code and corresponding evaluation tools are available at the benchmark website http://www.zjucvg.net/eval-vislam/.
Funding: support from the National Natural Science Foundation of China (No. 61375086), the Key Project (No. KZ201610005010) of the S&T Plan of the Beijing Municipal Commission of Education, and the Beijing Natural Science Foundation (4174083).
Abstract: Feature detection and tracking, which rely heavily on the gray-value information of images, are a very important procedure for Visual-Inertial Odometry (VIO), and the tracking results significantly affect the accuracy of the estimation results and the robustness of VIO. In environments with high-contrast lighting, images captured by an auto-exposure camera change frequently with the exposure time. As a result, the gray value of the same feature varies from frame to frame, which poses a large challenge to the feature detection and tracking procedure. Moreover, this problem is further aggravated by the nonlinear camera response function and lens attenuation. However, very few VIO methods take full advantage of photometric camera calibration or discuss the influence of photometric calibration on VIO. In this paper, we propose a robust monocular visual-inertial odometry, PC-VINS-Mono, which can be understood as an extension of the open-source VIO pipeline VINS-Mono with the capability of photometric calibration. We evaluate the proposed algorithm on a public dataset. Experimental results show that, with photometric calibration, our algorithm achieves better performance than VINS-Mono.
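The effect described in this abstract can be sketched with a commonly used image formation model, I(x) = G(t · V(x) · B(x)), where G is the camera response function, t the exposure time, V the lens attenuation (vignetting) and B the scene irradiance. The sketch below is an assumption made for illustration, not the PC-VINS-Mono code: the gamma-curve response, the radial vignetting model and all parameter values are hypothetical stand-ins for calibrated quantities.

```python
def response(energy, gamma=2.2):
    """Hypothetical gamma-curve response G: normalized energy in [0, 1] -> gray value in [0, 255]."""
    clipped = min(max(energy, 0.0), 1.0)
    return 255.0 * clipped ** (1.0 / gamma)


def inverse_response(intensity, gamma=2.2):
    """Inverse of the hypothetical response: gray value -> normalized energy."""
    return (intensity / 255.0) ** gamma


def vignette(x, y, cx, cy, k=1e-6):
    """Hypothetical radial attenuation V: 1 at the image center, smaller toward the corners."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    return 1.0 / (1.0 + k * r2)


def observe(irradiance, x, y, exposure, cx, cy):
    """Simulate the gray value the camera records for a given scene irradiance."""
    return response(exposure * vignette(x, y, cx, cy) * irradiance)


def recover_irradiance(intensity, x, y, exposure, cx, cy):
    """Undo response, exposure and vignetting to recover the scene irradiance."""
    return inverse_response(intensity) / (exposure * vignette(x, y, cx, cy))


# The same scene point seen in two frames with different exposure and pixel position.
CX, CY = 320.0, 240.0
B_TRUE = 20.0
gray_a = observe(B_TRUE, 320.0, 240.0, 0.01, CX, CY)
gray_b = observe(B_TRUE, 620.0, 440.0, 0.02, CX, CY)
b_a = recover_irradiance(gray_a, 320.0, 240.0, 0.01, CX, CY)
b_b = recover_irradiance(gray_b, 620.0, 440.0, 0.02, CX, CY)
```

In this toy setup the raw gray values `gray_a` and `gray_b` differ noticeably, while the corrected values `b_a` and `b_b` agree, which is the property a gray-value-based tracker benefits from.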
Funding: the National Natural Science Foundation of China (NSFC) under Grant No. 62273229; the Equipment Pre-Research Field Foundation under Grant No. 80913010303.
Abstract: Visual-Inertial Odometry (VIO) has been developed from Simultaneous Localization and Mapping (SLAM) as a low-cost and versatile sensor fusion approach and has attracted increasing attention in ground vehicle positioning. However, VIO performance usually degrades in challenging environments and degenerate motion scenarios. In this paper, we propose a VIO algorithm for ground vehicles based on the Multi-State Constraint Kalman Filter (MSCKF) framework. Based on a unified motion manifold assumption, we derive the measurement model of manifold constraints, including velocity, rotation, and translation constraints. We then present a robust filter-based algorithm dedicated to ground vehicles, whose key elements are real-time manifold noise estimation and an adaptive measurement update. In addition, GNSS position measurements are loosely coupled into our approach, and the transformation between the GNSS and VIO frames is optimized online. Finally, we theoretically analyze the system observability matrix and observability measures. Our algorithm is tested both in simulation and on public datasets, including the Brno Urban dataset and the Kaist Urban dataset. We compare the performance of our algorithm with classical VIO algorithms (MSCKF, VINS-Mono, R-VIO, ORB_SLAM3) and GVIO algorithms (GNSS-MSCKF, VINS-Fusion). The results demonstrate that our algorithm is more robust than the compared algorithms, with competitive position accuracy and computational efficiency.
Abstract: Monocular visual odometry (VO) is the process of determining a user’s trajectory through a series of consecutive images taken by a single camera. A major problem that affects the accuracy of monocular visual odometry, however, is scale ambiguity. This research proposes an innovative augmentation technique that resolves the scale ambiguity problem of monocular visual odometry. The proposed technique augments the camera images with range measurements taken by an ultra-low-cost laser device known as the Spike. The Spike laser rangefinder is small and can be mounted on a smartphone. Two datasets were collected along precisely surveyed tracks, one outdoor and one indoor, to assess the effectiveness of the proposed technique. The coordinates of both tracks were determined using a total station to serve as ground truth. To calibrate the smartphone’s camera, seven images of a checkerboard were taken from different positions and angles and then processed using a MATLAB-based camera calibration toolbox. Subsequently, the speeded-up robust features (SURF) method was used for image feature detection and matching. The random sample consensus (RANSAC) algorithm was then used to remove the outliers in the matched points between sequential images. The relative orientation and translation between the frames were computed and then scaled using the Spike measurements to obtain the scaled trajectory. The scaled trajectory was then used to construct the surrounding scene using the structure from motion (SfM) technique. Finally, both the computed camera trajectory and the constructed scene were compared with ground truth. It is shown that the proposed technique achieves centimeter-level accuracy in monocular VO scale recovery, which in turn leads to enhanced mapping accuracy.
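The scale recovery step in this abstract can be sketched as follows. The abstract does not give the exact formula, so this is only one plausible reading: it assumes that, for each frame pair, the laser range to a scene point is available together with the up-to-scale depth of that same point from triangulation, and that the unit-length relative translations are already expressed in a common world frame. All names and numbers are hypothetical.

```python
def scale_factor(laser_range_m, unscaled_depth):
    """Metric scale = measured range / up-to-scale depth of the same scene point."""
    if unscaled_depth <= 0.0:
        raise ValueError("unscaled depth must be positive")
    return laser_range_m / unscaled_depth


def scaled_trajectory(unit_translations, laser_ranges_m, unscaled_depths):
    """Accumulate metric camera positions from up-to-scale translations.

    Each translation is assumed to be expressed in the world frame already,
    so rotation handling is deliberately left out of this sketch.
    """
    position = (0.0, 0.0, 0.0)
    track = [position]
    for t, r, d in zip(unit_translations, laser_ranges_m, unscaled_depths):
        s = scale_factor(r, d)
        position = tuple(p + s * ti for p, ti in zip(position, t))
        track.append(position)
    return track


# Two hypothetical frame pairs: ranges in meters, depths in arbitrary VO units.
track = scaled_trajectory(
    unit_translations=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    laser_ranges_m=[3.0, 5.0],
    unscaled_depths=[6.0, 2.0],
)
```

Computing the scale per frame pair, as done here, keeps the sketch simple; a real pipeline would likely filter or smooth the per-frame scale to reduce the effect of range noise.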
Abstract: Background: Based on the seminal work proposed by Zhou et al., much of the recent progress in learning monocular visual odometry, i.e., depth and camera motion from monocular videos, can be attributed to the tricks in the training procedure, such as data augmentation and learning objectives. Methods: Herein, we categorize a collection of such tricks through the theoretical examination and empirical evaluation of their effects on the final accuracy of the visual odometry. Results/Conclusions: By combining the aforementioned tricks, we were able to significantly improve a baseline model adapted from SfMLearner without additional inference costs. Furthermore, we analyzed the principles of these tricks and the reason for their success. Practical guidelines for future research are also presented.
Funding: supported by the National Key Research and Development Program of China (Nos. 2016YFB0502004, 2017YFC0821102).
Abstract: Visual-Inertial Odometry (VIO) fuses measurements from a camera and an Inertial Measurement Unit (IMU) to achieve combined performance that is better than that of either sensor alone. Hybrid VIO is an extended Kalman filter-based solution that augments the state vector of the Multi-State Constraint Kalman Filter (MSCKF) with features that have long tracking lengths. In this paper, a novel hybrid VIO is proposed, which focuses on utilizing low-cost sensors while also considering both computational efficiency and positioning precision. The proposed algorithm introduces several novel contributions. Firstly, by deducing an analytical error transition equation, a one-dimensional inverse depth parametrization is used to parametrize the augmented feature state. This modification is shown to significantly improve computational efficiency and numerical robustness, and as a result achieves higher precision. Secondly, for better handling of static scenes, a novel closed-form Zero-velocity UPdaTe (ZUPT) method is proposed. ZUPT is modeled as a measurement update for the filter rather than crudely forbidding propagation, which has the advantage of correcting the overall state through the correlations in the filter covariance matrix. Furthermore, online spatial and temporal calibration is also incorporated. Experiments are conducted on both public datasets and real data. The results demonstrate the effectiveness of the proposed solution by showing that its performance is better than the baseline and state-of-the-art algorithms in terms of both efficiency and precision. The related software is open-sourced to benefit the community.
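The idea of treating ZUPT as a measurement update can be illustrated with a toy example. This is not the closed-form method of the paper: it is a standard Kalman update on a hypothetical two-element state (position, velocity) with the pseudo-measurement "velocity = 0". It only shows how the cross-covariance lets a velocity measurement correct the position as well.

```python
def zupt_update(state, cov, meas_var):
    """Apply a zero-velocity pseudo-measurement to a [position, velocity] state.

    Measurement model: z = H x + noise, with H = [0, 1] and z = 0.
    `cov` is a 2x2 covariance given as nested lists, `meas_var` is the variance R.
    """
    pos, vel = state
    s = cov[1][1] + meas_var                 # innovation variance H P H^T + R
    k_pos = cov[0][1] / s                    # Kalman gain for position (from cross-covariance)
    k_vel = cov[1][1] / s                    # Kalman gain for velocity
    innovation = 0.0 - vel                   # measured zero velocity minus predicted velocity
    new_state = [pos + k_pos * innovation, vel + k_vel * innovation]
    # Covariance update P <- (I - K H) P, written out for the 2x2 case.
    new_cov = [
        [cov[0][0] - k_pos * cov[1][0], cov[0][1] - k_pos * cov[1][1]],
        [(1.0 - k_vel) * cov[1][0], (1.0 - k_vel) * cov[1][1]],
    ]
    return new_state, new_cov


# Hypothetical numbers: the filter believes the platform drifts at 0.5 m/s while it is static.
x_new, p_new = zupt_update(
    state=[10.0, 0.5],
    cov=[[4.0, 1.0], [1.0, 1.0]],
    meas_var=1.0,
)
```

With these numbers the velocity estimate is pulled toward zero and the position estimate is pulled back too, because position and velocity errors are correlated in the covariance; simply skipping propagation while static would not give that correction.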