Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (62002295, U19B2037), the China Postdoctoral Science Foundation (2020M673319), the Shaanxi Provincial Key R&D Program (2021KWZ-03), and the Natural Science Basic Research Plan in Shaanxi Province of China (2021JQ-290, 2020JQ-210).
Abstract: Estimating an accurate six-degree-of-freedom (6-DoF) pose from correspondences with outliers remains a critical issue in 3D rigid registration. Random sample consensus (RANSAC) and its variants are popular solutions to this problem. Although a number of RANSAC-style estimators exist, two issues remain unsolved. First, it is unclear which estimator is most appropriate for a particular application. Second, the impacts of different sampling strategies, hypothesis generation methods, hypothesis evaluation metrics, and stop criteria on the overall estimators remain ambiguous. This work fills these gaps by first considering six existing RANSAC-style methods and then proposing eight variants for a comprehensive evaluation. The objective is to thoroughly compare estimators in the RANSAC family and to evaluate the effect of each key stage on the eventual 6-DoF pose estimation performance. Experiments were carried out on four standard datasets with different application scenarios, data modalities, and nuisances. These datasets provide input correspondence sets with a variety of inlier ratios, spatial distributions, and scales. Based on the experimental results, we summarize notable outcomes and findings, give practical guidance for real-world applications, and highlight current bottlenecks and potential solutions in this research area.
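To make the link between inlier ratio and stop criterion concrete, the textbook RANSAC iteration bound can be sketched as below. This is the standard formula, not necessarily the stop criterion used by any estimator evaluated in the paper; the function name `ransac_iterations` and the default sample size of 3 correspondences and confidence of 0.99 are assumptions made for illustration.

```python
import math


def ransac_iterations(inlier_ratio: float,
                      sample_size: int = 3,
                      confidence: float = 0.99) -> int:
    """Number of random samples needed so that, with probability
    `confidence`, at least one sample contains only inliers.

    Hypothetical helper: sample_size=3 assumes three correspondences
    are drawn per rigid-pose hypothesis.
    """
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")

    # Probability that a single random sample is outlier-free.
    p_good = inlier_ratio ** sample_size
    if p_good >= 1.0:
        return 1  # every sample is all-inlier

    # N = log(1 - confidence) / log(1 - p_good); log1p keeps the
    # denominator accurate when p_good is very small.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_good))


if __name__ == "__main__":
    for ratio in (0.5, 0.1, 0.01):
        print(ratio, ransac_iterations(ratio))
```

The required number of samples grows rapidly as the inlier ratio falls, which is one reason correspondence sets with low inlier ratios are hard for estimators in this family.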
Funding: Supported by the KIST institutional program (2E26880, 2E26276).
Abstract: X-ray imaging is the conventional method for diagnosing the orthopedic condition of a patient. Computerized tomography (CT) scanning is another diagnostic method that provides the patient's 3D anatomical information. However, both methods have limitations when diagnosing the whole leg: X-ray imaging does not provide 3D information, and normal CT scanning cannot be performed in a standing posture. Obtaining 3D data of the whole leg in a standing posture is clinically important because it enables 3D analysis in the weight-bearing condition. Based on these clinical needs, a hardware-based bi-plane X-ray imaging system has been developed; it uses two orthogonal X-ray images. However, such systems have not been made available in general clinics because of their high cost. Therefore, we propose a widely applicable registration method for a 2D X-ray image and 3D CT scan data. With this method, the whole leg can be analyzed three-dimensionally in a standing posture. The method searches for the optimal position of the CT data, that is, the position that generates the image most similar to the captured X-ray image. The performance of the proposed method was verified by simulation-based experiments. We then analyzed the internal-external rotation angle of the femur using real patient data. Approximately 10.55 degrees of internal rotation was found relative to the defined anterior-posterior direction. In this paper, we present a useful registration method that uses a conventional X-ray image and 3D CT scan data to analyze the whole leg in the weight-bearing condition.
Funding: Sponsored by the National Natural Science Foundation of China (31771017, 31972924, 81873997), the Science and Technology Commission of Shanghai Municipality (16441908700), the Innovation Research Plan supported by the Shanghai Municipal Education Commission (ZXWF082101), the National Key R&D Program of China (2017YFC0110700, 2018YFF0300504, 2019YFC0120600), the Natural Science Foundation of Shanghai (18ZR1428600), and the Interdisciplinary Program of Shanghai Jiao Tong University (ZH2018QNA06, YG2017MS09).
Abstract: Deep-learning methods provide a promising approach for measuring in-vivo knee joint motion from fast registration of two-dimensional (2D) to three-dimensional (3D) data with a broad range of capture. However, if there are insufficient data for training, the data-driven approach will fail. We propose a feature-based transfer-learning method to extract features from fluoroscopic images. With three subjects and fewer than 100 pairs of real fluoroscopic images, we achieved a mean registration success rate of up to 40%. The proposed method provides a promising solution for learning-based registration when only a limited number of real fluoroscopic images is available.
Funding: Supported by the National Natural Science Foundation of China (No. 30970780) and the Ph.D. Programs Foundation of Ministry of Education of China (No. 20091103110005).
Abstract: An automatic method is proposed to solve the registration problem of aligning a single 2D fluoroscopic image to a 3D image volume without requiring any additional media, such as a calibration plate, or user interaction. First, a mathematical projection model is designed that reduces the influence of projection distortion on parameter optimization and improves the registration accuracy. Then, a two-stage optimization method is proposed, which enables robust registration in a wide parameter space. Furthermore, an automatic registration framework is proposed based on the Fourier-Mellin robust image comparison descriptor. Experimental results show that the registration method has high accuracy, with an average rotation error of 0.6 degrees and an average translation error of 1.4 mm.
Funding: Supported by the Shandong Provincial Natural Science Foundation (No. ZR2023MF062) and the National Natural Science Foundation of China (No. 61771230).
Abstract: To improve the registration accuracy of brain magnetic resonance images (MRI), some deep learning registration methods use segmentation images to train the model. However, the segmentation values are constant within each label, so the gradient variation concentrates on the boundary. As a result, the dense deformation field (DDF) gathers on the boundary, and folding can even appear. To fully leverage the label information, morphological opening and closing information maps are introduced to enlarge the non-zero gradient regions and improve the accuracy of DDF estimation. The opening information maps supervise the registration model to focus on smaller, narrow brain regions. The closing information maps supervise the registration model to pay more attention to complex boundary regions. Opening and closing morphology networks (OC_Net) are then designed to generate the opening and closing information maps automatically, enabling end-to-end training. Finally, a new registration architecture, VM_(seg+oc), is proposed by combining OC_Net and VoxelMorph. Experimental results show that the registration accuracy of VM_(seg+oc) is significantly improved on the LPBA40 and OASIS1 datasets. In particular, VM_(seg+oc) improves registration accuracy in smaller and narrow brain regions.
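For readers unfamiliar with the two morphological operators named above, the following is a minimal sketch of classical binary opening and closing on a 2D label mask. It is not OC_Net, which per the abstract generates such maps with networks; the 3x3 structuring element, the list-of-lists image type, and all function names are assumptions chosen for illustration.

```python
from typing import Iterator, List

Image = List[List[int]]  # binary mask: 1 = label, 0 = background


def _window(img: Image, r: int, c: int) -> Iterator[int]:
    """Yield the 3x3 neighbourhood of (r, c); pixels outside the
    image are treated as background (0)."""
    rows, cols = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            yield img[rr][cc] if inside else 0


def erode(img: Image) -> Image:
    """A pixel survives only if its whole 3x3 window is foreground."""
    return [[int(all(_window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img: Image) -> Image:
    """A pixel is set if any pixel in its 3x3 window is foreground."""
    return [[int(any(_window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img: Image) -> Image:
    """Erosion then dilation: removes structures thinner than the window."""
    return dilate(erode(img))


def closing(img: Image) -> Image:
    """Dilation then erosion: fills small holes and gaps."""
    return erode(dilate(img))


if __name__ == "__main__":
    mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)]
            for r in range(7)]
    mask[3][3] = 0  # one-pixel hole inside the block
    for row in closing(mask):
        print(row)
```

The difference between a mask and its opening highlights small or narrow structures, while the difference between a mask and its closing highlights holes and concavities along the boundary, which matches the roles the abstract assigns to the two kinds of maps.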
Abstract: This paper surveys state-of-the-art image features and descriptors for the task of 3D scan registration based on panoramic reflectance images. As modern terrestrial laser scanners digitize their environment in a spherical way, the sphere has to be projected onto a two-dimensional image. To this end, we evaluate the equirectangular, cylindrical, Mercator, rectilinear, Pannini, stereographic, and z-axis projections. We show that the Mercator and Pannini projections outperform the other projection methods.
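Two of the projections listed above have simple closed forms, sketched below for a single scan point. These are the standard map-projection formulas, not code from the paper; the function names, the convention that z is the vertical axis, and the unscaled output coordinates (radians rather than pixels) are assumptions for illustration.

```python
import math
from typing import Tuple


def to_lon_lat(x: float, y: float, z: float) -> Tuple[float, float]:
    """Convert a 3D scan point (scanner at the origin, z up) to
    longitude and latitude in radians."""
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return lon, lat


def equirectangular(lon: float, lat: float) -> Tuple[float, float]:
    """Equirectangular projection: both angles map linearly."""
    return lon, lat


def mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Mercator projection: v = ln(tan(pi/4 + lat/2)).
    The vertical coordinate diverges at the poles, so latitudes of
    exactly +/- pi/2 are rejected."""
    if abs(lat) >= math.pi / 2:
        raise ValueError("Mercator is undefined at the poles")
    return lon, math.log(math.tan(math.pi / 4 + lat / 2))


if __name__ == "__main__":
    lon, lat = to_lon_lat(1.0, 1.0, 1.0)
    print(equirectangular(lon, lat))
    print(mercator(lon, lat))
```

In practice the resulting coordinates would still be scaled and offset to pixel indices, and the vertical field of view would be clipped for the Mercator case.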
Funding: Supported in part by the National Key Research and Development Program of China (2018YFC2001302), the National Natural Science Foundation of China (61976209), the CAS International Collaboration Key Project (173211KYSB20190024), and the Strategic Priority Research Program of CAS (XDB32040000).
Abstract: Structure reconstruction of 3D anatomy from biplanar X-ray images is a challenging topic. Traditionally, the elastic-model-based method was used to reconstruct 3D shapes by deforming the control points on an elastic mesh. However, the reconstructed shape is not smooth because the limited control points are distributed only on the edge of the elastic mesh. Alternatively, statistical-model-based methods, which include shape-model-based and intensity-model-based methods, have been introduced because of their smooth reconstruction. However, both suffer from limitations. The shape-model-based method considers only the boundary profile, which leads to the loss of valid intensity information. The intensity-model-based method is slow because it needs to calculate the intensity distribution in each iteration. To address these issues, we propose a new reconstruction method using X-ray images and a specimen's CT data. Specifically, the CT data provide both the shape mesh and the intensity model of the vertebra. The intensity model is used to generate the deformation field from X-ray images, while the shape model is used to generate the patient-specific model by applying the calculated deformation field. Experiments on a public synthetic dataset and a clinical dataset show average reconstruction errors of 1.1 mm and 1.2 mm, respectively. The average reconstruction time is 3 minutes.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. U22B2034, 62172416, U21A20515, 62172415, and 62271467, and the Youth Innovation Promotion Association of the Chinese Academy of Sciences under Grant No. 2022131.
Abstract: We address the 3D shape assembly of multiple geometric pieces without overlaps, a scenario often encountered in 3D shape design, field archeology, and robotics. Existing methods depend on strong assumptions about the number of shape pieces and about coherent geometry or semantics of the pieces. Despite growing attention to 3D registration with complex or low-overlap patterns, few methods consider shape assembly with rare overlaps. To address this problem, we present a novel framework inspired by puzzle solving, named PuzzleNet, which conducts multi-task learning by leveraging both 3D alignment and boundary information. Specifically, we design an end-to-end neural network based on a point cloud transformer with two-way branches for simultaneously estimating the rigid transformation and predicting boundaries. The framework is then naturally extended to reassemble multiple pieces into a full shape by using an iterative greedy approach based on the distance between each pair of candidate-matched pieces. To train and evaluate PuzzleNet, we construct two datasets, named ModelPuzzle and DublinPuzzle, based on a synthetic CAD dataset (ModelNet40) and a real-world urban scan dataset (DublinCity), respectively. Experiments demonstrate the effectiveness of our method in solving 3D shape assembly for multiple pieces with arbitrary geometry and inconsistent semantics. Our method surpasses state-of-the-art algorithms by more than 10 times in rotation metrics and four times in translation metrics.
Funding: Supported in part by the USDA's Hatch and Multistate Project Funds (Accession Nos. 1005756 and 1001246).
Abstract: Estimation of fruit size in tree fruit crops is essential for selective robotic harvesting and crop-load estimation. Machine vision systems for fruit detection and localization have been studied widely for robotic harvesting and crop-load estimation. However, only a few studies have estimated fruit size in orchards using machine vision systems. This study was carried out to develop a machine vision system consisting of a color CCD camera and a time-of-flight (TOF) light-based 3D camera for estimating apple size in tree canopies. As a measure of fruit size, the major axis (longest axis) was estimated based on (i) the 3D coordinates of pixels on corresponding apple surfaces, and (ii) the 2D size of individual pixels within apple surfaces. In the 3D-coordinates-based method, the distances between pairs of pixels within apple regions were calculated using 3D coordinates, and the maximum distance between all pixel pairs within an apple region was taken as the major axis. The accuracy of estimating the major axis using 3D coordinates was 69.1%. In the pixel-size-based method, the physical sizes of pixels were estimated using a calibration model developed from pixel coordinates and the distance from the camera to the pixels. The major axis length was then estimated by summing the sizes of individual pixels along the major axis of the fruit. The accuracy of size estimation increased to 84.8% when the pixel-size-based method was used. The results show the potential for estimating fruit size in outdoor environments using a 3D machine vision system.
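The 3D-coordinates-based estimate described above (maximum distance over all pixel pairs in an apple region) can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation; the function name `major_axis_length`, the tuple-based point type, and the sample coordinates in metres are assumptions.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def major_axis_length(points: Sequence[Point3D]) -> float:
    """Return the largest Euclidean distance between any two 3D points
    of one fruit region (brute force, O(n^2) pairs)."""
    if len(points) < 2:
        return 0.0  # a single pixel has no extent
    return max(math.dist(p, q) for p, q in combinations(points, 2))


if __name__ == "__main__":
    # Hypothetical 3D coordinates (metres) of pixels on one apple surface.
    apple_pixels = [
        (0.00, 0.00, 1.20),
        (0.08, 0.00, 1.20),
        (0.04, 0.03, 1.18),
        (0.04, -0.03, 1.18),
    ]
    print(f"estimated major axis: {major_axis_length(apple_pixels):.3f} m")
```

Because the estimate is a maximum over all pairs, a single noisy depth value can inflate it, which may be one reason the abstract reports lower accuracy for this method than for the pixel-size-based one.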