In order to find better simplicity measurements for 3D object recognition, a new set of local regularities is developed and tested in a stepwise 3D reconstruction method, including localized minimizing standard deviation of angles (L-MSDA), localized minimizing standard deviation of segment magnitudes (L-MSDSM), localized minimum standard deviation of areas of child faces (L-MSDAF), localized minimum sum of segment magnitudes of common edges (L-MSSM), and localized minimum sum of areas of child faces (L-MSAF). Based on their effectiveness, measured in terms of form and size distortions, it is found that the two local regularities L-MSDA and L-MSDSM produce better performance when combined. The best weightings for the combination are identified as 10% for L-MSDSM and 90% for L-MSDA. The test results show that the combined use of L-MSDA and L-MSDSM with these weightings has the potential to be applied in other optimization-based 3D recognition methods to improve their efficacy and robustness.
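The best-performing measure above is a fixed weighted combination of two standard-deviation-based regularities. A minimal sketch of such a combination follows; the per-regularity normalization by the mean is an illustrative assumption to make the two scales comparable, not a detail taken from the abstract.

```python
import statistics

def l_msda(angles):
    # Localized regularity: standard deviation of angles (lower = more regular).
    return statistics.pstdev(angles)

def l_msdsm(segment_magnitudes):
    # Localized regularity: standard deviation of segment magnitudes.
    return statistics.pstdev(segment_magnitudes)

def combined_simplicity(angles, segment_magnitudes,
                        w_angles=0.9, w_segments=0.1):
    # Weighted combination reported above: 90% L-MSDA + 10% L-MSDSM.
    # Each term is divided by its mean (an assumption for illustration)
    # so angle and length scales contribute comparably.
    sa = l_msda(angles) / (statistics.mean(angles) or 1.0)
    sm = l_msdsm(segment_magnitudes) / (statistics.mean(segment_magnitudes) or 1.0)
    return w_angles * sa + w_segments * sm
```

A perfectly regular candidate (equal angles, equal segment lengths) scores 0, and more distorted candidates score higher, so minimizing this value favors the simplest reconstruction.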
In this paper, a classification method based on neural networks is presented for the recognition of 3D objects. The objective is to classify an object query against objects in a database, which leads to recognition of the former. The 3D objects in this database are transformations of other objects by one element of the overall transformation group. The set of transformations considered in this work is the general affine group.
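The database construction above applies a single element of the general affine group to an object. A minimal sketch of applying one group element (A, t) to a 3D point set; the function name and the explicit invertibility check are illustrative, not from the paper.

```python
import numpy as np

def apply_affine(points, A, t):
    # Apply an element (A, t) of the general affine group to an N x 3 point set:
    # x -> A @ x + t, where A must be an invertible 3x3 matrix.
    A = np.asarray(A, dtype=float)
    assert abs(np.linalg.det(A)) > 1e-12, "A must be invertible (affine group element)"
    return np.asarray(points, dtype=float) @ A.T + np.asarray(t, dtype=float)
```

Because every group element is invertible, a query can in principle be mapped back onto its database representative, which is what makes orbit-based classification well posed.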
The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted down-sampling strategy that focuses on edge areas for efficient feature extraction from complex geometry. A pose hypothesis validation approach is proposed to resolve ambiguity due to symmetry by calculating the edge matching degree. We perform evaluations on two challenging datasets and one real-world collected dataset, demonstrating the superiority of our method for pose estimation of geometrically complex, occluded, symmetrical objects. We further validate our method by applying it to simulated punctures.
Holoscopic 3D imaging is a true 3D imaging system that mimics the fly's-eye technique to acquire a true 3D optical model of a real scene. To reconstruct the 3D image computationally, an efficient implementation of an Auto-Feature-Edge (AFE) descriptor algorithm is required that provides an individual feature detector for the integration of 3D information to locate objects in the scene. The AFE descriptor plays a key role in simplifying the detection of both edge-based and region-based objects. The detector is based on a Multi-Quantize Adaptive Local Histogram Analysis (MQALHA) algorithm, which is distinctive for each Feature-Edge (FE) block, i.e., the large contrast changes (gradients) in an FE block are easier to localise. The novelty of this work lies in generating a noise-free 3D map (3DM) according to a correlation analysis of region contours. This automatically combines the available depth estimation technique with an edge-based feature shape recognition technique. The application area consists of two varied domains, which prove the efficiency and robustness of the approach: a) extracting a set of feature-edges for both the tracking and mapping processes of 3D depth-map estimation, and b) separation and recognition of in-focus objects in the scene. Experimental results show that the proposed 3DM technique performs efficiently compared with state-of-the-art algorithms.
Active vision is inherently attention-driven: an agent actively selects views to attend to in order to rapidly perform a vision task while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks based on single RGB images, we address multi-view depth-based active object recognition using an attention mechanism, by means of an end-to-end recurrent 3D attentional network. The architecture takes advantage of a recurrent neural network to store and update an internal representation. Our model, trained with 3D shape datasets, is able to iteratively attend to the best views of a target object of interest in order to recognize it. To realize 3D view selection, we derive a 3D spatial transformer network. It is differentiable, allowing training with backpropagation, and so achieves much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method, with only depth input, achieves state-of-the-art next-best-view performance both in terms of time taken and recognition accuracy.
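The attend-then-update loop described above can be sketched with toy NumPy stand-ins. Everything here is illustrative: the weights are random placeholders for learned parameters, and the actual model uses a trained recurrent network with a differentiable 3D spatial transformer rather than this discrete argmax selection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: per-view depth feature size, hidden state size, candidate view count.
D, H, V = 16, 32, 8
Wx = rng.standard_normal((H, D)) * 0.1   # input-to-hidden weights (random stand-ins
Wh = rng.standard_normal((H, H)) * 0.1   # for parameters a real model would learn)
Ws = rng.standard_normal((V, H)) * 0.1   # hidden-to-view scoring weights

def step(h, view_feature):
    # Recurrent update: fold the newly attended view into the internal representation.
    return np.tanh(Wh @ h + Wx @ view_feature)

def next_best_view(h, visited):
    # Score all candidate views from the current state; pick the best unvisited one.
    scores = Ws @ h
    for v in visited:
        scores[v] = -np.inf
    return int(np.argmax(scores))

h = np.zeros(H)
visited = set()
view_features = rng.standard_normal((V, D))   # stand-in per-view depth features
for _ in range(3):                             # attend three views in sequence
    v = next_best_view(h, visited)
    visited.add(v)
    h = step(h, view_features[v])
```

The final state `h` would feed a classifier head; making the view selection differentiable (via the spatial transformer) is what allows the whole loop to train with backpropagation instead of reinforcement learning.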
The availability of a good viewpoint space partition is crucial for three-dimensional (3-D) object recognition based on the aspect graph approach. The aspect graph approach depicts two important kinds of events: edge-edge-edge (EEE) events and edge-vertex (EV) events. This paper presents an algorithm to compute EEE events by characteristic analysis based on conicoid theory, in contrast to current algorithms that focus too much on EV events and often overlook the importance of EEE events. The paper also provides a standard flowchart for viewpoint space partitioning based on aspect graph theory that makes it suitable for perspective models. The partitioning results demonstrate the algorithm's efficiency, with more valuable viewpoints found with the help of EEE events, which helps to achieve a high recognition rate for 3-D object recognition.
Lightweight modules play a key role in 3D object detection tasks for autonomous driving and are necessary for the practical application of 3D object detectors. At present, research still focuses on constructing complex models and calculations to improve detection precision at the expense of running speed. However, building a lightweight model that learns global features from point cloud data for 3D object detection remains a significant problem. In this paper, we focus on combining convolutional neural networks with self-attention-based vision transformers to realize lightweight and high-speed computing for 3D object detection. We propose Lightweight Detection 3D (LWD-3D), a point cloud conversion and lightweight vision transformer for autonomous driving. LWD-3D utilizes a one-shot regression framework in 2D space and generates 3D object bounding boxes from point cloud data, which provides a new feature representation method based on a vision transformer for 3D detection applications. The results of experiments on the KITTI 3D dataset show that LWD-3D achieves real-time detection (time per image < 20 ms). LWD-3D obtains a mean average precision (mAP) 75% higher than that of another 3D real-time detector, with half the number of parameters. Our research extends the application of vision transformers to 3D object detection tasks.
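The mAP metric quoted above averages per-class average precision (AP). A minimal sketch of AP over a confidence-ranked detection list, using the simple "precision at every correct detection" form and assuming every ground-truth object appears somewhere in the list (KITTI's official protocol interpolates precision at fixed recall points instead):

```python
def average_precision(ranked_hits):
    # ranked_hits: detections sorted by descending confidence;
    # True means the detection matches a ground-truth box (e.g., by IoU).
    total_gt = sum(ranked_hits)  # assumes no ground-truth object is missed entirely
    if total_gt == 0:
        return 0.0
    hits, precisions = 0, []
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precisions.append(hits / rank)  # precision at this correct detection
    return sum(precisions) / total_gt
```

mAP is then the mean of this value over object classes (and, on KITTI, over difficulty levels).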
Modern photogrammetry converts images and/or LiDAR data into usable 2D/3D/4D products. The photogrammetric industry offers engineering-grade hardware and software components for various applications. While some components of the data processing pipeline already work automatically, substantial manual involvement is still required to obtain reliable and high-quality results. The recent development of machine learning techniques has attracted great attention for its potential to address complex tasks that traditionally require manual input. It is therefore worth revisiting the role and existing efforts of machine learning techniques in the field of photogrammetry, as well as in its neighboring field, computer vision. This paper provides an overview of state-of-the-art efforts in machine learning to bring the automated and 'intelligent' component to photogrammetry, computer vision and (to a lesser degree) remote sensing. We primarily cover the relevant efforts following a typical 3D photogrammetric processing pipeline: (1) data acquisition; (2) georeferencing / interest point matching; (3) Digital Surface Model generation; (4) semantic interpretation; followed by conclusions and our insights.
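The georeferencing / interest-point-matching stage of the pipeline above is commonly built on nearest-neighbour descriptor matching with Lowe's ratio test. A minimal sketch; the 0.8 threshold is a typical illustrative value, not one taken from this survey.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    # Brute-force nearest-neighbour matching of feature descriptors between two
    # images, keeping a match only when the best distance is clearly smaller
    # than the second best (Lowe's ratio test rejects ambiguous matches).
    matches = []
    desc_b = np.asarray(desc_b, dtype=float)
    for i, d in enumerate(np.asarray(desc_a, dtype=float)):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

In a full pipeline these putative matches would then be filtered geometrically (e.g., with RANSAC on the fundamental matrix) before bundle adjustment.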
Funding (PPF-based 6D pose estimation): This work was supported in part by the National Key R&D Program of China (2018AAA0102200), the National Natural Science Foundation of China (62132021, 62102435, 61902419, 62002375, 62002376), the Natural Science Foundation of Hunan Province of China (2021JJ40696), the Huxiang Youth Talent Support Program (2021RC3071), and NUDT Research Grants (ZK19-30, ZK22-52).
Funding (recurrent 3D attentional network): Supported by the National Natural Science Foundation of China (Nos. 61572507, 61622212, and 61532003) and by the China Scholarship Council.
Funding (EEE event computation): Supported by the National Natural Science Foundation of China (No. 60502013) and by the National High-Tech Research and Development (863) Program of China (No. 2006AA01Z115).
Funding (LWD-3D): Supported by the National Natural Science Foundation of China (No. 62206237), the Japan Society for the Promotion of Science (Nos. 22K12093 and 22K12094), and the Japan Science and Technology Agency (No. JPMJST2281).
Funding (machine learning in photogrammetry): Supported by the Office of Naval Research (Award No. N000141712928).