Aiming at the limitations of existing railway foreign object detection methods based on two-dimensional (2D) images, such as short detection distance, strong influence of the environment, and lack of distance information, we propose Rail-PillarNet, a three-dimensional (3D) LiDAR (Light Detection and Ranging) railway foreign object detection method based on an improvement of PointPillars. Firstly, a parallel attention pillar encoder (PAPE) is designed to fully extract pillar features and alleviate the loss of local fine-grained information in the PointPillars pillar encoder. Secondly, a fine backbone network is designed to improve the feature extraction capability of the network by combining the coding characteristics of LiDAR point cloud features with a residual structure. Finally, the initial weight parameters of the model are optimised by a transfer learning training method to further improve accuracy. Experimental results on the OSDaR23 dataset show that the average accuracy of Rail-PillarNet reaches 58.51%, higher than most mainstream models, with 5.49 M parameters. Compared with PointPillars, the accuracy for each target class improves by 10.94%, 3.53%, 16.96%, and 19.90%, respectively, while the number of parameters increases by only 0.64 M, achieving a balance between parameter count and accuracy.
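As a rough illustration of the pillar-grouping step that PointPillars-style encoders (such as the one Rail-PillarNet improves on) begin with, the sketch below scatters LiDAR points into an x-y grid of "pillars" and pools one feature per pillar. The grid resolution, range, and max-pooled z feature are illustrative assumptions, not the paper's values.

```python
import numpy as np

def pillarize(points, pillar_size=0.16, x_range=(0.0, 1.6), y_range=(0.0, 1.6)):
    """points: (N, 4) array of x, y, z, intensity. Returns dict pillar -> max z."""
    nx = int((x_range[1] - x_range[0]) / pillar_size)
    ny = int((y_range[1] - y_range[0]) / pillar_size)
    pillars = {}
    for x, y, z, _ in points:
        px = int((x - x_range[0]) / pillar_size)
        py = int((y - y_range[0]) / pillar_size)
        if 0 <= px < nx and 0 <= py < ny:
            key = (px, py)
            # max-pool the height within each occupied pillar
            pillars[key] = max(pillars.get(key, -np.inf), z)
    return pillars

pts = np.array([[0.1, 0.1, 0.5, 0.9],
                [0.1, 0.12, 0.8, 0.4],   # falls into the same pillar as above
                [1.0, 1.0, 0.3, 0.2]])
p = pillarize(pts)
print(len(p))        # 2 occupied pillars
print(p[(0, 0)])     # 0.8 — the max z wins within the pillar
```

A real encoder would keep a fixed number of points per pillar and learn the pooled feature with a small network; the dictionary here only shows the grouping idea.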
Monocular 3D object detection is challenging due to the lack of accurate depth information. Some methods estimate pixel-wise depth maps with off-the-shelf depth estimators and use them as an additional input to augment the RGB images. Depth-based methods convert estimated depth maps to pseudo-LiDAR and then apply LiDAR-based object detectors, or focus on image-depth fusion learning. However, they show limited performance and efficiency as a result of depth inaccuracy and complex convolutional fusion. Different from these approaches, our proposed depth-guided vision transformer with normalizing flows (NF-DVT) network uses normalizing flows to build priors in depth maps and obtain more accurate depth information. We then develop a novel Swin-Transformer-based backbone with a fusion module that processes RGB image patches and depth map patches in two separate branches and fuses them with cross-attention to exchange information. Furthermore, using the pixel-wise relative depth values in depth maps, we develop new relative position embeddings in the cross-attention mechanism to capture more accurate sequence ordering of input tokens. Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection. Experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of our method and its superior performance over previous counterparts.
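The cross-attention fusion described above can be sketched minimally as follows: RGB patch tokens query depth patch tokens, and the attention-weighted depth tokens come back as the fused representation. This is a single head with no learned projections, so the dimensions and identity projections are assumptions for illustration only.

```python
import numpy as np

def cross_attention(rgb_tokens, depth_tokens):
    """rgb_tokens: (N, d) queries; depth_tokens: (M, d) keys/values."""
    d = rgb_tokens.shape[1]
    scores = rgb_tokens @ depth_tokens.T / np.sqrt(d)         # (N, M)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)             # row-wise softmax
    return weights @ depth_tokens                             # (N, d) fused tokens

rgb = np.eye(2)                         # two 2-d RGB tokens
depth = np.array([[1.0, 0.0],
                  [0.0, 1.0]])          # two 2-d depth tokens
fused = cross_attention(rgb, depth)
print(fused.shape)                      # (2, 2)
```

In NF-DVT the queries and keys would carry learned projections and the depth-derived relative position embeddings; this sketch only shows the information-exchange mechanism.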
3D vehicle detection based on LiDAR-camera fusion is an emerging research topic in autonomous driving. The algorithm based on the Camera-LiDAR object candidate fusion method (CLOCs) is currently considered one of the more effective decision-level fusion algorithms, but it does not fully utilize the extracted 3D and 2D features. We therefore propose a 3D vehicle detection algorithm based on multimodal decision-level fusion. First, the anchor point of the 3D detection bounding box is projected into the 2D image, the distance between the 2D and 3D anchor points is calculated, and this distance is used as a new fusion feature to enhance the feature redundancy of the network. Subsequently, an attention module (squeeze-and-excitation networks) weights each feature channel to enhance the important features of the network and suppress useless ones. Experimental results show that the mean average precision of the algorithm on the KITTI dataset is 82.96%, outperforming previous state-of-the-art multimodal fusion-based methods, and the average accuracy on the Easy, Moderate, and Hard evaluation indicators reaches 88.96%, 82.60%, and 77.31%, higher than the original CLOCs model by 1.02%, 2.29%, and 0.41%, respectively. Compared with the original CLOCs algorithm, our algorithm achieves higher accuracy and better performance in 3D vehicle detection.
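The new fusion feature, projecting a 3D anchor into the image and measuring its pixel distance to a 2D anchor, can be sketched with a standard pinhole projection. The intrinsic matrix K and the anchor coordinates below are made-up example values, not the paper's calibration.

```python
import numpy as np

def project_to_image(p3d, K):
    """p3d: (3,) camera-frame point; K: (3, 3) intrinsics -> (u, v) pixel."""
    uvw = K @ p3d
    return uvw[:2] / uvw[2]

K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
anchor_3d = np.array([1.0, 0.5, 10.0])       # a point 10 m in front of the camera
uv = project_to_image(anchor_3d, K)          # -> [390., 275.]
anchor_2d = np.array([400.0, 280.0])         # a 2D detection anchor
dist = np.linalg.norm(uv - anchor_2d)        # the distance used as a fusion feature
print(uv, dist)
```

In the actual pipeline this distance would be computed per candidate pair and concatenated with the detection scores before the fusion network.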
In complex traffic environments, it is very important for autonomous vehicles to accurately perceive the dynamic information of surrounding vehicles in advance. The accuracy of 3D object detection is affected by problems such as illumination changes, object occlusion, and detection distance. We face these challenges by proposing a multimodal feature fusion network for 3D object detection (MFF-Net). First, a spatial transformation projection algorithm maps the image features into the feature space, so that the image features share the same spatial dimension as the point cloud features during fusion. Then, feature channels are weighted by an adaptive expression augmentation fusion network to enhance important network features, suppress useless features, and increase the directionality of the network towards features. Finally, a one-dimensional threshold is added to the non-maximum suppression algorithm to reduce the probability of false and missed detections. Together, these form a complete 3D object detection network based on multimodal feature fusion. Experimental results show that the proposed MFF-Net achieves an average accuracy of 82.60% on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, outperforming previous state-of-the-art multimodal fusion networks. On the Easy, Moderate, and Hard evaluation indicators, the accuracy reaches 90.96%, 81.46%, and 75.39%, respectively. This shows that MFF-Net performs well in 3D object detection.
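For reference, the non-maximum suppression stage that the abstract modifies looks roughly like the greedy sketch below, with a score cut-off alongside the usual IoU test. The IoU and score thresholds here are illustrative assumptions, not MFF-Net's values.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thr=0.5, score_thr=0.3):
    """Greedy NMS: keep highest-scoring boxes, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] < score_thr:
            continue
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2] — the overlapping lower-score box is suppressed
```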
The high bandwidth and low latency of 6G network technology enable the successful application of monocular 3D object detection on vehicle platforms. Monocular-3D-object-detection-based Pseudo-LiDAR is a low-cost, low-power alternative to LiDAR solutions in the field of autonomous driving. However, this technique has some problems: (1) the poor quality of generated Pseudo-LiDAR point clouds resulting from the nonlinear error distribution of monocular depth estimation, and (2) the weak representation capability of point cloud features, since LiDAR-based 3D detection networks neglect the global geometric structure of point clouds. We therefore propose a Pseudo-LiDAR confidence sampling strategy and a hierarchical geometric feature extraction module for monocular 3D object detection. We first design a point cloud confidence sampling strategy based on a 3D Gaussian distribution that assigns small confidence to points with large depth estimation error and filters them out according to the confidence. We then present a hierarchical geometric feature extraction module that aggregates local neighborhood features and uses a dual transformer to capture the global geometric features in the point cloud. Finally, our detection framework is based on Point-Voxel-RCNN (PV-RCNN), with high-quality Pseudo-LiDAR and enriched geometric features as input. Experimental results show that our method achieves satisfactory results in monocular 3D object detection.
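The Pseudo-LiDAR representation this line of work builds on is the standard back-projection of a depth map to a 3D point cloud: each pixel (u, v) with depth z lifts to a camera-frame point via the pinhole intrinsics. A minimal sketch, with made-up intrinsics:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """depth: (H, W) metres -> (H*W, 3) camera-frame points."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth
    x = (u - cx) * z / fx          # back-project columns
    y = (v - cy) * z / fy          # back-project rows
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.full((2, 2), 10.0)      # toy depth map: a flat wall 10 m away
pts = depth_to_pseudo_lidar(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(pts.shape)                   # (4, 3)
```

The paper's confidence sampling would then score each of these points and drop the low-confidence ones before they reach the PV-RCNN detector.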
LiDAR point-cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with bounding boxes (BBox). However, in the three-dimensional space of autonomous driving scenes, previous object detection methods, due to pre-processing of the original LiDAR point cloud into voxels or pillars, lose the coordinate information of the original point cloud, suffer slow detection speed, and produce inaccurate bounding box positioning. To address these issues, this study proposes a new two-stage network structure that extracts point cloud features directly with PointNet++, which effectively preserves the original point cloud coordinate information. To improve detection accuracy, a shell-based modeling method is proposed: it roughly determines which spherical shell the coordinates belong to, and the result is then refined towards the ground truth, narrowing the localization range and improving detection accuracy. To improve the recall of 3D object detection with bounding boxes, this paper designs a self-attention module with a skip connection structure, which highlights some features by weighting them along the feature dimensions; after training, feature weights favorable to object detection grow larger, so the extracted features are better adapted to the detection task. Extensive comparison and ablation experiments on the KITTI dataset verify the effectiveness of the proposed method in improving recall and precision.
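The coarse step of the shell-based idea, deciding which spherical shell a coordinate falls in before refining within it, can be sketched as a simple radial quantization. The shell width and reference center below are illustrative assumptions.

```python
import numpy as np

def shell_index(point, center, shell_width=1.0):
    """Return the 0-based spherical shell the point falls in around center."""
    r = np.linalg.norm(np.asarray(point) - np.asarray(center))
    return int(r // shell_width)

print(shell_index([3.0, 4.0, 0.0], [0.0, 0.0, 0.0]))  # r = 5.0 -> shell 5
print(shell_index([0.2, 0.0, 0.0], [0.0, 0.0, 0.0]))  # r = 0.2 -> shell 0
```

Classifying into a shell first turns an unbounded regression into a small classification plus a bounded residual, which is the localization-range narrowing the abstract describes.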
Light detection and ranging (LiDAR) sensors play a vital role in acquiring 3D point cloud data and extracting valuable object information for tasks such as autonomous driving, robotics, and virtual reality (VR). However, the sparse and disordered nature of the 3D point cloud poses significant challenges to feature extraction, and overcoming these limitations is critical for 3D point cloud processing. 3D point cloud object detection is a challenging and crucial task, in which point cloud processing and feature extraction methods play a central role and have a significant impact on subsequent detection performance. In this overview of outstanding work in object detection from the 3D point cloud, we focus on summarizing the methods employed in 3D point cloud processing. We introduce how point clouds are processed in classical 3D object detection algorithms and the improvements made to solve the problems in point cloud processing. Different voxelization methods and point cloud sampling strategies influence the extracted features, thereby impacting the final detection performance.
Ground constructions and mines are severely threatened by underground cavities, especially unsafe or inaccessible ones. Safe and precise cavity detection is vital for reasonable cavity evaluation and disposal. The conventional cavity detection methods and their limitations were analyzed: those methods cannot form a 3D model of the underground cavity, which is needed for instructing cavity disposal, and their detection precision is always greatly affected by the geological circumstances. The importance of 3D cavity detection for safe exploitation in metal mines was pointed out, and the 3D cavity laser detection method and its principle were introduced. A cavity auto-scanning laser system was recommended to actualize 3D cavity detection after comparison with other laser detection systems. Four boreholes were chosen to verify the validity of the cavity auto-scanning laser system. The results show that the system is very suitable for underground 3D cavity detection, especially for inaccessible cavities.
Accurate salt dome detection from 3D seismic data is crucial to many seismic data analysis applications. We present a new edge-based approach for salt dome detection in migrated 3D seismic data. The proposed algorithm overcomes the drawbacks of existing edge-based techniques, which only consider edges in the x (crossline) and y (inline) directions in 2D data and the x (crossline), y (inline), and z (time) directions in 3D data. The algorithm combines 3D gradient maps computed along diagonal directions with those computed in the x, y, and z directions to accurately detect the boundaries of salt regions. This combination of axis-aligned and diagonal edges ensures that the algorithm works well even if the dips along the salt boundary are represented only by weak reflectors. Contrary to other edge- and texture-based salt dome detection techniques, the proposed algorithm is independent of the amplitude variations in seismic data. We tested the algorithm on the publicly available Netherlands offshore F3 block. The results suggest that it detects salt bodies with higher accuracy than existing gradient-based and texture-based techniques used separately. More importantly, the approach is computationally efficient, allowing for real-time implementation and deployment.
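The core operation, combining axis-aligned 3D gradients with diagonal differences, can be sketched as below. This is a simplified stand-in (one 45-degree diagonal in the x-y plane, magnitudes summed), not the paper's exact operator set.

```python
import numpy as np

def combined_gradient_map(vol):
    """vol: (X, Y, Z) seismic amplitude volume -> per-voxel edge magnitude."""
    gx, gy, gz = np.gradient(vol.astype(float))   # axis-aligned gradients
    # one diagonal gradient in the x-y plane via a 45-degree shifted difference
    d1 = np.zeros_like(vol, dtype=float)
    d1[1:, 1:, :] = vol[1:, 1:, :] - vol[:-1, :-1, :]
    return np.sqrt(gx**2 + gy**2 + gz**2) + np.abs(d1)

vol = np.zeros((4, 4, 4))
vol[2:, :, :] = 1.0                   # a flat "salt boundary" at x-index 2
edges = combined_gradient_map(vol)
print(edges.shape)                    # (4, 4, 4); high values near the boundary
```

The full method would add the remaining diagonal directions and post-process the combined map into a closed salt-body boundary.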
Self-attention networks and the Transformer have dominated machine translation and natural language processing, and have shown great potential in image vision tasks such as image classification and object detection. Inspired by this progress, we propose a novel, general, and robust voxel feature encoder for 3D object detection based on the traditional Transformer. We first investigate the permutation invariance of self-attention over sequence data and apply it to point cloud processing. We then construct a voxel feature layer based on self-attention that adaptively learns a local and robust context for each voxel according to the spatial relationships and the context information exchanged among all points within the voxel. Lastly, we construct a general voxel feature learning framework with this voxel feature layer at its core for 3D object detection. The voxel feature with Transformer (VFT) can be plugged easily into any other voxel-based 3D object detection framework and serves as the backbone of the voxel feature extractor. Experimental results on the KITTI dataset demonstrate that our method achieves state-of-the-art performance on 3D object detection.
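The permutation property the encoder relies on can be demonstrated numerically: self-attention is permutation-equivariant, so a symmetric pool (max, here) over the in-voxel points yields the same voxel feature regardless of point order. The single head with identity projections is an illustrative simplification.

```python
import numpy as np

def self_attention(x):
    """x: (N, d) point features; single head, no learned projections."""
    d = x.shape[1]
    s = x @ x.T / np.sqrt(d)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # row-wise softmax
    return w @ x

pts = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = self_attention(pts)
out_perm = self_attention(pts[[2, 0, 1]])   # same points, shuffled order
# max-pooled voxel feature is identical regardless of point ordering
print(np.allclose(out.max(axis=0), out_perm.max(axis=0)))   # True
```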
Obstacle detection is essential for mobile robots to avoid collisions. Mobile robots usually operate in indoor environments, where they encounter various kinds of obstacles; however, a 2D range sensor can sense obstacles only in a 2D plane. In contrast, a 3D range sensor makes it possible to detect ground and aerial obstacles that a 2D range sensor cannot sense. In this paper, we present a 3D obstacle detection method that helps overcome the limitations of 2D range sensors with regard to obstacle detection. The indoor environment typically consists of a flat floor, whose position can be determined by estimating the plane with the least squares method. Having determined the position of the floor, the points belonging to obstacles can be identified by rejecting the points of the floor. In the experimental section, we show the results of this approach using a Kinect sensor.
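The floor-rejection step described above can be sketched directly: fit z = ax + by + c to the cloud by least squares, then keep points whose residual exceeds a height threshold as obstacle candidates. The 10 cm threshold is an assumption for illustration; a robust fit (e.g. RANSAC) would be used when obstacles dominate the scene.

```python
import numpy as np

def remove_floor(points, thresh=0.1):
    """points: (N, 3). Returns (floor_mask, obstacle_points)."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coef, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)   # plane [a, b, c]
    residual = np.abs(A @ coef - points[:, 2])
    floor = residual < thresh
    return floor, points[~floor]

floor_pts = np.array([[x, y, 0.0] for x in range(3) for y in range(3)], float)
box = np.array([[1.0, 1.0, 0.5]])          # an obstacle 0.5 m above the floor
pts = np.vstack([floor_pts, box])
mask, obstacles = remove_floor(pts)
print(len(obstacles))    # 1 — only the box survives floor rejection
```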
Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and show remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel but effective fusion model named the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature by an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called the image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through the attention mechanism, while generating and refining 3D detection results with a transformer model. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on two widely used benchmarks for indoor 3D object detection, SUN RGB-D and NYU Depth V2, and ablation studies are provided to analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
Road accident detection plays an important role in abnormal scene reconstruction for Intelligent Transportation Systems and in abnormal event warning for autonomous driving. This paper presents a novel 3D object detector and an adaptive space partitioning algorithm to infer traffic accidents quantitatively. Using 2D region proposals in an RGB image, the method generates deformable frustums from the point cloud for each 2D region proposal and then extracts features frustum-wise with the farthest point sampling network (FPS-Net) and the feature extraction network (FE-Net). Subsequently, the encoder-decoder network (ED-Net) performs 3D oriented bounding box (OBB) regression. Meanwhile, the adaptive least square regression (ALSR) method is proposed to split 3D OBBs. Finally, a reduced OBB intersection test detects traffic accidents via the separating surface theorem (SST). In experiments on the KITTI benchmark, our proposed 3D object detector outperforms other state-of-the-art methods, and the collision detection algorithm achieves a satisfactory accuracy of 91.8% on our SHTA dataset.
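Farthest point sampling, the operation behind FPS-Net's name, can be sketched as below: iteratively pick the point farthest from the already-chosen set, which keeps samples well spread over the cloud. The toy cloud and starting index are illustrative assumptions.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """points: (N, 3); returns indices of k well-spread points."""
    chosen = [0]                                   # start from point 0
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))                # farthest from chosen set
        chosen.append(nxt)
        # distance to the chosen set = min over all chosen points
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

pts = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 0, 0], [10.0, 0, 0]])
print(farthest_point_sampling(pts, 3))   # [0, 3, 2] — extremes picked first
```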
To address the difficulty of detecting far and hard objects caused by the sparseness and insufficient semantic information of LiDAR point clouds, a 3D object detection network with multi-modal data adaptive fusion is proposed, which makes use of multi-neighborhood voxel information and image information. Firstly, an improved ResNet is designed that maintains the structural information of far and hard objects in low-resolution feature maps, making it more suitable for the detection task; meanwhile, the semantics of each image feature map are enhanced by semantic information from all subsequent feature maps. Secondly, multi-neighborhood context information with different receptive field sizes is extracted to compensate for the sparseness of the point cloud, improving the ability of voxel features to represent the spatial structure and semantic information of objects. Finally, a multi-modal feature adaptive fusion strategy is proposed that uses learnable weights to express the contribution of different modal features to the detection task, and voxel attention further enhances the fused feature expression of effective target objects. Experimental results on the KITTI benchmark show that this method outperforms VoxelNet by remarkable margins, increasing AP by 8.78% and 5.49% on the moderate and hard difficulty levels. Meanwhile, our method achieves better detection performance than many mainstream multi-modal methods, outperforming the AP of MVX-Net by 1% on the moderate and hard difficulty levels.
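The adaptive fusion idea, learnable per-modality weights expressing each modality's contribution, can be sketched as softmax-normalized scalars blending the two feature streams. The logits here are example values, not learned ones, and the real strategy would operate per channel or per voxel.

```python
import numpy as np

def adaptive_fusion(voxel_feat, image_feat, logits):
    """Blend two modality features with softmax-normalized learnable weights."""
    w = np.exp(logits) / np.exp(logits).sum()     # softmax over modalities
    return w[0] * voxel_feat + w[1] * image_feat

v = np.array([1.0, 0.0])                          # toy voxel feature
i = np.array([0.0, 1.0])                          # toy image feature
fused = adaptive_fusion(v, i, logits=np.array([0.0, 0.0]))
print(fused)    # [0.5 0.5] — equal logits give an equal blend
```

During training the logits would receive gradients, letting the network shift weight towards whichever modality helps detection more.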
Compared with traditional scanning confocal microscopy, the effects of various factors on the characteristics of a multi-beam parallel confocal system are discussed, and the error factors in such a system are analyzed. The construction and working principle of the non-scanning 3D detecting system are introduced, and experimental results demonstrate the effect of these factors on the detecting system.
Within today's product development process, various FE (finite element) simulations for the functional validation of desired characteristics are performed to avoid expensive testing with real components. These simulations require great effort in discretization and in the choice of simulation conditions, such as taking different non-linearities (e.g., material behavior) into account, to produce meaningful results. Despite knowledge of the deformations occurring during production processes, the non-deformed design model from a CAD (computer aided design) system is always used for the FE simulations. It seems doubtful that further refinement of simulation methods makes sense if the real manufactured geometry of the component is not considered in the simulation. To exploit the potential of simulation methods efficiently, an approach has been developed that provides a geometry model for simulation based on the existing CAD model but with integrated production deviations, available as soon as a first prototype is at hand, by adapting the FE mesh to the real 3D-surface-detected geometry.
Compared to 3D object detection using a single camera, multiple cameras can overcome limitations in field-of-view, occlusion, and low detection confidence. This study employs multiple surveillance cameras and develops a cooperative 3D object detection and tracking framework by incorporating temporal and spatial information. The framework consists of a 3D vehicle detection model, a cooperative spatial-temporal relation scheme, and a heuristic camera constellation method. Specifically, the proposed cross-camera association scheme combines the geometric relationship between multiple cameras and the objects in corresponding detections. The spatial-temporal method associates vehicles between different points of view at a single timestamp and fulfills vehicle tracking over time. The framework is evaluated on a synthetic cooperative dataset and shows high reliability: cooperative perception recalls more than 66% of the trajectory, versus 11% for single-point sensing. This could contribute to full-range surveillance for intelligent transportation systems.
A laser-technique-based scanning system was employed to perform a comprehensive scan through a borehole of an unmapped cavity under an open pit bench; the three-dimensional data obtained were used for theoretical analysis and numerical simulation to analyze the stability of the cap rock. Acoustic emission techniques were also adopted to carry out long-term, real-time rupture monitoring in the cap rock. A complete safety evaluation system for the cap rock was thus established to ensure the safe operation of subsequent blasting processes. The ideal way to eliminate the collapse hazard of such a cavity is cap rock caving through deep-hole blasting; thus, two deep-hole blasting schemes were proposed: a vertical deep-hole blasting scheme and a one-time raise driving scheme integrated with deep-hole bench blasting. The vertical deep-hole blasting scheme consumes more explosive, but its relatively simple blasting network structure can greatly reduce workloads. The one-time raise driving scheme integrated with deep-hole bench blasting can obviously reduce explosive consumption, but the higher technical requirements on drilling, explosive charging, and the blasting network increase workloads.
In recent years, autonomous driving technology has made good progress, but the non-cooperative intelligence of autonomous vehicles still faces many technical bottlenecks in urban road driving challenges. V2I (Vehicle-to-Infrastructure) communication is a potential solution for enabling cooperative intelligence between vehicles and roads. In this paper, RGB-PVRCNN, an environment perception framework, is proposed to improve the environmental awareness of autonomous vehicles at intersections by leveraging V2I communication technology. This framework integrates vision features based on PVRCNN. The normal distributions transform (NDT) point cloud registration algorithm is deployed both onboard and at the roadside to obtain the position of the autonomous vehicles and to build the local map; objects detected by the roadside multi-sensor system are sent back to the autonomous vehicles to enhance their perception ability, benefiting path planning and traffic efficiency at the intersection. Field-testing results show that our method can effectively extend the environmental perception ability and range of autonomous vehicles at the intersection, and it outperforms the PointPillar and VoxelRCNN algorithms in detection accuracy.
Relation contexts have proved useful for many challenging vision tasks. In the field of 3D object detection, previous methods have taken advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, redundant relation contexts inevitably exist due to noisy or low-quality proposals. Invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, which may, on the contrary, reduce performance in complex scenes. Inspired by recent attention mechanisms such as the Transformer, we propose a novel 3D attention-based relation module (ARM3D). It encompasses object-aware relation reasoning to extract pair-wise relation contexts among qualified proposals and an attention module to distribute attention weights towards different relation contexts. In this way, ARM3D can take full advantage of useful relation contexts and filter those that are less relevant or even confusing, mitigating ambiguity in detection. We have evaluated the effectiveness of ARM3D by plugging it into several state-of-the-art 3D object detectors, obtaining more accurate and robust detection results. Extensive experiments show the capability and generalization of ARM3D in 3D object detection. Our source code is available at https://github.com/lanlan96/ARM3D.
Funding: supported by a grant from the National Key Research and Development Project (2023YFB4302100), the Key Research and Development Project of Jiangxi Province (No. 20232ACE01011), and the Independent Deployment Project of Ganjiang Innovation Research Institute, Chinese Academy of Sciences (E255J001).
Funding: supported in part by the Major Project for New Generation of AI (2018AAA0100400), the National Natural Science Foundation of China (61836014, U21B2042, 62072457, 62006231), and the InnoHK Program.
Funding: Supported by the Key Research and Development Projects of Anhui (202104a05020003), the Natural Science Foundation of Anhui Province (2208085MF173), and the Anhui Development and Reform Commission R&D and Innovation Projects ([2020]479).
Abstract: 3D vehicle detection based on LiDAR-camera fusion is becoming an emerging research topic in autonomous driving. The algorithm based on the Camera-LiDAR object candidate fusion method (CLOCs) is currently considered one of the more effective decision-level fusion algorithms, but it does not fully utilize the extracted 3D and 2D features. Therefore, we propose a 3D vehicle detection algorithm based on multimodal decision-level fusion. First, the anchor point of the 3D detection bounding box is projected into the 2D image, the distance between the 2D and 3D anchor points is calculated, and this distance is used as a new fusion feature to enhance the feature redundancy of the network. Subsequently, an attention module (squeeze-and-excitation networks) weights each feature channel to enhance important features and suppress useless ones. Experimental results show that the mean average precision of the algorithm on the KITTI dataset is 82.96%, outperforming previous state-of-the-art multimodal fusion-based methods; the average accuracy on the Easy, Moderate and Hard evaluation levels reaches 88.96%, 82.60% and 77.31%, respectively, higher than the original CLOCs model by 1.02%, 2.29% and 0.41%. Compared with the original CLOCs algorithm, our algorithm achieves higher accuracy and better performance in 3D vehicle detection.
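The projection-distance fusion feature described above can be sketched as follows; the pinhole camera matrix and anchor values are hypothetical toy numbers, not taken from the paper.

```python
import numpy as np

def project_to_image(pt3d, P):
    """Project a 3D point (camera coordinates) with a 3x4 projection matrix."""
    p = P @ np.append(pt3d, 1.0)
    return p[:2] / p[2]

def anchor_distance_feature(anchor3d, anchor2d, P):
    """Pixel distance between a projected 3D anchor and a 2D anchor,
    usable as an extra decision-level fusion feature."""
    return float(np.linalg.norm(project_to_image(anchor3d, P) - anchor2d))

# Toy pinhole camera: focal length 700, principal point (600, 180)
P_mat = np.array([[700., 0., 600., 0.],
                  [0., 700., 180., 0.],
                  [0., 0., 1., 0.]])
# A 3D anchor 10 m ahead projects exactly onto the 2D anchor here
d = anchor_distance_feature(np.array([2.0, 1.0, 10.0]),
                            np.array([740.0, 250.0]), P_mat)
```

A small distance indicates that the 2D and 3D candidates likely describe the same object, which is the intuition behind using it as a fusion feature.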
Funding: The authors would like to thank the Natural Science Foundation of Anhui Province (No. 2208085MF173), the Key Research and Development Projects of Anhui (202104a05020003), the Anhui Development and Reform Commission R&D and Innovation Project ([2020]479), the National Natural Science Foundation of China (51575001), and the Anhui University Scientific Research Platform Innovation Team Building Project (2016-2018) for financial support.
Abstract: In complex traffic environments, it is very important for autonomous vehicles to accurately perceive, in advance, the dynamic information of surrounding vehicles. The accuracy of 3D object detection is affected by problems such as illumination changes, object occlusion and detection distance. We face these challenges by proposing a multimodal feature fusion network for 3D object detection (MFF-Net). First, a spatial transformation projection algorithm maps the image features into the feature space, so that the image features share the same spatial dimension as the point cloud features during fusion. Then, feature channels are weighted using an adaptive expression augmentation fusion network to enhance important network features, suppress useless features, and increase the directionality of the network towards features. Finally, the threshold in the non-maximum suppression algorithm is adjusted to reduce false and missed detections. Together these components form a complete 3D object detection network based on multimodal feature fusion. Experimental results show that the proposed network achieves an average accuracy of 82.60% on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, outperforming previous state-of-the-art multimodal fusion networks. On the Easy, Moderate and Hard evaluation levels, its accuracy reaches 90.96%, 81.46% and 75.39%, respectively, showing that MFF-Net performs well in 3D object detection.
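The channel-weighting idea described above (enhance important channels, suppress useless ones) can be sketched in the squeeze-and-excitation style; all weight matrices here are random stand-ins for learned parameters, not the paper's actual fusion network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_reweight(feat, w1, w2):
    """Squeeze-and-excitation style channel reweighting.

    feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) are
    toy fully-connected weights. Squeeze by global average pooling,
    excite with two FC layers, then scale each channel by its learned
    importance in (0, 1).
    """
    z = feat.mean(axis=(1, 2))                   # squeeze: (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))    # excite:  (C,)
    return feat * s[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
y = se_reweight(x, w1, w2)
```

Because the gate lies in (0, 1), every channel is attenuated in proportion to its estimated importance rather than hard-selected.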
Funding: Supported by the National Key Research and Development Program of China (2020YFB1807500), the National Natural Science Foundation of China (62072360, 62001357, 62172438, 61901367), the Key Research and Development Plan of Shaanxi Province (2021ZDLGY02-09, 2023-GHZD-44, 2023-ZDLGY-54), the Natural Science Foundation of Guangdong Province of China (2022A1515010988), the Key Project on Artificial Intelligence of the Xi'an Science and Technology Plan (2022JH-RGZN-0003, 2022JH-RGZN-0103, 2022JH-CLCJ-0053), the Xi'an Science and Technology Plan (20RGZN0005), and the proof-of-concept fund from the Hangzhou Research Institute of Xidian University (GNYZ2023QC0201).
Abstract: The high bandwidth and low latency of 6G network technology enable the successful application of monocular 3D object detection on vehicle platforms. Monocular 3D object detection based on pseudo-LiDAR is a low-cost, low-power alternative to LiDAR solutions in autonomous driving. However, this technique has two problems: (1) the poor quality of the generated pseudo-LiDAR point clouds, resulting from the nonlinear error distribution of monocular depth estimation, and (2) the weak representation capability of point cloud features, because LiDAR-based 3D detection networks neglect the global geometric structure of point clouds. We therefore propose a pseudo-LiDAR confidence sampling strategy and a hierarchical geometric feature extraction module for monocular 3D object detection. We first design a point cloud confidence sampling strategy based on a 3D Gaussian distribution, which assigns small confidence to points with large depth estimation error and filters them out accordingly. We then present a hierarchical geometric feature extraction module that aggregates local neighborhood features and uses a dual transformer to capture global geometric features in the point cloud. Finally, our detection framework is based on Point-Voxel-RCNN (PV-RCNN), taking high-quality pseudo-LiDAR and enriched geometric features as input. Experimental results show that our method achieves satisfactory performance in monocular 3D object detection.
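A toy version of the confidence-based filtering step might look like this; the Gaussian confidence model is simplified to a single per-point error scale, which is an assumption rather than the paper's exact 3D Gaussian formulation.

```python
import numpy as np

def confidence_filter(points, depth_err_sigma, keep_thresh=0.5):
    """Assign each pseudo-LiDAR point a Gaussian-shaped confidence that
    decays with its estimated depth error, then drop low-confidence points.

    points: (N, 3) pseudo-LiDAR points
    depth_err_sigma: (N,) per-point depth-error scale (toy stand-in)
    Returns the kept points and all confidence values.
    """
    conf = np.exp(-0.5 * depth_err_sigma ** 2)
    return points[conf >= keep_thresh], conf

pts = np.array([[1.0, 0.0, 5.0],
                [2.0, 1.0, 30.0],   # far point with large depth error
                [0.0, 0.0, 8.0]])
sigma = np.array([0.2, 2.0, 0.5])
kept, conf = confidence_filter(pts, sigma)
```

Only the two low-error points survive, mimicking how unreliable distant pseudo-LiDAR points are discarded before detection.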
Funding: Supported in part by the National Natural Science Foundation of China under grant number 62272236; in part by the Natural Science Foundation of Jiangsu Province under grant numbers BK20201136 and BK20191401; and in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund.
Abstract: LiDAR point cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with bounding boxes (BBoxes). However, in the three-dimensional space of autonomous driving scenes, previous object detection methods pre-process the original LiDAR point cloud into voxels or pillars, which loses the coordinate information of the original points, slows down detection, and yields inaccurate bounding box positioning. To address these issues, this study proposes a new two-stage network that extracts point cloud features directly with PointNet++, effectively preserving the original point cloud coordinates. To improve detection accuracy, a shell-based modeling method is proposed: it first coarsely determines which spherical shell a coordinate belongs to, and then refines the result towards the ground truth, narrowing the localization range and improving accuracy. To improve the recall of 3D bounding box detection, this paper designs a self-attention module with a skip connection structure, which weights features along the feature dimensions so that, after training, weights favorable for object detection grow larger and the extracted features are better adapted to the detection task. Extensive comparison and ablation experiments on the KITTI dataset verify the effectiveness of our proposed method in improving recall and precision.
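The coarse shell-assignment step can be sketched directly; the shell width, center, and points below are illustrative values, and the refinement stage is omitted.

```python
import numpy as np

def shell_index(points, center, shell_width):
    """Coarsely localize points by the spherical shell they fall in.

    Each point is assigned the integer index of the shell (of uniform
    thickness shell_width) around `center` that contains it; a regression
    head would then refine the position within the shell.
    """
    r = np.linalg.norm(points - center, axis=1)
    return (r // shell_width).astype(int)

pts = np.array([[1.0, 0.0, 0.0],    # radius 1.0 -> shell 0
                [0.0, 3.0, 0.0],    # radius 3.0 -> shell 1
                [0.0, 0.0, 7.5]])   # radius 7.5 -> shell 3
idx = shell_index(pts, np.zeros(3), 2.0)
```

Classifying into shells first turns an unbounded regression problem into a coarse classification plus a bounded residual, which is why it narrows the localization range.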
Abstract: Light detection and ranging (LiDAR) sensors play a vital role in acquiring 3D point cloud data and extracting valuable object information for tasks such as autonomous driving, robotics and virtual reality (VR). However, the sparse and disordered nature of the 3D point cloud poses significant challenges to feature extraction, and overcoming these limitations is critical for 3D point cloud processing. In 3D point cloud object detection, point cloud processing and feature extraction methods play a crucial role and have a significant impact on detection performance. In this overview of outstanding work on object detection from 3D point clouds, we focus on summarizing the point cloud processing methods employed. We introduce how point clouds are processed in classical 3D object detection algorithms and the improvements these algorithms make to solve the problems in point cloud processing. Different voxelization methods and point cloud sampling strategies influence the extracted features and thereby the final detection performance.
Funding: Project (50490274) supported by the National Natural Science Foundation of China.
Abstract: Ground constructions and mines are severely threatened by underground cavities, especially those that are unsafe or inaccessible. Safe and precise cavity detection is vital for reasonable cavity evaluation and disposal. The conventional cavity detection methods and their limitations are analyzed: they cannot form the 3D model of an underground cavity needed to guide its disposal, and their detection precision is strongly affected by the geological circumstances. The importance of 3D cavity detection for safe exploitation in metal mines is pointed out, and the 3D cavity laser detection method and its principle are introduced. After comparison with other laser detection systems, a cavity auto scanning laser system is recommended to realize 3D cavity detection. Four boreholes were chosen to verify the validity of the system. The results show that the cavity auto scanning laser system is very suitable for underground 3D cavity detection, especially for inaccessible cavities.
Funding: Supported by the Center for Energy and Geo Processing (CeGP) at King Fahd University of Petroleum & Minerals (KFUPM), under Project No. GTEC 1401-1402.
Abstract: Accurate salt dome detection from 3D seismic data is crucial to many seismic data analysis applications. We present a new edge-based approach for salt dome detection in migrated 3D seismic data. The proposed algorithm overcomes the drawbacks of existing edge-based techniques, which only consider edges in the x (crossline) and y (inline) directions in 2D data and the x (crossline), y (inline) and z (time) directions in 3D data. The algorithm combines 3D gradient maps computed along diagonal directions with those computed in the x, y and z directions to accurately detect the boundaries of salt regions. Including the diagonal edges alongside the x, y and z directions ensures that the algorithm works well even when the dips along the salt boundary are represented only by weak reflectors. Contrary to other edge- and texture-based salt dome detection techniques, the proposed algorithm is independent of amplitude variations in the seismic data. We tested the algorithm on the publicly available Netherlands offshore F3 block. The results suggest that it detects salt bodies with higher accuracy than existing gradient-based and texture-based techniques used separately. More importantly, the proposed approach is computationally efficient, allowing for real-time implementation and deployment.
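A simplified sketch of combining axis-aligned and diagonal gradient maps on a toy volume follows; only one diagonal direction is included here as an assumption, and the boundary handling is a crude zero-pad rather than anything from the paper.

```python
import numpy as np

def combined_gradient(volume):
    """Combine axis-aligned and one diagonal finite-difference gradient of
    a 3D seismic volume into a single edge-strength map (simplified sketch
    of the diagonal-augmented gradient idea)."""
    gx, gy, gz = np.gradient(volume.astype(float))
    # One example diagonal: difference along the (1, 1, 0) direction
    diag = np.zeros_like(volume, dtype=float)
    diag[:-1, :-1, :] = volume[1:, 1:, :] - volume[:-1, :-1, :]
    return np.sqrt(gx**2 + gy**2 + gz**2 + diag**2)

# A flat "salt boundary" between slices 1 and 2 of a toy 4x4x4 volume
vol = np.zeros((4, 4, 4))
vol[2:, :, :] = 1.0
edges = combined_gradient(vol)
```

The edge-strength map peaks near the amplitude step, which is the behavior the boundary detector relies on.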
Funding: National Natural Science Foundation of China (No. 61806006), Innovation Program for Graduates of Jiangsu Province (No. KYLX160-781), and University Superior Discipline Construction Project of Jiangsu Province.
Abstract: Self-attention networks and the Transformer have dominated machine translation and natural language processing, and have shown great potential in image vision tasks such as image classification and object detection. Inspired by this progress, we propose a novel, general and robust voxel feature encoder for 3D object detection based on the traditional Transformer. We first investigate the permutation invariance of self-attention over sequence data and apply it to point cloud processing. We then construct a voxel feature layer based on self-attention that adaptively learns the local and robust context of a voxel from the spatial relationships and context exchanged among all points within the voxel. Lastly, we build a general voxel feature learning framework for 3D object detection with this voxel feature layer as its core. The voxel feature with Transformer (VFT) can easily be plugged into any other voxel-based 3D object detection framework as the backbone of the voxel feature extractor. Experimental results on the KITTI dataset demonstrate that our method achieves state-of-the-art performance on 3D object detection.
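The permutation-invariance property that motivates the voxel feature layer can be demonstrated with a small self-attention sketch over one voxel's points; the weights are random toy stand-ins, not the paper's trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def voxel_self_attention(points, Wq, Wk, Wv):
    """Self-attention over the points of one voxel, then max-pooling.

    Attention weights depend only on pairwise similarity and the pooling
    is symmetric, so the pooled voxel feature does not change when the
    points are reordered: the key property for unordered point sets.
    """
    Q, K, V = points @ Wq, points @ Wk, points @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return (attn @ V).max(axis=0)  # order-independent voxel feature

rng = np.random.default_rng(1)
pts = rng.standard_normal((5, 3))
Wq, Wk, Wv = (rng.standard_normal((3, 4)) for _ in range(3))
feat = voxel_self_attention(pts, Wq, Wk, Wv)
perm_feat = voxel_self_attention(pts[::-1], Wq, Wk, Wv)  # reversed order
```

Reversing the point order leaves the voxel feature unchanged, which is exactly the invariance the abstract appeals to.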
Funding: Supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-2006) supervised by the NIPA (National IT Industry Promotion Agency); the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (2013-029812); and the MKE, Korea, under the Human Resources Development Program for Convergence Robot Specialists support program supervised by the NIPA (NIPA-2013-H1502-13-1001).
Abstract: Obstacle detection is essential for mobile robots to avoid collisions. Mobile robots usually operate in indoor environments, where they encounter various kinds of obstacles; however, a 2D range sensor can sense obstacles only in a 2D plane. In contrast, a 3D range sensor makes it possible to detect ground and aerial obstacles that a 2D range sensor cannot. In this paper, we present a 3D obstacle detection method that helps overcome these limitations of 2D range sensors. The indoor environment typically consists of a flat floor, whose position can be determined by estimating the plane using the least squares method. Having determined the position of the floor, the obstacle points can be identified by rejecting the points belonging to the floor. In the experimental section, we show the results of this approach using a Kinect sensor.
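The floor estimation and rejection steps can be sketched with an ordinary least-squares plane fit; the tolerance value and the toy point cloud are assumed for illustration.

```python
import numpy as np

def fit_floor_plane(points):
    """Least-squares fit of z = a*x + b*y + c to candidate floor points."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def reject_floor(points, coeffs, tol=0.05):
    """Keep only points more than `tol` above the fitted floor plane;
    what remains are obstacle points."""
    a, b, c = coeffs
    height = points[:, 2] - (a * points[:, 0] + b * points[:, 1] + c)
    return points[height > tol]

# Toy cloud: a 3x3 patch of flat floor plus two points on a box
floor = np.array([[x, y, 0.0] for x in range(3) for y in range(3)], dtype=float)
box = np.array([[1.0, 1.0, 0.4], [1.2, 1.0, 0.5]])
cloud = np.vstack([floor, box])
coeffs = fit_floor_plane(floor)
obstacles = reject_floor(cloud, coeffs)
```

Only the two box points survive the rejection, which is the obstacle set the method hands to the collision-avoidance logic.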
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61803004) and the Aeronautical Science Foundation of China (Grant No. 20161375002).
Abstract: Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and show remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel but effective fusion model named the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature via an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called the image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through attention, while generating and refining 3D detection results with a transformer model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two widely used indoor 3D object detection benchmarks, SUN RGB-D and NYU Depth V2, and ablation studies analyze the effect of each module. The source code for IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
Funding: Supported by the National Natural Science Foundation of China (No. 51805312); in part by the Shanghai Sailing Program (No. 18YF1409400); in part by the Training and Funding Program of Shanghai College Young Teachers (No. ZZGCD15102); in part by the Scientific Research Project of Shanghai University of Engineering Science (No. 2016-19); in part by the Science and Technology Commission of Shanghai Municipality (No. 19030501100); in part by the Shanghai University of Engineering Science Innovation Fund for Graduate Students (No. 18KY0613); and in part by the National Key R&D Program of China (No. 2016YFC0802900).
Abstract: Road accident detection plays an important role in abnormal scene reconstruction for intelligent transportation systems and abnormal event warning for autonomous driving. This paper presents a novel 3D object detector and an adaptive space partitioning algorithm to infer traffic accidents quantitatively. Using 2D region proposals in an RGB image, the method generates a deformable frustum from the point cloud for each 2D region proposal and then extracts features frustum-wise with a farthest point sampling network (FPS-Net) and a feature extraction network (FE-Net). Subsequently, an encoder-decoder network (ED-Net) regresses the 3D oriented bounding box (OBB). Meanwhile, an adaptive least square regression (ALSR) method is proposed to split 3D OBBs. Finally, a reduced OBB intersection test detects traffic accidents via the separating surface theorem (SST). In experiments on the KITTI benchmark, our proposed 3D object detector outperforms other state-of-the-art methods, and the collision detection algorithm achieves a satisfactory accuracy of 91.8% on our SHTA dataset.
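The OBB intersection test can be illustrated in 2D with the classical separating-axis test, of which the paper's separating surface theorem is a 3D analogue; the boxes here are toy values, not detector outputs.

```python
import numpy as np

def obb_corners(center, size, angle):
    """Corners of a 2D oriented box given center, (width, height), rotation."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    w, h = size[0] / 2.0, size[1] / 2.0
    local = np.array([[w, h], [-w, h], [-w, -h], [w, -h]])
    return local @ R.T + center

def obbs_overlap(b1, b2):
    """Separating-axis test for two 2D OBBs (4x2 corner arrays): if any
    edge normal separates the projected intervals, there is no collision."""
    for box in (b1, b2):
        for i in range(4):
            edge = box[(i + 1) % 4] - box[i]
            axis = np.array([-edge[1], edge[0]])
            p1, p2 = b1 @ axis, b2 @ axis
            if p1.max() < p2.min() or p2.max() < p1.min():
                return False  # found a separating axis
    return True

a = obb_corners(np.zeros(2), (2.0, 1.0), 0.0)
b = obb_corners(np.array([3.0, 0.0]), (2.0, 1.0), 0.0)       # disjoint
c = obb_corners(np.array([1.5, 0.0]), (2.0, 1.0), np.pi / 4)  # overlapping
```

Two vehicles whose OBBs pass this overlap test would be flagged as a collision candidate.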
Funding: National Youth Natural Science Foundation of China (No. 61806006), Innovation Program for Graduates of Jiangsu Province (No. KYLX160-781), and Jiangsu University Superior Discipline Construction Project.
Abstract: To overcome the difficulty of detecting far and hard objects caused by the sparseness and insufficient semantic information of LiDAR point clouds, a 3D object detection network with adaptive multi-modal data fusion is proposed, which exploits multi-neighborhood voxel information and image information. Firstly, we design an improved ResNet that maintains the structural information of far and hard objects in low-resolution feature maps, making it more suitable for the detection task; meanwhile, the semantics of each image feature map are enhanced with semantic information from all subsequent feature maps. Secondly, we extract multi-neighborhood context information with different receptive field sizes to compensate for the sparseness of the point cloud, improving the ability of voxel features to represent the spatial structure and semantic information of objects. Finally, we propose a multi-modal feature adaptive fusion strategy that uses learnable weights to express the contribution of different modal features to the detection task, with voxel attention further enhancing the fused feature representation of valid target objects. Experimental results on the KITTI benchmark show that this method outperforms VoxelNet by remarkable margins, increasing the AP by 8.78% and 5.49% on the medium and hard difficulty levels. Our method also outperforms many mainstream multi-modal methods, e.g., exceeding the AP of MVX-Net by 1% on the medium and hard difficulty levels.
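The learnable-weight fusion strategy can be sketched with a softmax over per-modality logits; the shapes, values, and two-modality setup are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_fuse(voxel_feat, image_feat, logits):
    """Fuse voxel and image features with learnable per-modality weights.

    `logits` (2,) are trainable scalars; the softmax turns them into
    contributions summing to one, so the network can learn how much each
    modality matters for the detection task.
    """
    w = softmax(logits)
    return w[0] * voxel_feat + w[1] * image_feat

v = np.array([1.0, 0.0, 2.0])   # toy voxel feature
im = np.array([0.0, 4.0, 2.0])  # toy image feature
fused = adaptive_fuse(v, im, np.array([0.0, 0.0]))  # equal logits -> 50/50 mix
```

With equal logits the fusion is an even average; during training the logits would shift towards whichever modality contributes more to detection accuracy.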
Funding: Supported by the National Natural Science Foundation of China (No. 50175024), the Provincial Program for Young Teachers of Colleges and Universities of Anhui (No. 2005jql019), and the Provincial Research Foundation of the Key Laboratory of Anhui.
Abstract: Compared with traditional scanning confocal microscopy, the effect of various factors on the characteristics of a multi-beam parallel confocal system is discussed, and the error factors in the system are analyzed. The construction and working principle of the non-scanning 3D detecting system are introduced, and experimental results confirm the effect of these factors on the detecting system.
Abstract: Within today's product development process, various FE (finite element) simulations for the functional validation of the desired characteristics are made to avoid expensive testing with real components. Those simulations require great effort in discretization and in the use of simulation conditions, such as taking different non-linearities (i.e., material behavior, etc.) into account, to create meaningful results. Despite knowledge of the deformations occurring during production processes, the non-deformed design model from a CAD (computer aided design) system is always used for the FE simulations. It seems rather doubtful that further refinement of simulation methods makes sense if the real manufactured geometry of the component is not considered in the simulation. For an efficient exploitation of the potential of simulation methods, an approach has been developed that offers a geometry model for simulation based on the existing CAD model but with integrated production deviations, available as soon as a first prototype is at hand, by adapting the FE mesh to the real, 3D surface-detected geometry.
基金the National Natural Science Foundation of China(No.61873167)the Automotive Industry Science and Technology Development Foundation of Shanghai(No.1904)。
Abstract: Compared to 3D object detection using a single camera, multiple cameras can overcome limitations in field-of-view, occlusion and low detection confidence. This study employs multiple surveillance cameras and develops a cooperative 3D object detection and tracking framework that incorporates temporal and spatial information. The framework consists of a 3D vehicle detection model, a cooperative spatial-temporal relation scheme, and a heuristic camera constellation method. Specifically, the proposed cross-camera association scheme combines the geometric relationship between multiple cameras and the objects in the corresponding detections. The spatial-temporal method associates vehicles between different points of view at a single timestamp and performs vehicle tracking over time. The proposed framework is evaluated on a synthetic cooperative dataset and shows high reliability: cooperative perception can recall more than 66% of the trajectory, compared with 11% for single-point sensing. This could contribute to full-range surveillance for intelligent transportation systems.
Funding: Projects (51204206, 41272304, 41372278) supported by the National Natural Science Foundation of China.
Abstract: A laser-technique-based scanning system was employed to make a comprehensive scan through boreholes of an unmapped cavity under an open pit bench; the resulting three-dimensional data were used for theoretical analysis and numerical simulation of the stability of the cap rock. Acoustic emission techniques were also adopted to carry out long-term, real-time rupture monitoring in the cap rock. A complete safety evaluation system for the cap rock was thus established to ensure safe operation of subsequent blasting processes. The ideal way of eliminating the collapse hazard of such a cavity is cap rock caving through deep-hole blasting, so two deep-hole blasting schemes were proposed: a vertical deep-hole blasting scheme, and a one-time raise driving scheme integrated with deep-hole bench blasting. The vertical deep-hole blasting scheme consumes more explosive, but its relatively simple blasting network structure can greatly reduce workloads. The one-time raise driving scheme integrated with deep-hole bench blasting obviously reduces explosive consumption, but its higher technical requirements on drilling, explosive charging and the blasting network increase workloads.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2017YFB0102502; the Beijing Municipal Natural Science Foundation No. L191001; the National Natural Science Foundation of China under Grant Nos. 61672082 and 61822101; the Newton Advanced Fellowship under Grant No. 62061130221; and the Young Elite Scientists Sponsorship Program of the Hunan Provincial Department of Education under Grant No. 18B142.
Abstract: In recent years, autonomous driving technology has made good progress, but the non-cooperative intelligence of individual vehicles still faces many technical bottlenecks in urban road autonomous driving. V2I (Vehicle-to-Infrastructure) communication is a potential solution enabling cooperative intelligence between vehicles and roads. In this paper, RGB-PVRCNN, an environment perception framework, is proposed to improve the environmental awareness of autonomous vehicles at intersections by leveraging V2I communication. The framework integrates vision features based on PVRCNN. The normal distributions transform (NDT) point cloud registration algorithm is deployed both onboard and roadside to obtain the positions of the autonomous vehicles and to build a local map; objects detected by the roadside multi-sensor system are sent back to the autonomous vehicles to enhance their perception for the benefit of path planning and traffic efficiency at the intersection. Field-testing results show that our method effectively extends the environmental perception ability and range of autonomous vehicles at the intersection and outperforms the PointPillar algorithm and the VoxelRCNN algorithm in detection accuracy.
Funding: National Natural Science Foundation of China (62132021, 62102435, 62002375, 62002376), National Key R&D Program of China (2018AAA0102200), and NUDT Research Grants (ZK19-30).
Abstract: Relation contexts have been proved useful for many challenging vision tasks. In the field of 3D object detection, previous methods have taken advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, redundant relation contexts inevitably exist due to noisy or low-quality proposals. In fact, invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, which may, on the contrary, reduce performance in complex scenes. Inspired by recent attention mechanisms like the Transformer, we propose a novel 3D attention-based relation module (ARM3D). It encompasses object-aware relation reasoning to extract pair-wise relation contexts among qualified proposals and an attention module to distribute attention weights towards different relation contexts. In this way, ARM3D can take full advantage of useful relation contexts and filter out those less relevant or even confusing contexts, mitigating ambiguity in detection. We have evaluated ARM3D by plugging it into several state-of-the-art 3D object detectors, showing more accurate and robust detection results. Extensive experiments demonstrate the capability and generalization of ARM3D on 3D object detection. Our source code is available at https://github.com/lanlan96/ARM3D.