Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of application scenarios. With the introduction of end-to-end direct regression methods, the field has entered a new stage of development. However, even for the best current methods, the regression results for joints that are heavily influenced by external factors are not accurate enough. In this paper, we propose an effective feature recalibration module based on the channel attention mechanism, together with a relative optimal calibration strategy, applied to the multi-view multi-person 3D human pose estimation task to improve detection accuracy for joints that are severely affected by external factors. Specifically, the recalibration module and strategy perform relative optimal weight adjustment of joint feature information, which enables the model to learn the dependencies between joints and the dependencies between people and their corresponding joints. We call this method the Efficient Recalibration Network (ER-Net). Finally, experiments were conducted on two benchmark datasets for this task, Campus and Shelf, on which the PCP reached 97.3% and 98.3%, respectively.
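The recalibration module above builds on the channel attention mechanism. As a rough illustration of that underlying idea (a generic squeeze-and-excitation-style gate, not ER-Net itself), channel recalibration can be sketched as follows; the bottleneck weights `w1`, `w2` and the reduction ratio are hypothetical stand-ins for learned parameters:

```python
import numpy as np

def channel_recalibration(features, w1, w2):
    """SE-style channel attention: squeeze (global average pool),
    excite (two-layer bottleneck), then rescale each channel."""
    # features: (C, N) -- C feature channels over N spatial positions
    squeeze = features.mean(axis=1)                 # (C,) global descriptor
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gates in (0, 1)
    return features * weights[:, None]              # per-channel rescaling

rng = np.random.default_rng(0)
C, N, r = 8, 16, 2                                  # channels, positions, reduction
x = rng.standard_normal((C, N))
w1 = rng.standard_normal((C // r, C))               # placeholder learned weights
w2 = rng.standard_normal((C, C // r))
y = channel_recalibration(x, w1, w2)
```

In a trained network the gates learn to suppress channels carrying unreliable joint evidence and emphasize informative ones, which is the kind of dependency-aware reweighting the abstract describes.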
Monocular depth estimation is a basic task in computer vision. Its accuracy has improved tremendously over the past decade with the development of deep learning. However, blurry boundaries in the depth map remain a serious problem. Researchers have found that boundary blur is mainly caused by two factors. First, low-level features containing boundary and structure information may be lost in deep networks during the convolution process. Second, because the boundary area occupies only a small portion of the whole image, the model ignores the errors it introduces during backpropagation. Focusing on these factors, two countermeasures are proposed to mitigate the boundary blur problem. First, we design a scene understanding module and a scale transform module to build a lightweight fused feature pyramid, which deals with low-level feature loss effectively. Second, we propose a boundary-aware depth loss function that attends to the depth values in boundary regions. Extensive experiments show that our method predicts depth maps with clearer boundaries, and its depth accuracy on NYU-Depth V2, SUN RGB-D, and iBims-1 is competitive.
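A boundary-aware depth loss of the kind described can be sketched by up-weighting the per-pixel error wherever the ground-truth depth gradient is large. The weighting scheme and the factor `alpha` below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def boundary_aware_loss(pred, gt, alpha=4.0):
    """L1 depth loss reweighted so that pixels near depth boundaries
    (large ground-truth gradient) contribute more to the total."""
    gy, gx = np.gradient(gt)                           # ground-truth depth gradients
    edge = np.sqrt(gx**2 + gy**2)                      # boundary strength map
    weight = 1.0 + alpha * edge / (edge.max() + 1e-8)  # boundary pixels up-weighted
    return np.mean(weight * np.abs(pred - gt))
```

The effect is that an error of a given size costs more at a depth discontinuity than in a flat region, counteracting the boundary area's small share of the image during backpropagation.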
3D human pose estimation is a major focus area in the field of computer vision and plays an important role in practical applications. This article summarizes the frameworks and research progress related to estimation from monocular RGB images and videos. An overall perspective on methods integrated with deep learning is introduced, with image-based and video-based inputs as the analysis framework. From this viewpoint, common problems are discussed. The diversity of human postures usually leads to problems such as occlusion and ambiguity, and the lack of training datasets often results in poor generalization ability of the model. Regression methods are crucial for solving such problems. For image-based input, the multi-view method is commonly used to solve occlusion problems, and it is analyzed comprehensively here. For video-based input, prior knowledge of restricted human motion is used to predict postures, and structural constraints are widely used as additional priors. Furthermore, weakly supervised learning methods are studied and discussed for both types of input to improve model generalization. The problem of insufficient training data must also be considered, especially because 3D datasets are usually biased and limited. Finally, emerging and popular datasets and evaluation indicators are discussed; the characteristics of the datasets and the relationships among the indicators are explained and highlighted. This article is thus intended to be useful and instructive for researchers who are new to this field and find it confusing. By providing an overview of 3D human pose estimation, it sorts and refines recent studies, describes the kernel problems and common useful methods, and discusses the scope for further research.
A method of source depth estimation based on multi-path time delay differences is proposed. When the minimum-time arrivals at all receiver depths are snapped to a common time on the time delay-depth plane, the time delay curves of the surface-bottom and bottom-surface reflections intersect at the source depth. At least two hydrophones, deployed vertically with a certain interval, are required. If the receiver depths are known, the pair of time delays can be used to estimate the source depth. In simulations and experiments, the proposed method estimates the source depth successfully at moderate ranges in the deep ocean without complicated matched-field calculations.
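The multipath arrivals the method exploits can be illustrated with the classic image-source construction. The sketch below assumes an isovelocity ocean (a simplification; the paper's deep-ocean setting is more general) and returns travel times for the direct path and the first few reflected paths, including the surface-bottom and bottom-surface pair whose delay curves intersect at the source depth:

```python
import math

def multipath_arrivals(zs, zr, r, depth, c=1500.0):
    """Arrival times (s) of the first few multipaths in an isovelocity ocean,
    via the image-source method. zs/zr: source/receiver depth (m), r: range (m),
    depth: water depth (m), c: assumed constant sound speed (m/s)."""
    images = {
        "direct":         zs,
        "surface":        -zs,              # one surface bounce
        "bottom":         2*depth - zs,     # one bottom bounce
        "surface-bottom": 2*depth + zs,     # surface bounce, then bottom bounce
        "bottom-surface": -(2*depth - zs),  # bottom bounce, then surface bounce
    }
    return {k: math.hypot(r, zr - z) / c for k, z in images.items()}

times = multipath_arrivals(zs=100.0, zr=500.0, r=5000.0, depth=4000.0)
```

Because the surface-bottom and bottom-surface image sources sit at `2*depth + zs` and `-(2*depth - zs)` respectively, their arrival-time difference depends on the source depth, which is what makes the pair of delays usable for depth inversion.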
Mineral exploration is carried out using a variety of methods, among which geophysical and geochemical studies are two powerful tools. In integrated studies, the results of each study are used to determine the locations of drilling boreholes. The purpose of this study is to use field geophysics to calculate the depth of a mineral reserve. The study area, the Jalalabad iron mine, is located 38 km from the city of Zarand. Gravimetric data were measured and the mineral depth was calculated using the Euler method; 1314 readings were taken in the area. The rocks of the region are volcanic and sedimentary, and the mineralization in the area originates from hydrothermal processes. After the gravity survey, the data were corrected, and various products were computed, including first- and second-order residual anomaly maps, upward continuation, first- and second-degree vertical derivatives, and analytic-signal maps; finally, the depth of the deposit was estimated by the Euler method. As a result, the depth of the mineral deposit was calculated to be between 20 and 30 meters on average.
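Euler deconvolution estimates source position from a potential-field anomaly and its gradients via Euler's homogeneity equation. A minimal synthetic sketch for a single buried point mass (structural index N = 2 for the gravity field of a point source) is shown below; it uses analytic derivatives for clarity, whereas a real survey like this one would compute the gradients numerically (e.g., by FFT) from the corrected data:

```python
import numpy as np

# Synthetic vertical-gravity anomaly of a buried point mass on a profile at z = 0.
x = np.linspace(-200.0, 200.0, 81)        # profile coordinates (m)
x0_true, z0_true, Gm = 15.0, 25.0, 1.0e5  # hypothetical source position and strength
r2 = (x - x0_true)**2 + z0_true**2
g  = Gm * z0_true / r2**1.5                               # anomaly g(x)
gx = -3.0 * Gm * z0_true * (x - x0_true) / r2**2.5        # horizontal derivative
gz = Gm * (3.0 * z0_true**2 / r2**2.5 - 1.0 / r2**1.5)    # vertical derivative

# Euler's homogeneity equation at z = 0, rearranged with background B:
#   x0*gx + z0*gz + N*B = x*gx + N*g
N = 2.0
A = np.column_stack([gx, gz, N * np.ones_like(x)])
b = x * gx + N * g
x0_est, z0_est, B_est = np.linalg.lstsq(A, b, rcond=None)[0]
```

Solving the overdetermined system in least squares recovers the horizontal position and, crucially for drilling decisions, the depth of the source.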
Depth estimation is an important task in computer vision. Collecting data at scale for monocular depth estimation is challenging, as the task requires simultaneously capturing RGB images and depth information; data augmentation is therefore crucial. Existing data augmentation methods often employ pixel-wise transformations, which may inadvertently disrupt edge features. In this paper, we propose a data augmentation method for monocular depth estimation, which we call Perpendicular-Cutdepth. The method cuts real-world depth maps along perpendicular directions and pastes them onto the input images, thereby diversifying the data without compromising edge features. To validate the effectiveness of the algorithm, we compared it against current mainstream data augmentation algorithms using an existing convolutional neural network (CNN). Additionally, to verify its applicability to Transformer networks, we designed a Transformer-based encoder-decoder network to assess the generalization of the proposed algorithm. Experimental results demonstrate that, for monocular depth estimation, Perpendicular-Cutdepth outperforms traditional data augmentation methods. On the indoor NYU dataset, our method increases accuracy from 0.900 to 0.907 and reduces the error rate from 0.357 to 0.351; on the outdoor KITTI dataset, it improves accuracy from 0.9638 to 0.9642 and decreases the error rate from 0.060 to 0.0598.
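One plausible reading of the cut-and-paste operation is copying a full-height vertical strip of the depth map onto the RGB input, so that depth edges inside the strip are preserved rather than smeared by pixel-wise transforms. The sketch below is an illustrative guess at that operation, not the authors' exact implementation:

```python
import numpy as np

def perpendicular_cutdepth(image, depth, rng):
    """Paste a random full-height vertical strip of the depth map
    (replicated to 3 channels) onto the RGB image."""
    h, w, _ = image.shape
    x0 = rng.integers(0, w // 2)               # left edge of the strip
    x1 = x0 + rng.integers(1, w - x0)          # right edge (at least 1 column wide)
    out = image.copy()
    out[:, x0:x1, :] = np.repeat(depth[:, x0:x1, None], 3, axis=2)
    return out
```

Because the strip spans the image top to bottom and is copied verbatim, any depth discontinuity inside it survives the augmentation intact, which is the stated motivation for cutting along a perpendicular direction.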
Depth estimation has become an active research area with the development of stereo vision in recent years, and it is one of the key technologies for handling the large data volumes of stereo vision communication. Depth estimation still faces problems such as occlusion, fuzzy edges, and real-time processing. Many algorithms have been proposed in software, but the performance of typical computer configurations limits software processing speed. The alternative is hardware design: advances in digital signal processors (DSPs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs) provide opportunities for flexible applications. In this work, by analyzing the procedures of depth estimation, we identify algorithms suitable for hardware implementation of real-time depth estimation. Different methods of calibration, matching, and post-processing are analyzed against the hardware design requirements, and tests of the algorithms are presented. The results show that the proposed algorithms provide credible depth maps for further view synthesis and are suitable for hardware design.
Learning-based multi-task models have been widely used in various scene understanding tasks, where the tasks complement each other, i.e., prior semantic information can be exploited to better infer depth. We boost unsupervised monocular depth estimation by using semantic segmentation as an auxiliary task. To address the lack of cross-domain datasets and the catastrophic forgetting problems encountered in multi-task training, we use an existing methodology to obtain redundant segmentation maps and build our cross-domain dataset, which not only provides a new way to conduct multi-task training but also lets us evaluate our results against those of other algorithms. In addition, to comprehensively use the features extracted by the two tasks in the early perception stage, we share weights in the network to fuse cross-domain features, and we introduce a novel multi-task loss function to further smooth the depth values. Extensive experiments on the KITTI and Cityscapes datasets show that our method achieves state-of-the-art performance in depth estimation as well as improved semantic segmentation.
Most approaches that estimate a scene's 3D depth from a single image model the point spread function (PSF) as a 2D Gaussian. However, those methods suffer from noise and have difficulty producing high-quality depth recovery. We present a simple yet effective approach to estimate the amount of spatially varying defocus blur at edges, based on a Cauchy distribution model for the PSF. The raw image is re-blurred twice using two known Cauchy distribution kernels, and the defocus blur amount at edges is derived from the gradient ratio between the two re-blurred images. By propagating the blur amount at edge locations to the entire image using matting interpolation, a full depth map is then recovered. Experimental results on several real images demonstrate the feasibility and effectiveness of our non-Gaussian PSF model in providing a better estimate of the defocus map from a single uncalibrated defocused image. The results also show that our method is robust to image noise, inaccurate edge locations, and interference from neighboring edges, and that it generates more accurate scene depth maps than most existing methods based on a Gaussian PSF model.
Background: Monocular depth estimation aims to predict a dense depth map from a single RGB image and has important applications in 3D reconstruction, autonomous driving, and augmented reality. However, existing methods feed the original RGB image directly into the model to extract depth features without avoiding the interference of depth-irrelevant information, which degrades depth-estimation accuracy. Methods: To remove the influence of depth-irrelevant information and improve depth-prediction accuracy, we propose RADepthNet, a novel reflectance-guided network that fuses boundary features. Specifically, our method predicts depth maps in three steps: (1) Intrinsic image decomposition. We propose a reflectance extraction module consisting of an encoder-decoder structure to extract the depth-related reflectance. Through an ablation study, we demonstrate that the module reduces the influence of illumination on depth estimation. (2) Boundary detection. A boundary extraction module, consisting of an encoder, a refinement block, and an upsample block, is proposed to better predict depth at object boundaries using gradient constraints. (3) Depth prediction. We use an encoder different from that in (2) to obtain depth features from the reflectance map and fuse boundary features to predict depth. In addition, we propose FIFADataset, a depth-estimation dataset for soccer scenarios. Results: Extensive experiments on a public dataset and our proposed FIFADataset show that our method achieves state-of-the-art performance.
The estimation of fish mass is one of the most basic and important tasks in aquaculture. Acquiring the mass of fish at different growth stages is of great significance for feeding, monitoring fish health, and making breeding plans to increase production. Existing methods for fish mass estimation often operate only in the 2D plane, and the difficulty of obtaining 3D information about the fish leads to errors. To solve this problem, a multi-view method is proposed to obtain 3D information and predict fish mass through a two-stage neural network with an edge-sensitive module. In the first stage, side- and downward-view images of the fish are captured by two vertically arranged cameras, and 3D information such as side area, top area, length, deflection angle, and pitch angle is extracted to estimate the size of the fish. The area of the fish in the different views is estimated accurately by a pre-trained image segmentation network with an edge-sensitive module. In the second stage, a fully connected neural network regresses the fish mass from the 3D information obtained in the first stage. The experimental results indicate that the proposed method accurately estimates fish mass and outperforms existing estimation methods.
Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistencies are very vulnerable to illumination variance, occlusions, texture-less regions, and moving objects, making them not robust enough to deal with various scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. Firstly, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and is used to align the temporal depth features via a depth feature alignment (DFA) loss. Secondly, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the "point-to-point" alignment paradigm to a "region-to-region" one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust, owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analyses validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
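The voxel density alignment idea can be sketched as follows: voxelize two point clouds on a shared grid and penalize the difference in normalized per-voxel point counts. The grid size, voxel size, and L1 comparison below are illustrative choices, not the paper's exact loss:

```python
import numpy as np

def voxel_density_loss(pts_ref, pts_src, voxel=0.5, grid=(20, 20, 20)):
    """Voxelize two point clouds on a shared grid and compare normalized
    per-voxel point counts -- a 'region-to-region' alternative to
    point-to-point cloud alignment (simplified sketch of the VDA idea)."""
    def density(pts):
        idx = np.floor(pts / voxel).astype(int)        # voxel index per point
        keep = np.all((idx >= 0) & (idx < grid), axis=1)
        hist = np.zeros(grid)
        np.add.at(hist, tuple(idx[keep].T), 1.0)       # count points per voxel
        return hist / max(len(pts), 1)                 # normalized density
    return np.abs(density(pts_ref) - density(pts_src)).mean()
```

Because the loss compares aggregate counts per region rather than individual correspondences, a few outlier points (e.g., from moving objects) perturb it far less than a point-to-point distance would.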
The variation of deep rock stress in gravitational and tectonic stress fields is analyzed based on the Hoek-Brown strength criterion. In the gravitational stress field, rocks in the shallow area are in an elastic state, while deep, relatively soft rock may be in a plastic state. In the tectonic stress field, by contrast, relatively soft rock in the shallow area is in a plastic state and the deep rock is in an elastic state. A method is proposed to estimate stress values in coal and soft rock from in-situ measurements in hard rock. The estimation method depends on the type of stress field and the stress state. Equations of rock stress are presented for the elastic, plastic, and critical states. The critical state is a special stress state that marks the conversion from the elastic to the plastic state in the gravitational stress field, and from the plastic to the elastic state in the tectonic stress field. Two case studies show that the estimation method is feasible.
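The generalized Hoek-Brown criterion underlying this analysis gives the major principal stress at failure as sigma1 = sigma3 + sigma_ci * (mb * sigma3 / sigma_ci + s)^a. A minimal sketch (the parameter values in the usage below are hypothetical):

```python
def hoek_brown_sigma1(sigma3, sigma_ci, mb, s, a=0.5):
    """Generalized Hoek-Brown failure criterion: major principal stress
    sigma1 (MPa) at failure, given minor principal stress sigma3 (MPa),
    intact uniaxial compressive strength sigma_ci (MPa), and the
    rock-mass constants mb, s, a."""
    return sigma3 + sigma_ci * (mb * sigma3 / sigma_ci + s) ** a

# Illustrative comparison at the same confinement (values are hypothetical):
hard = hoek_brown_sigma1(10.0, 100.0, 10.0, 0.1)   # strong rock mass
soft = hoek_brown_sigma1(10.0, 20.0, 2.0, 0.001)   # weak rock mass
```

At equal confinement the weak rock mass reaches failure at a much lower sigma1, which is why soft rock can already be plastic where neighboring hard rock remains elastic, the situation the estimation method exploits.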
Identifying underground utilities and predicting their depth are fundamental in civil engineering excavations, for example when installing or repairing water, sewer, gas, and electric systems. Accidental rupture of these systems can lead to unplanned repair costs, delays in completing the service, and risk of injury or death to workers. One way to detect underground utilities is the ground-penetrating radar (GPR) geophysical method. To estimate depth, the two-way travel time provided by a radargram is used together with the ground-wave velocity, which depends on the dielectric constant of the materials and is usually assumed to be constant over the area under investigation. This procedure provides satisfactory results in most cases, but wrong depth estimates can damage public utilities, rupturing pipes, cutting lines, and so on. Such cases occur mainly in areas with marked variation of water content and/or soil lithology, where greater care is required to determine target depths. The present work demonstrates how the interval velocity of Dix (1955) can be applied to a radargram to estimate the depth of underground utilities, compared with the conventional constant-velocity technique applied to the same data set. Synthetic and real GPR data were used to verify the applicability of the interval velocity technique and to determine the accuracy of the resulting depth estimates. The studies were carried out at the IAG/USP test site, a controlled environment where metallic drums are buried at known positions and depths, allowing real depths to be compared with estimates. Numerical studies were also carried out to simulate the real environment, with the dielectric constant varying with depth, and to validate the results against the real data. The results show that target depths were estimated more accurately by the interval velocity technique than by the constant-velocity technique, minimizing the risk of accidents during excavation.
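The Dix (1955) interval velocity referenced above converts the RMS (stacking) velocities of two successive reflectors into the velocity of the layer between them, which can then be applied to that layer's share of the travel time instead of a single constant velocity:

```python
def dix_interval_velocity(t1, v1, t2, v2):
    """Dix (1955): interval velocity of the layer between two reflectors,
    from their zero-offset two-way times (t1 < t2) and RMS velocities."""
    return ((t2 * v2**2 - t1 * v1**2) / (t2 - t1)) ** 0.5
```

For a GPR example with hypothetical values: a dry upper layer at 0.12 m/ns over a wetter layer, with two-way times of 20 ns and 40 ns, yields the lower layer's true velocity rather than a profile-wide average, which is exactly what corrects the depth bias in areas of varying water content.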
Light field cameras have a wide range of applications, such as digital refocusing, scene depth extraction, and 3D image reconstruction. By recording both the energy and the direction of light rays, they can solve many technical problems beyond the reach of conventional cameras. An important feature of light field cameras is the microlens array inserted between the sensor and the main lens, through which a series of sub-aperture images of different perspectives is formed. Based on this feature and the full-focus image acquisition technique, we propose a light-field optical flow algorithm that performs both depth estimation and occlusion detection while preserving edges. The algorithm consists of three steps: 1) computing the dense optical flow field among a group of sub-aperture images; 2) obtaining a robust depth estimate by initializing the light-field optical flow using linear regression and detecting occluded areas using a consistency check; 3) computing an improved light-field depth map by using an edge-preserving algorithm for interpolation optimization. The reliability and high accuracy of the proposed approach are validated by experimental results.
Recent advances in computer vision and deep learning have shown that fusing depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. However, alongside these advantages, depth sensing presents many practical challenges. For instance, depth sensors impose an additional payload on robotic inspection platforms, limiting operation time and increasing inspection cost. Additionally, some lidar-based depth sensors perform poorly outdoors due to sunlight contamination during the daytime. In this context, this study investigates the feasibility of abolishing depth sensing at test time without compromising segmentation performance. An autonomous damage segmentation framework is developed based on recent advances in vision-based multi-modal sensing, namely modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At deployment time, depth data becomes expendable, as it can be simulated from the corresponding RGB frames. This makes it possible to reap the benefits of depth fusion without any depth perception per se. The study explores two depth encoding techniques and three fusion strategies in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. The surrogate techniques increase the segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study contributes to enhancing the resilience of critical civil infrastructure.
Funding (ER-Net, multi-view multi-person 3D human pose estimation): supported in part by the Key Program of NSFC (Grant No. U1908214); the Special Project of Central Government Guiding Local Science and Technology Development (Grant No. 2021JH6/10500140); the Program for the Liaoning Distinguished Professor; the Program for Innovative Research Team in University of Liaoning Province (LT2020015); Dalian (2021RT06) and Dalian University (XLJ202010); the Science and Technology Innovation Fund of Dalian (Grant No. 2020JJ25CY001); and the Dalian University Scientific Research Platform Project (No. 202101YB03).
Funding (boundary-aware monocular depth estimation): supported in part by School Research Projects of Wuyi University (No. 5041700175).
Funding (3D human pose estimation survey): supported by the Program of Entrepreneurship and Innovation Ph.D. in Jiangsu Province (JSSCBS20211175); the School Ph.D. Talent Funding (Z301B2055); and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (21KJB520002).
Funding (source depth estimation from multi-path time delays): supported by the National Natural Science Foundation of China under Grant No. 11174235.
Funding: Supported by the Program for Scientific Research Innovation Teams in Colleges and Universities of Anhui Province (2022AH010095), the Scientific Research and Talent Development Foundation of Hefei University (No. 21-22RC15), the Key Research Plan of Anhui Province (No. 2022k07020011), the Anhui Provincial Natural Science Foundation (No. 2308085MF213), and the Open Fund of the Information Materials and Intelligent Sensing Laboratory of Anhui Province (IMIS202205), as well as the AI General Computing Platform of Hefei University.
Abstract: Depth estimation is an important task in computer vision. Collecting data at scale for monocular depth estimation is challenging, as this task requires simultaneously capturing RGB images and depth information; data augmentation is therefore crucial. Existing data augmentation methods often employ pixel-wise transformations, which may inadvertently disrupt edge features. In this paper, we propose a data augmentation method for monocular depth estimation, which we refer to as the Perpendicular-Cutdepth method. This method involves cutting real-world depth maps along perpendicular directions and pasting them onto input images, thereby diversifying the data without compromising edge features. To validate the effectiveness of the algorithm, we compared it against current mainstream data augmentation algorithms using an existing convolutional neural network (CNN). Additionally, to verify the algorithm's applicability to Transformer networks, we designed a Transformer-based encoder-decoder network structure to assess the generalization of our proposed algorithm. Experimental results demonstrate that, in the field of monocular depth estimation, our proposed Perpendicular-Cutdepth method outperforms traditional data augmentation methods. On the indoor NYU dataset, our method increases accuracy from 0.900 to 0.907 and reduces the error rate from 0.357 to 0.351. On the outdoor KITTI dataset, our method improves accuracy from 0.9638 to 0.9642 and decreases the error rate from 0.060 to 0.0598.
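A minimal sketch of the cut-and-paste idea described above, assuming a simple policy of pasting one normalized vertical depth strip onto the RGB input; the strip position and width policy here are hypothetical stand-ins, not the paper's actual sampling scheme.

```python
import numpy as np

def perpendicular_cutdepth(image, depth, rng):
    """Paste a vertical (perpendicular) strip of the normalized depth map
    onto the RGB input, leaving the rest of the image (and its edges)
    untouched. Strip placement/width policy is an assumption."""
    h, w, _ = image.shape
    x0 = int(rng.integers(0, w // 2))
    width = int(rng.integers(w // 8, w // 4))
    # Normalize depth to [0, 1] so it can be rendered in image range
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    strip = (d[:, x0:x0 + width, None] * 255).astype(image.dtype)
    out = image.copy()
    out[:, x0:x0 + width, :] = strip  # broadcast strip over RGB channels
    return out
```

Because the paste happens along full image columns, depth discontinuities inside the strip stay aligned with the image grid instead of being smeared by pixel-wise transforms.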
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60832003), the Key Laboratory of Advanced Display and System Applications (Shanghai University), Ministry of Education, China (Grant No. P200801), and the Science and Technology Commission of Shanghai Municipality (Grant No. 10510500500)
Abstract: Depth estimation has been an active research area in recent years with the development of stereo vision. It is one of the key technologies for handling the large data volumes of stereo vision communication. Depth estimation still has open problems, such as occlusion, fuzzy edges, and real-time processing. Many algorithms have been proposed in software, but the performance of computer configurations limits the software processing speed. The alternative is hardware design, and the great development of digital signal processors (DSP), application-specific integrated circuits (ASIC), and field-programmable gate arrays (FPGA) provides the opportunity for flexible applications. In this work, by analyzing the procedures of depth estimation, appropriate algorithms that can be used in hardware designs to execute real-time depth estimation are proposed. The different methods of calibration, matching, and post-processing are analyzed with respect to hardware design requirements. Finally, tests of the algorithms are analyzed. The results show that the algorithms proposed for hardware design can provide a credible depth map for further view synthesis and are suitable for hardware implementation.
Funding: This work was supported by the National Key Research and Development Plan (Project No. YS2018YFB1403703) and a research project of the Communication University of China (Project No. CUC200D058).
Abstract: Learning-based multi-task models have been widely used in various scene understanding tasks, where the tasks complement each other, i.e., prior semantic information can be considered to better infer depth. We boost unsupervised monocular depth estimation by using semantic segmentation as an auxiliary task. To address the lack of cross-domain datasets and the catastrophic forgetting problems encountered in multi-task training, we utilize an existing methodology to obtain redundant segmentation maps to build our cross-domain dataset, which not only provides a new way to conduct multi-task training, but also helps us to evaluate results against those of other algorithms. In addition, in order to comprehensively use the extracted features of the two tasks in the early perception stage, we use a weight-sharing strategy in the network to fuse cross-domain features, and introduce a novel multi-task loss function to further smooth the depth values. Extensive experiments on the KITTI and Cityscapes datasets show that our method achieves state-of-the-art performance in the depth estimation task, as well as improved semantic segmentation.
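A common way to smooth predicted depth in such training setups is an edge-aware smoothness term that penalizes depth gradients except where the image itself has strong gradients. The sketch below shows one widely used form of such a term as an assumed stand-in; the paper's exact multi-task loss is not reproduced here.

```python
import numpy as np

def edge_aware_smoothness(depth, image):
    """Edge-aware smoothness penalty: depth gradients are penalized,
    down-weighted where the image gradient is large (likely object edges).
    A common formulation, assumed here for illustration."""
    dx_d = np.abs(depth[:, 1:] - depth[:, :-1])   # horizontal depth gradient
    dy_d = np.abs(depth[1:, :] - depth[:-1, :])   # vertical depth gradient
    gray = image.mean(axis=2)                     # image intensity
    dx_i = np.abs(gray[:, 1:] - gray[:, :-1])
    dy_i = np.abs(gray[1:, :] - gray[:-1, :])
    # exp(-|image gradient|) -> weight ~1 in flat regions, ~0 at edges
    return (dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean()
```

In a multi-task loss this term would be added with a weighting coefficient to the depth and segmentation losses; a perfectly flat depth map incurs zero penalty.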
Abstract: Most approaches that estimate a scene's 3D depth from a single image model the point spread function (PSF) as a 2D Gaussian function. However, those methods suffer from noise and find it difficult to recover depth at high quality. We present a simple yet effective approach to estimate exactly the amount of spatially varying defocus blur at edges, based on a Cauchy distribution model for the PSF. The raw image is re-blurred twice using two known Cauchy distribution kernels, and the defocus blur amount at edges is derived from the gradient ratio between the two re-blurred images. By propagating the blur amount at edge locations to the entire image using matting interpolation, a full depth map is then recovered. Experimental results on several real images demonstrate both the feasibility and effectiveness of our method, as a non-Gaussian model for the PSF, in providing a better estimation of the defocus map from a single un-calibrated defocused image. These results also show that our method is robust to image noise, inaccurate edge locations, and interference from neighboring edges. It can generate more accurate scene depth maps than most existing methods that use a Gaussian PSF model.
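The gradient-ratio idea can be checked in one dimension. The sketch below uses the widely known Gaussian closed form for a step edge (the paper itself advocates a Cauchy PSF, for which the closed form differs), with the edge profile, blur amounts, and signal length all assumed for illustration.

```python
import numpy as np

def gaussian_blur1d(signal, sigma):
    """Blur a 1D signal with a normalized, truncated Gaussian kernel."""
    radius = int(4 * sigma + 0.5)
    xs = np.arange(-radius, radius + 1)
    k = np.exp(-xs ** 2 / (2 * sigma ** 2))
    return np.convolve(signal, k / k.sum(), mode='same')

# A step edge blurred by an "unknown" PSF of sigma_true = 2.0
sigma_true, sigma0 = 2.0, 1.0
edge = np.zeros(200)
edge[100:] = 1.0
blurred = gaussian_blur1d(edge, sigma_true)
reblurred = gaussian_blur1d(blurred, sigma0)   # re-blur with a KNOWN kernel

g1 = np.abs(np.gradient(blurred))
g2 = np.abs(np.gradient(reblurred))
i = int(g1.argmax())                 # edge location = gradient peak
R = g1[i] / g2[i]                    # gradient magnitude ratio
# For a Gaussian-blurred step edge: R = sqrt(sigma^2 + sigma0^2) / sigma
sigma_est = sigma0 / np.sqrt(R ** 2 - 1.0)
```

The estimated blur amount comes out close to the true sigma, which is the quantity that the method then maps to scene depth and propagates from edges to the full image.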
Funding: Supported by the National Natural Science Foundation of China under Grants 61872241 and 62077037, and the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0102.
Abstract: Background: Monocular depth estimation aims to predict a dense depth map from a single RGB image and has important applications in 3D reconstruction, automatic driving, and augmented reality. However, existing methods feed the original RGB image directly into the model to extract depth features, without avoiding the interference of depth-irrelevant information on depth-estimation accuracy, which leads to inferior performance. Methods: To remove the influence of depth-irrelevant information and improve depth-prediction accuracy, we propose RADepthNet, a novel reflectance-guided network that fuses boundary features. Specifically, our method predicts depth maps in the following three steps: (1) Intrinsic image decomposition. We propose a reflectance extraction module consisting of an encoder-decoder structure to extract the depth-related reflectance. Through an ablation study, we demonstrate that the module can reduce the influence of illumination on depth estimation. (2) Boundary detection. A boundary extraction module, consisting of an encoder, a refinement block, and an upsample block, is proposed to better predict the depth at object boundaries by utilizing gradient constraints. (3) Depth prediction. We use an encoder different from that in (2) to obtain depth features from the reflectance map and fuse boundary features to predict depth. In addition, we propose FIFADataset, a depth-estimation dataset for soccer scenarios. Results: Extensive experiments on a public dataset and our proposed FIFADataset show that our method achieves state-of-the-art performance.
Funding: Funded by the Guangdong Provincial Natural Science Foundation General Project (Grant Nos. 2023A1515011700 and 2023A1515012869), the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2022A1515110007), and GDAS' Project of Science and Technology Development (Grant No. 2022GDASZH-2022010108).
Abstract: The estimation of fish mass is one of the most basic and important tasks in aquaculture. Acquiring the mass of fish at different growth stages is of great significance for feeding, monitoring the health status of fish, and making breeding plans to increase production. Existing estimation methods for fish mass often stay in the 2D plane, and it is difficult to obtain 3D information on the fish, which leads to errors. To solve this problem, a multi-view method was proposed to obtain the 3D information of fish and predict fish mass through a two-stage neural network with an edge-sensitive module. In the first stage, the side- and downward-view images of the fish were captured through two vertically placed cameras, together with 3D information such as side area, top area, length, deflection angle, and pitch angle, to estimate the size of the fish. The area of the fish in the different views was then estimated accurately through a pre-trained image segmentation neural network with an edge-sensitive module. In the second stage, a fully connected neural network was constructed to regress the fish mass based on the 3D information obtained in the previous stage. The experimental results indicate that the proposed method can accurately estimate fish mass and outperforms existing estimation methods.
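The second stage described above is a fully connected regressor over the extracted 3D measurements. The sketch below trains a tiny one-hidden-layer network with hand-written gradients on synthetic stand-in features; the feature semantics and the target formula are assumptions for illustration only, not the paper's data or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in features: [side_area, top_area, length, deflection, pitch]
X = rng.uniform(0.5, 2.0, size=(256, 5))
# Hypothetical ground-truth mass: grows with length times projected areas
y = (X[:, 0] * X[:, 2] + 0.5 * X[:, 1] * X[:, 2])[:, None]

# One-hidden-layer fully connected regressor (stand-in for stage two)
W1 = rng.normal(0.0, 0.5, (5, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)      # ReLU hidden layer
    return h, h @ W2 + b2

_, pred0 = forward(X)
loss0 = float(np.mean((pred0 - y) ** 2))  # MSE before training

lr = 0.01
for _ in range(2000):                     # full-batch gradient descent
    h, pred = forward(X)
    g = 2.0 * (pred - y) / len(X)         # dMSE/dpred
    gW2 = h.T @ g; gb2 = g.sum(0)
    gh = (g @ W2.T) * (h > 0)             # backprop through ReLU
    gW1 = X.T @ gh; gb1 = gh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
loss = float(np.mean((pred - y) ** 2))
```

A real deployment would of course use a deep-learning framework and the measured features from stage one; the point here is only the shape of the stage-two regression.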
文摘Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, they are very vulnerable to illumination variance, occlusions, texture-less regions, as well as moving objects, making them not robust enough to deal with various scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. Firstly, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, which is used to align the temporal depth features via a depth feature alignment (DFA) loss. Secondly, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the “point-to-point” alignment paradigm to the “region-to-region” one. Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation study and analysis validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
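The voxel density alignment (VDA) idea above can be sketched as counting points per voxel and comparing the resulting densities between two point clouds; the voxel size and the L1 comparison below are simplifying assumptions, not the paper's exact differentiable loss.

```python
import numpy as np

def voxel_density(points, voxel_size=0.5):
    """Map each 3D point to a voxel index and return the fraction of
    points falling in each occupied voxel."""
    idx = np.floor(points / voxel_size).astype(int)
    keys, counts = np.unique(idx, axis=0, return_counts=True)
    return {tuple(k): c / len(points) for k, c in zip(keys, counts)}

def vda_loss(pc_a, pc_b, voxel_size=0.5):
    """L1 distance between per-voxel densities of two point clouds:
    a region-to-region comparison that tolerates small point-level
    mismatches better than point-to-point alignment."""
    da = voxel_density(pc_a, voxel_size)
    db = voxel_density(pc_b, voxel_size)
    keys = set(da) | set(db)
    return sum(abs(da.get(k, 0.0) - db.get(k, 0.0)) for k in keys)
```

Identical clouds give zero loss, while completely disjoint clouds give the maximum of 2 (each density distribution sums to 1).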
Funding: Projects 40272114 and 40572160 supported by the National Natural Science Foundation of China
Abstract: The law of variation of deep rock stress in gravitational and tectonic stress fields is analyzed based on the Hoek-Brown strength criterion. In the gravitational stress field, the rocks in the shallow area are in an elastic state and the deep, relatively soft rock may be in a plastic state. However, in the tectonic stress field, the relatively soft rock in the shallow area is in a plastic state and the deep rock is in an elastic state. A method is proposed to estimate stress values in coal and soft rock based on in-situ measurements of hard rock. Our estimation method depends on the type of stress field and the stress state. The equations of rock stress are presented for the elastic, plastic, and critical states. The critical state is a special stress state, which marks the conversion of the elastic to the plastic state in the gravitational stress field and the conversion of the plastic to the elastic state in the tectonic stress field. Two case studies show that the estimation method is feasible.
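The Hoek-Brown criterion underlying the analysis gives the peak strength as sigma1 = sigma3 + sigma_ci * (mb * sigma3 / sigma_ci + s)^a. A minimal sketch, with illustrative parameter values (not taken from the paper), showing how a stress state can be classified against the criterion:

```python
def hoek_brown_sigma1(sigma3, sigma_ci=50.0, mb=10.0, s=1.0, a=0.5):
    """Generalized Hoek-Brown peak strength (MPa): the major principal
    stress the rock can sustain at confinement sigma3. Parameter values
    are illustrative defaults (intact rock: s = 1, a = 0.5)."""
    return sigma3 + sigma_ci * (mb * sigma3 / sigma_ci + s) ** a

def stress_state(sigma1, sigma3, **params):
    """Classify a stress state: 'elastic' below peak strength, 'plastic'
    at or above it. The critical state in the abstract is the boundary
    where sigma1 equals the strength."""
    strength = hoek_brown_sigma1(sigma3, **params)
    return 'elastic' if sigma1 < strength else 'plastic'
```

At zero confinement the criterion reduces to the unconfined compressive strength (here 50 MPa), so a 30 MPa uniaxial stress is elastic while 80 MPa is plastic.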
Abstract: Identifying underground utilities and predicting their depth are fundamental to civil engineering excavations, for example, to install or repair water, sewer, gas, and electric systems. The accidental rupture of these systems can lead to unplanned repair costs, delays in completing the service, and the risk of injury or death of workers. One way to detect underground utilities is the GPR (Ground Penetrating Radar) geophysical method. To estimate depth, the two-way travel time provided by a radargram is used in conjunction with the wave velocity in the ground, which depends on the dielectric constant of the materials and is usually assumed to be constant for the area under investigation. This procedure provides satisfactory results in most cases. However, wrong depth estimates can result in damage to public utilities, rupturing pipes, cutting lines, and so on. These cases occur mainly in areas that have a marked variation of water content and/or soil lithology, so greater care is required to determine the depth of the targets. The present work demonstrates how the interval velocity of Dix (1955) can be applied to a radargram to estimate the depth of underground utilities, in comparison with the conventional constant-velocity technique applied to the same data set. To accomplish this, synthetic and real GPR data were used to verify the applicability of the interval velocity technique and to determine the accuracy of the depth estimates obtained. The studies were carried out at the IAG/USP test site, a controlled environment where metallic drums are buried at known positions and depths, allowing the comparison of real to estimated depths. Numerical studies were also carried out to simulate the real environment with variation of the dielectric constant in depth and to validate the results with real data.
The results showed that the depths of the targets were estimated more accurately by the interval velocity technique than by the constant velocity technique, minimizing the risk of accidents during excavation.
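Dix's formula converts RMS (stacking) velocities and two-way times at successive reflectors into the velocity of each interval, from which interval thicknesses and hence depths follow. A minimal sketch, assuming GPR-style units (times in ns, velocities in m/ns) and illustrative values:

```python
def dix_interval_velocity(t1, v1, t2, v2):
    """Dix (1955): velocity of the interval between two reflectors, from
    their two-way times (t1 < t2) and RMS velocities down to each."""
    return ((t2 * v2 ** 2 - t1 * v1 ** 2) / (t2 - t1)) ** 0.5

def depth_from_intervals(times, vrms):
    """Depth of the deepest reflector: sum each interval's thickness,
    interval velocity times one-way time (half the two-way time)."""
    z, t_prev, v_prev = 0.0, 0.0, vrms[0]
    for t, v in zip(times, vrms):
        # First interval: RMS velocity equals the interval velocity
        vint = vrms[0] if t_prev == 0.0 else dix_interval_velocity(t_prev, v_prev, t, v)
        z += vint * (t - t_prev) / 2.0
        t_prev, v_prev = t, v
    return z
```

As a sanity check, in a homogeneous medium (v = 0.1 m/ns, typical of moist soil) reflectors at 20 ns and 40 ns two-way time place the second interface at 2.0 m, matching the constant-velocity result; the techniques diverge only when the dielectric constant, and hence velocity, varies with depth.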
Abstract: Light field cameras have a wide range of applications, such as digital refocusing, scene depth extraction, and 3D image reconstruction. By recording the energy and direction information of the light field, they can solve many technical problems that conventional cameras cannot. An important feature of light field cameras is that a microlens array is inserted between the sensor and the main lens, through which a series of sub-aperture images of different perspectives are formed. Based on this feature and the full-focus image acquisition technique, we propose a light-field optical flow algorithm that involves both depth estimation and occlusion detection and guarantees the edge-preserving property. This algorithm consists of three steps: 1) computing the dense optical flow field among a group of sub-aperture images; 2) obtaining a robust depth estimate by initializing the light-field optical flow using a linear regression approach and detecting occluded areas using a consistency check; 3) computing an improved light-field depth map by using the edge-preserving algorithm to realize interpolation optimization. The reliability and high accuracy of the proposed approach are validated by experimental results.
Funding: Supported in part by a fund from Bentley Systems, Inc.
Abstract: Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. However, alongside the advantages, depth sensing also presents many practical challenges. For instance, depth sensors impose an additional payload burden on robotic inspection platforms, limiting the operation time and increasing the inspection cost. Additionally, some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime. In this context, this study investigates the feasibility of abolishing depth sensing at test time without compromising the segmentation performance. An autonomous damage segmentation framework is developed based on recent advancements in vision-based multi-modal sensing, such as modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At the time of deployment, depth data becomes expendable, as it can be simulated from the corresponding RGB frames. This makes it possible to reap the benefits of depth fusion without any depth perception per se. This study explored two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.