Depth maps are used for synthesizing virtual views in free-viewpoint television (FTV) systems. When depth maps are derived using existing depth estimation methods, depth distortions cause undesirable artifacts in the synthesized views. To solve this problem, a 3D video quality model based on depth maps (D-3DV) is proposed for virtual view synthesis and depth map coding in FTV applications. First, the relationships between distortions in the coded depth map and the rendered view are derived. Then, a precise 3DV quality model based on depth characteristics is developed for the synthesized virtual views. Finally, based on the D-3DV model, multilateral filtering is applied as a pre-processing filter to reduce rendering artifacts. Experimental results evaluated by objective and subjective methods indicate that the proposed D-3DV model can reduce the bit-rate of depth coding and achieve better rendering quality.
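The pre-processing step can be pictured as a bilateral-style filter whose kernel multiplies several affinity terms (spatial distance, depth similarity, and color similarity in the companion texture view). The sketch below is illustrative only and is not the authors' implementation; the window radius, sigma parameters, and the choice of guidance terms are assumptions.

```python
import numpy as np

def multilateral_filter(depth, color, radius=3, sigma_s=2.0, sigma_d=8.0, sigma_c=12.0):
    """Smooth a depth map with weights built from spatial distance, depth
    similarity, and color similarity in the aligned texture image (a generic
    joint/multilateral pre-filter; all parameter values are illustrative)."""
    h, w = depth.shape
    out = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    d = np.pad(depth.astype(np.float64), radius, mode='edge')
    c = np.pad(color.astype(np.float64), ((radius, radius), (radius, radius), (0, 0)), mode='edge')
    for y in range(h):
        for x in range(w):
            dp = d[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            cp = c[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            w_d = np.exp(-((dp - d[y + radius, x + radius])**2) / (2 * sigma_d**2))
            w_c = np.exp(-np.sum((cp - c[y + radius, x + radius])**2, axis=2) / (2 * sigma_c**2))
            weights = spatial * w_d * w_c
            out[y, x] = np.sum(weights * dp) / np.sum(weights)
    return out
```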
This paper proposes a new technique for embedding depth maps into their corresponding 2-dimensional (2D) images. Since a 2D image and its depth map are integrated into a single image format, they can be treated as if they were one 2D image. This halves the amount of data in 3D images and simplifies transmission over networks, because synchronization between the images for the left and right eyes becomes unnecessary. We embed depth maps in the quantized discrete cosine transform (DCT) data of 2D images. The key question is whether the depth maps can be embedded into 2D images without perceivably degrading their quality. We reduce this degradation by compressing the depth map data using the differences between each pixel and the next pixel to its left. Because depth map values change abruptly, we assume that there is at most one non-zero value per horizontal line in a DCT block. We conduct an experiment to evaluate the quality of the 2D images embedded with depth maps and find that satisfactory quality can be achieved.
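The differential coding step described above can be pictured as follows: a minimal sketch of horizontal differencing and its inverse. The paper's scan order, quantization, and DCT embedding are not reproduced here, and the function names are illustrative.

```python
import numpy as np

def horizontal_difference(depth_row):
    """Differential coding of one depth-map row: keep the first value and store
    each remaining pixel as its difference from the pixel to its left. Because
    depth values are flat inside objects and jump only at borders, most
    differences are zero, which is the sparsity the embedding step relies on."""
    diffs = np.empty(depth_row.shape, dtype=np.int32)
    diffs[0] = depth_row[0]
    diffs[1:] = depth_row[1:].astype(np.int32) - depth_row[:-1].astype(np.int32)
    return diffs

def reconstruct_row(diffs):
    """Invert the differential coding with a cumulative sum."""
    return np.cumsum(diffs).astype(np.int32)
```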
We propose a novel image segmentation algorithm to tackle the challenge of limited recognition and segmentation performance in identifying welding seam images during robotic intelligent operations. First, to enhance the capability of deep neural networks in extracting geometric attributes from depth images, we developed a novel deep geometric convolution operator (DGConv). DGConv is used to construct a deep local geometric feature extraction module, facilitating a more comprehensive exploration of the intrinsic geometric information within depth images. Second, we integrate the newly proposed deep geometric feature module with the Fully Convolutional Network (FCN8) to establish a high-performance deep neural network tailored for depth image segmentation. Concurrently, we enhance the FCN8 detection head by separating the segmentation and classification processes, which significantly boosts the network's overall detection capability. Third, for a comprehensive assessment of the proposed algorithm and its applicability in real-world industrial settings, we curated a line-scan image dataset featuring weld seams. This dataset, named the Standardized Linear Depth Profile (SLDP) dataset, was collected from actual industrial sites where autonomous robots are in operation. Finally, we conducted experiments on the SLDP dataset, achieving an average accuracy of 92.7%. The proposed approach exhibits a marked performance improvement over the prior method on the same dataset. Moreover, we have deployed the proposed algorithm in real industrial environments, meeting the requirements of unmanned robot operations.
In this paper, an approach is proposed for the problem of consistency in depth map estimation from binocular stereo video sequences. The method enforces both temporal and spatial consistency to eliminate flickering artifacts and smooth out inaccuracies in the recovered depth. To this end, an improved global stereo matching method based on graph cuts and energy optimization is implemented. In the temporal domain, a penalty function with a coherence factor is introduced for temporal consistency, with the factor determined by a Lucas-Kanade optical flow weighted histogram similarity constraint (LKWHSC). In the spatial domain, a joint bilateral truncated absolute difference (JBTAD) term is proposed for segmentation smoothing. It smooths naturally and uniformly in low-gradient regions and avoids over-smoothing while preserving edge sharpness at high-gradient discontinuities, thereby realizing spatial consistency. Experimental results show that the algorithm obtains depth maps with better spatial and temporal consistency than existing algorithms.
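A generic form of such a spatio-temporally regularized matching energy is sketched below; this shows only the general shape of the three terms described above, not the paper's exact formulation, weights, or symbols.

$$E(D) = \sum_{p} C(p, d_p) \;+\; \lambda_s \sum_{(p,q)\in\mathcal{N}} w_{pq}\,\min\bigl(|d_p - d_q|,\, \tau\bigr) \;+\; \lambda_t \sum_{p} c_p\,\bigl|d_p^{\,t} - \hat{d}_p^{\,t-1}\bigr|,$$

where $C$ is the per-pixel matching cost, the truncated absolute difference weighted by the joint bilateral weight $w_{pq}$ plays the role of the JBTAD spatial term, and $c_p$ is a per-pixel coherence factor of the kind derived from the optical-flow-weighted histogram similarity.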
The creation of a 3D rendering model involves the prediction of an accurate depth map for the input images. The proposed approach, a modified semi-global block matching algorithm with variable window size and gradient assessment of objects, predicts the depth map. 3D modeling and view synthesis algorithms can effectively handle the obtained disparity maps. This work uses a consistency check to obtain an accurate depth map and to identify occluded pixels. The disparity maps predicted by semi-global block matching are evaluated on the Middlebury stereo benchmark dataset. The improved depth map quality within a reasonable processing time outperforms other existing depth map prediction algorithms. The experimental results show that the proposed depth map prediction can identify inter-object boundaries even in the presence of occlusion, with lower detection error and runtime. We observed that the Middlebury stereo dataset has very few images with occluded objects, which made demonstrating this gain cumbersome. Considering this, we created our own dataset with occlusion using a structured lighting technique. The proposed regularization term, used as an optimization step in the graph cut algorithm, handles occlusion for different smoothing coefficients. The experimental results demonstrate that our dataset outperforms the Tsukuba dataset in terms of the percentage of occluded pixels.
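A common way to realize the consistency check mentioned above is a left-right disparity cross-check: a pixel whose left-view disparity disagrees with the disparity found at its match in the right view is flagged as occluded. The sketch below is a generic version of that idea under assumed parameter names, not the paper's code.

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, tol=1.0):
    """Mark pixels whose left and right disparities disagree as occluded.
    For each pixel (x, y) in the left map, look up the right map at
    (x - d, y); if the two disparities differ by more than `tol`, the
    pixel fails the check."""
    h, w = disp_left.shape
    occluded = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = int(round(disp_left[y, x]))
            xr = x - d
            if xr < 0 or xr >= w or abs(disp_left[y, x] - disp_right[y, xr]) > tol:
                occluded[y, x] = True
    return occluded
```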
Both time-of-flight (ToF) cameras and passive stereo can provide depth information for the captured real scene, but each has innate limitations. ToF cameras and passive stereo are intrinsically complementary for certain tasks, so it is desirable to appropriately leverage all the information available from both. Although some fusion methods have been presented recently, they fail to consider ToF reliability detection and ToF-based improvement of passive stereo. This study therefore proposes an approach to integrating ToF cameras and passive stereo to obtain high-accuracy depth maps. The main contributions are: (1) an energy cost function is devised that uses data from ToF cameras to boost the stereo matching of passive stereo; (2) a fusion method is used to combine the depth information from both ToF cameras and passive stereo to obtain high-accuracy depth maps. Experiments show that the proposed approach achieves improved results with high accuracy and robustness.
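At its simplest, depth fusion of this kind can be illustrated as a per-pixel confidence-weighted blend of the two depth maps. The sketch below is a minimal stand-in for the second contribution under assumed inputs; the paper's energy-based formulation and ToF reliability detection are more involved.

```python
import numpy as np

def fuse_depth(depth_tof, conf_tof, depth_stereo, conf_stereo):
    """Confidence-weighted per-pixel blend of a ToF depth map and a passive
    stereo depth map. All inputs are arrays of the same shape; the confidence
    maps are assumed non-negative (illustrative assumption)."""
    total = conf_tof + conf_stereo + 1e-8   # avoid division by zero
    return (conf_tof * depth_tof + conf_stereo * depth_stereo) / total
```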
We propose a novel interactive lighting editing system for relighting a single indoor RGB image based on spherical harmonic lighting. It allows users to intuitively edit illumination and relight complicated low-light indoor scenes. Our method not only achieves plausible global relighting but also enhances the local details of the complicated scene according to spatially-varying spherical harmonic lighting, requiring only a single RGB image along with a corresponding depth map. To this end, we first present a joint optimization algorithm, based on geometric optimization of the depth map and texture-copy-free intrinsic image decomposition, for refining the depth map and obtaining the shading map. Then we propose a lighting estimation method based on spherical harmonic lighting, which not only estimates the global illumination of the scene but also further enhances local details of the complicated scene. Finally, we use a simple and intuitive interactive method to edit the environment lighting map to adjust lighting and relight the scene. Through extensive experimental results, we demonstrate that our approach is simple and intuitive for relighting low-light indoor scenes, and achieves state-of-the-art results.
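For reference, per-pixel shading under a low-order spherical harmonic lighting model is typically evaluated by projecting the surface normal onto the SH basis. The sketch below uses the standard 9-term real SH basis (bands 0 to 2) with its usual normalization constants; it is a generic illustration, not the authors' estimation pipeline, and the lighting coefficients are assumed given.

```python
import numpy as np

def sh_basis_order2(n):
    """Evaluate the 9 real spherical harmonic basis functions (bands 0-2)
    at a unit normal n = (x, y, z), using the standard real SH
    normalization constants."""
    x, y, z = n
    return np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])

def sh_shading(normal, coeffs):
    """Shading value for one pixel under a 9-coefficient SH lighting
    environment (coeffs is the per-region SH lighting vector)."""
    return float(np.dot(coeffs, sh_basis_order2(normal)))
```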
We study the problem of human activity recognition from RGB-Depth (RGBD) sensors when skeletons are not available. The skeleton tracking in the Kinect SDK works well when the human subject is facing the camera and there are no occlusions. In surveillance or nursing-home monitoring scenarios, however, the camera is usually mounted higher than the human subjects, and there may be occlusions. The interest-point based approach is widely used in RGB-based activity recognition, and it can be applied to both the RGB and depth channels. This paper discusses whether interest points should be extracted independently from each channel or from only one of the channels; the goal is to compare the performance of different interest point extraction methods. In addition, we have developed a depth map based descriptor and built an RGBD dataset, called RGBD-SAR, for senior activity recognition. We show that the best performance is achieved when interest points are extracted solely from the RGB channels and the RGB-based descriptors are combined with the depth map based descriptors. We also present a baseline performance on the RGBD-SAR dataset.
The depth information of a scene indicates the distance between an object and the camera, and depth extraction is a key technology in 3D video systems. The emergence of the Kinect makes capturing high-resolution depth maps possible. However, the depth map captured by the Kinect cannot be used directly because of holes and noise, and it needs to be repaired. We propose a texture-combined inpainting algorithm in this paper. First, the foreground is segmented using the color characteristics of the texture image to repair the foreground of the depth map. Second, region growing is used to determine the match region for each hole in the depth map, and the match region is accurately positioned according to the texture information. The match region is then weighted to fill the hole. Finally, a Gaussian filter is used to remove the noise in the depth map. Experimental results show that the proposed method can effectively repair the holes in the original depth map and produce an accurate and smooth depth map, which can be used to render virtual images of good quality.
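The weighted filling plus Gaussian denoising stages can be illustrated with a much-simplified sketch: invalid pixels are filled with a distance-weighted average of valid neighbors, then the result is smoothed. This is only a stand-in for the texture-guided region growing described above; the neighborhood radius and weights are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fill_depth_holes(depth, valid, radius=4, sigma=1.5):
    """Fill invalid depth pixels (valid == False) with a distance-weighted
    average of valid neighbours inside a square window, then denoise with a
    Gaussian filter."""
    filled = depth.astype(np.float64).copy()
    h, w = depth.shape
    ys, xs = np.nonzero(~valid)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch, mask = depth[y0:y1, x0:x1], valid[y0:y1, x0:x1]
        if mask.any():
            yy, xx = np.mgrid[y0:y1, x0:x1]
            wgt = np.exp(-((yy - y)**2 + (xx - x)**2) / (2.0 * radius**2)) * mask
            filled[y, x] = np.sum(wgt * patch) / np.sum(wgt)
    return gaussian_filter(filled, sigma=sigma)
```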
Background: A depth sensor is an essential element in virtual and augmented reality devices for digitalizing the user's environment in real time. The current popular technologies include stereo, structured light, and time-of-flight (ToF). The stereo and structured light methods require a baseline separation between multiple sensors for depth sensing, and both suffer from a limited measurement range. ToF depth sensors have the largest depth range but the lowest depth map resolution. To overcome these problems, we propose a co-axial depth map sensor that is potentially more compact and cost-effective than conventional structured light depth cameras. It can extend the depth range while maintaining a high depth map resolution, and it also provides a high-resolution 2D image along with the 3D depth map. Methods: The depth sensor is constructed with a projection path and an imaging path, which are combined by a beamsplitter for a co-axial design. In the projection path, a cylindrical lens is inserted to add extra optical power in one direction, creating an astigmatic pattern. For depth measurement, the astigmatic pattern is projected onto the test scene, and the depth information is calculated from the contrast change of the reflected pattern image in two orthogonal directions. To extend the depth measurement range, we place an electronically focus-tunable lens at the system stop and tune its power to extend the depth range without compromising depth resolution. Results: In the depth measurement simulation, we project a resolution target onto a white screen moving along the optical axis and tune the focus-tunable lens power for three depth measurement sub-ranges: near, middle, and far. In each sub-range, as the test screen moves away from the depth sensor, the horizontal contrast keeps increasing while the vertical contrast keeps decreasing in the reflected image. Therefore, the depth information can be obtained by computing the contrast ratio between features in orthogonal directions. Conclusions: The proposed depth map sensor can implement depth measurement over an extended depth range with a co-axial design.
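The contrast-ratio cue described in the Results can be illustrated with a simple measurement on an image patch of the reflected pattern: contrast is computed separately along the horizontal and vertical directions and their ratio is mapped to depth through a calibration curve. The contrast definition below (standard deviation of averaged row/column profiles) is an illustrative assumption, not the authors' measurement.

```python
import numpy as np

def orthogonal_contrast_ratio(patch):
    """Contrast of a reflected astigmatic pattern measured along the horizontal
    and vertical directions of an image patch, and their ratio. The sensor
    would map this ratio to depth through a calibrated curve."""
    horiz_profile = patch.mean(axis=0)   # average rows -> variation along x
    vert_profile = patch.mean(axis=1)    # average columns -> variation along y
    c_h = horiz_profile.std() / (horiz_profile.mean() + 1e-8)
    c_v = vert_profile.std() / (vert_profile.mean() + 1e-8)
    return c_h, c_v, c_h / (c_v + 1e-8)
```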
Existing GAN-based generative methods are typically used for semantic image synthesis. We pose the question of whether GAN-based architectures can generate plausible depth maps and find that existing methods have difficulty in generating depth maps which reasonably represent 3D scene structure due to the lack of global geometric correlations. Thus, we propose DepthGAN, a novel method of generating a depth map using a semantic layout as input to aid construction and manipulation of well-structured 3D scene point clouds. Specifically, we first build a feature generation model with a cascade of semantically-aware transformer blocks to obtain depth features with global structural information. For our semantically aware transformer block, we propose a mixed attention module and a semantically aware layer normalization module to better exploit semantic consistency for depth feature generation. Moreover, we present a novel semantically weighted depth synthesis module, which generates adaptive depth intervals for the current scene. We generate the final depth map by using a weighted combination of semantically aware depth weights for different depth ranges. In this manner, we obtain a more accurate depth map. Extensive experiments on indoor and outdoor datasets demonstrate that DepthGAN achieves superior results both quantitatively and visually for the depth generation task.
Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. Depth maps of hand gestures captured via Kinect sensors are used in our method, where the 3D hand shapes can be segmented from cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales because both local shape context and global shape distribution are necessary for recognition. The description of all the 3D points constitutes the hand gesture representation, and hand gesture recognition is performed via the dynamic time warping algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in comparisons of accuracy and efficiency.
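The matching stage relies on classic dynamic time warping. The sketch below shows the standard DTW recurrence between two feature sequences; treating the per-point 3D Shape Context descriptors as the sequence elements and using a Euclidean cost are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def dtw_distance(seq_a, seq_b, dist=lambda a, b: np.linalg.norm(a - b)):
    """Classic dynamic time warping distance between two descriptor sequences,
    using the standard cumulative-cost recurrence over an (n+1) x (m+1) table."""
    n, m = len(seq_a), len(seq_b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]
```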
The Moving Picture Experts Group (MPEG) has been developing a 3D video (3DV) coding standard for depth-based 3DV data representations, especially for the multiview video plus depth (MVD) format. With MVD, depth-image-based rendering (DIBR) is used to synthesize virtual views based on a few transmitted pairs of texture and depth data. In this paper, we discuss the ongoing 3DV standardization and summarize the coding tools proposed in the responses to MPEG's call for proposals on 3DV coding.
Conventional 2D metrics can be used for measuring the quality of depth maps, but none of them is efficient or accurate when used for evaluating 3D quality. In this paper, we propose a new full-reference objective metric, called Sparse Representations-Mean Squared Error (SR-MSE), which efficiently evaluates depth map compression distortions. It adaptively models the reference and compressed depth maps in a mixed redundant transform domain dedicated to depth features, and then computes the mean squared error between the sparse coefficients resulting from this modeling. As a benchmark for quality assessment, we perform a subjective evaluation test on depth maps compressed with the latest 3D High Efficiency Video Coding standard at various bitrates, and we compare the subjective results with the proposed and conventional objective metrics. Experimental results demonstrate that, compared to conventional image quality assessment metrics, the proposed SR-MSE yields the scores most highly correlated with the subjective ones.
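Concretely, if $\alpha^{r}$ and $\alpha^{c}$ denote the sparse coefficient vectors obtained by modeling the reference and the compressed depth map over the mixed redundant dictionary, the metric takes the form below; this is a sketch of the general form only, and the paper's exact normalization may differ.

$$\mathrm{SR\text{-}MSE} = \frac{1}{K}\sum_{k=1}^{K}\bigl(\alpha^{r}_{k} - \alpha^{c}_{k}\bigr)^{2},$$

where $K$ is the number of sparse coefficients used in the modeling.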
This Letter proposes a high bit-depth coding method to improve depth map resolution and render it suitable for human-eye observation in 3D range-intensity correlation laser imaging. In this method, a high bit-depth CCD camera with a nanosecond-scale gated intensifier is used as the image sensor; two high bit-depth gate images with specific range-intensity profiles are then acquired to establish a gray depth map, and finally the gray depth map is encoded with an equidensity pseudocolor scheme. With this method, a color depth map is generated with higher range resolution. In our experimental work, the range resolution of the depth map is improved by a factor of 1.67.
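The final encoding step maps the high bit-depth gray depth values onto a color ramp so that the eye can distinguish more depth levels than it can gray levels. The sketch below is a generic pseudocolor mapping under an assumed blue-to-red ramp; the Letter's exact equidensity color coding is not reproduced.

```python
import numpy as np

def equidensity_pseudocolor(gray_depth, levels=1024):
    """Map a high bit-depth gray depth map to RGB by slicing its range into
    equally spaced bins and assigning each bin a distinct color along a
    blue -> cyan -> green -> yellow -> red ramp (illustrative color scheme)."""
    g = gray_depth.astype(np.float64)
    norm = (g - g.min()) / (g.max() - g.min() + 1e-12)
    bins = np.floor(norm * (levels - 1)) / (levels - 1)
    hue = bins * 4.0                       # position along the four ramp segments
    r = np.clip(hue - 2.0, 0.0, 1.0)
    gr = np.clip(np.minimum(hue, 4.0 - hue), 0.0, 1.0)
    b = np.clip(2.0 - hue, 0.0, 1.0)
    return np.stack([r, gr, b], axis=-1)   # RGB in [0, 1]
```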
A depth map contains spatial information about objects and is almost free from the influence of lighting, which attracts much research interest in the field of machine vision for human detection. Finding a suitable image feature for human detection on depth maps is therefore attractive. In this paper, we evaluate the performance of typical features on depth maps. A depth map dataset containing various indoor scenes with humans is constructed using Microsoft's Kinect camera as a quantitative benchmark for the study of human detection methods on depth maps. The depth map is smoothed with pixel filtering and context filtering to reduce particulate noise. Then, the performance of five image features and a new feature is studied and compared for human detection on the dataset through theoretical analysis and simulation experiments. The results show that the new feature outperforms the other descriptors.
Reconstructing 3D models of single objects with complex backgrounds has wide applications such as 3D printing and AR/VR. It is necessary to consider the trade-off between capturing data at low cost and obtaining high-quality reconstruction results. In this work, we propose a voxel-based modeling pipeline using sparse RGB-D images to effectively and efficiently reconstruct a single real object without a geometric post-processing step for background removal. First, following the idea of the Visual Hull, useless and inconsistent voxels of the target object are clipped, which helps focus on the target object and rectify the voxel projection information. Second, a modified TSDF calculation and voxel filling operations are proposed to alleviate the problem of missing depth in the depth images; they improve the completeness of TSDF values for voxels on the object surface. After the mesh is generated by Marching Cubes, texture mapping is optimized with view selection, color optimization, and camera parameter fine-tuning. Experiments on a Kinect capture dataset, the TUM public dataset, and a virtual environment dataset validate the effectiveness and flexibility of the proposed pipeline.
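For context, the baseline on which such a modified TSDF calculation builds is the standard running weighted-average update applied to each voxel as new depth frames arrive. The sketch below shows that standard rule (KinectFusion-style); the truncation distance and weight cap are illustrative, and the paper's modifications and voxel filling are additional steps on top of this.

```python
import numpy as np

def tsdf_update(tsdf, weight, sdf, trunc=0.04, max_weight=64.0):
    """Running weighted-average TSDF update for one voxel: truncate and
    normalise the signed distance from the new frame, then blend it into the
    stored value and increase the voxel weight up to a cap."""
    d = np.clip(sdf / trunc, -1.0, 1.0)
    new_tsdf = (tsdf * weight + d) / (weight + 1.0)
    new_weight = min(weight + 1.0, max_weight)
    return new_tsdf, new_weight
```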
A depth map (close to that of the thermocline as defined by 20 ℃) of the climatological maximum sea-temperature anomaly was created for the subsurface of the tropical Pacific and Indian Ocean, based on which the evolving sea-temperature anomaly at this depth from 1960 to 2000 was statistically analyzed. The evolving sea-temperature anomaly at this depth can be analyzed better than the evolving sea surface anomaly. For example, during an ENSO event in the tropical Pacific, the sea-temperature anomaly signals travel counter-clockwise within the range of 10°S-10°N, and while moving, the signals change in intensity or even in type. If the Dipole is used in the tropical Indian Ocean for analyzing the depth map of the maximum sea-temperature anomaly, the sea-temperature anomalies of the eastern and western Indian Ocean are negatively correlated in a statistical sense (a Dipole in a real physical sense), unlike the analysis based on sea surface temperature anomalies, which shows that the inter-annual positive and negative changes only occur in the gradients of the western and eastern temperature anomalies. Further analysis shows that the development of ENSO and the Dipole exhibits a statistical time lag, with the sea-temperature anomaly in the eastern equatorial Pacific changing earlier (by about three months). The linkage between these two changes is a pair of coupled evolving Walker circulations that move in opposite directions in the equatorial Pacific and Indian Oceans.