Accurate vehicle localization is a key technology for autonomous driving tasks in indoor parking lots, such as automated valet parking. Additionally, infrastructure-based cooperative driving systems have become a means of realizing intelligent driving. In this paper, we propose a novel and practical vehicle localization system using infrastructure-based RGB-D cameras for indoor parking lots. In the proposed system, we design a simple and efficient depth data preprocessing method to reduce the computational burden resulting from the large volume of data. Meanwhile, hardware synchronization of all cameras in the sensor network is deliberately not implemented, because it is extremely cumbersome and would significantly reduce the scalability of our system in mass deployments. Hence, to address the data distortion that accompanies vehicle motion, we propose a vehicle localization method that performs template point cloud registration on the distributed depth data. Finally, a complete hardware system was built to verify the feasibility of our solution in a real-world environment. Experiments in an indoor parking lot demonstrated the effectiveness and accuracy of the proposed vehicle localization system, with a maximum root mean squared error of 5 cm at 15 Hz compared with the ground truth.
Funding: National Natural Science Foundation of China (No. 62173228).
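As a concrete illustration of the template-registration step described above, the following is a minimal sketch using Open3D's point-to-point ICP to align a vehicle template against a scene cloud derived from the depth cameras. The file names, distance threshold, and the use of Open3D itself are assumptions for illustration, not details from the paper.

```python
# Hedged sketch (not the authors' implementation): estimating a vehicle
# pose by registering a template point cloud against a depth-derived
# scene cloud with ICP. File names and parameters are hypothetical.
import numpy as np
import open3d as o3d

template = o3d.io.read_point_cloud("template.pcd")  # vehicle template (assumed file)
scene = o3d.io.read_point_cloud("scene.pcd")        # merged depth data from the camera network

init = np.eye(4)  # coarse initial guess, e.g. from a detection stage
result = o3d.pipelines.registration.registration_icp(
    template, scene,
    max_correspondence_distance=0.2,                # metres; tune to sensor noise
    init=init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print("vehicle pose (4x4):\n", result.transformation)
```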
Multi-view dynamic three-dimensional reconstruction has typically required custom shutter-synchronized camera rigs in order to capture scenes containing rapid movements or complex topology changes. In this paper, we demonstrate that multiple unsynchronized low-cost RGB-D cameras can be used for the same purpose. To alleviate issues caused by unsynchronized shutters, we propose a novel depth frame interpolation technique that allows synchronized data capture from highly dynamic 3D scenes. To manage the resulting huge number of input depth images, we also introduce an efficient moving least squares-based volumetric reconstruction method that generates triangle meshes of the scene. Our approach does not store the reconstruction volume in memory, making it memory-efficient and scalable to large scenes. Our implementation is completely GPU-based and works in real time. The results shown herein, obtained with real data, demonstrate the effectiveness of our proposed method and its advantages compared to state-of-the-art approaches.
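To make the idea of depth frame interpolation concrete, here is a minimal per-pixel linear interpolation between two depth frames at a common target time; the paper's actual interpolation technique is more sophisticated, and this sketch only illustrates the underlying synchronization idea under the assumption that zero depth marks an invalid pixel.

```python
# Minimal sketch: synthesize a depth image at time t from two frames
# captured at t0 and t1 by an unsynchronized camera (illustrative only).
import numpy as np

def interpolate_depth(d0, d1, t0, t1, t):
    """Linearly interpolate depth maps d0 (time t0) and d1 (time t1) at time t.
    Pixels with zero depth are treated as invalid and stay invalid."""
    alpha = (t - t0) / (t1 - t0)
    valid = (d0 > 0) & (d1 > 0)             # zero depth = missing measurement
    out = np.zeros_like(d0, dtype=np.float32)
    out[valid] = (1.0 - alpha) * d0[valid] + alpha * d1[valid]
    return out
```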
To address imprecise semantic segmentation of indoor scene images and coarse saliency maps, we propose FG-Net (feature regulator and dual-path guidance), a network architecture based on optimized multimodal feature extraction and dual-path guided decoding. Specifically, the designed feature regulator sequentially applies noise filtering, reweighted representation, complementary differencing, and interactive fusion to the multimodal features at each stage; by strengthening the aggregation of RGB and depth features, it optimizes the multimodal feature representation during extraction. The rich cross-modal cues produced by interactive fusion are then introduced in the decoding stage to further exploit the advantages of multimodal features. Combined with a dual-path collaborative guidance structure, multi-scale and multi-level feature information is fused during decoding to output finer saliency maps. Experiments on the public NYUD-v2 and SUN RGB-D datasets achieve 48.5% on the main evaluation metric mIoU, outperforming other state-of-the-art algorithms. The results show that the algorithm achieves finer semantic segmentation of indoor scene images and exhibits good generalization and robustness.
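The sketch below shows one way to reweight and fuse RGB and depth features at a single stage, loosely in the spirit of the feature regulator described above; the module structure, layer sizes, and gating scheme are illustrative assumptions, not the published FG-Net architecture.

```python
# Hedged PyTorch sketch of per-stage RGB-depth reweighting and fusion.
# This is an assumed, simplified stand-in for the paper's feature regulator.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # channel-attention weights computed over both modalities jointly
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)
        w = self.gate(x)                    # per-channel reweighting
        return self.merge(x * w)            # interactive fusion into one map
```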
γ-rays are widely and abundantly present in strong nuclear radiation environments. When they act on the cameras used to obtain environmental visual information on nuclear robots, radiation effects occur that degrade the performance of the camera system, reduce imaging quality, and can even cause catastrophic consequences. Color reducibility is an important index for evaluating the imaging quality of a color camera, but its degradation mechanism in a nuclear radiation environment is still unclear. In this paper, γ-ray irradiation experiments on CMOS cameras were carried out to analyse how the camera's color reducibility degrades with cumulative irradiation and to reveal the degradation mechanism of the color information of a CMOS camera under γ-ray irradiation. The results show that the post-irradiation spectral response of the CMOS image sensor (CIS) and the spectral transmittance of the lens affect the values of a* and b* in the LAB color model, while the full well capacity (FWC) of the CIS and the transmittance of the lens affect the value of L*. These changes increase color difference and reduce brightness, and the combined effect of color-difference and brightness degradation reduces the color reducibility of CMOS cameras. Therefore, the degradation of the color information of a CMOS camera after γ-ray irradiation mainly arises from changes in the FWC and spectral response of the CIS and in the spectral transmittance of the lens.
Funding: National Natural Science Foundation of China (11805269); West Light Talent Training Plan of the Chinese Academy of Sciences (2022-XBQNXZ-010); Science and Technology Innovation Leading Talent Project of Xinjiang Uygur Autonomous Region (2022TSYCLJ0042).
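For reference, the standard CIE76 color difference in the LAB space, shown below as a worked example, makes explicit how shifts in a*, b* (chroma) and L* (lightness) both enlarge the color difference; the paper's exact color-difference metric may differ, and the numeric values in the example are hypothetical.

```python
# CIE76 color difference between two (L*, a*, b*) triples. The measurement
# values below are placeholders, not data from the irradiation experiments.
import math

def delta_e_cie76(lab_ref, lab_meas):
    """Euclidean distance between a reference and a measured LAB triple."""
    dL = lab_ref[0] - lab_meas[0]
    da = lab_ref[1] - lab_meas[1]
    db = lab_ref[2] - lab_meas[2]
    return math.sqrt(dL * dL + da * da + db * db)

# e.g. pre- vs. post-irradiation patch measurement (hypothetical values)
print(delta_e_cie76((52.0, 18.3, -9.1), (48.7, 15.9, -6.4)))
```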
High-quality 3D reconstruction is an important topic in computer graphics and computer vision, with many applications such as robotics and augmented reality. The advent of consumer RGB-D cameras has brought profound advances in indoor scene reconstruction, and for the past few years researchers have spent significant effort developing algorithms to capture 3D models with RGB-D cameras. As the depth images produced by consumer RGB-D cameras are noisy and incomplete when surfaces are shiny, bright, transparent, or far from the camera, obtaining high-quality 3D scene models is still a challenge for existing systems. We here review high-quality 3D indoor scene reconstruction methods using consumer RGB-D cameras. In this paper, we make comparisons and analyses from the following aspects: (i) depth processing methods in 3D reconstruction are reviewed in terms of enhancement and completion; (ii) ICP-based, feature-based, and hybrid camera pose estimation methods are reviewed; and (iii) surface reconstruction methods are reviewed in terms of surface fusion, optimization, and completion. The performance of state-of-the-art methods is also compared and analyzed. This survey will be useful for researchers who want to follow best practices in designing new high-quality 3D reconstruction methods.
Funding: National Key R&D Program of China under Grant No. 2018YFC2000600; Open Projects Program of National Laboratory of Pattern Recognition under Grant No. 202100009; National Natural Science Foundation of China under Grant No. 72071018; Fundamental Research Funds for Central Universities under Grant No. 2021TD006.
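As a small illustration of the surface-fusion family the survey covers, the following is a schematic weighted-average TSDF voxel update in the style popularized by KinectFusion; it is a generic sketch under assumed parameters, not code from any system surveyed here.

```python
# Schematic truncated signed-distance (TSDF) voxel update used by many
# RGB-D surface-fusion pipelines; truncation distance is an assumption.
import numpy as np

def update_tsdf(tsdf, weight, sdf_obs, w_obs=1.0, trunc=0.05):
    """Fuse one signed-distance observation into a voxel's running mean."""
    d = np.clip(sdf_obs, -trunc, trunc)             # truncate the observation
    new_w = weight + w_obs
    tsdf = (tsdf * weight + d * w_obs) / new_w      # running weighted mean
    return tsdf, new_w
```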
The widespread availability of digital multimedia data has led to new challenges in digital forensics. Traditional source camera identification algorithms usually rely on various traces left by the capturing process; however, these traces have become increasingly difficult to extract due to the wide availability of image processing algorithms. Convolutional Neural Network (CNN)-based algorithms have demonstrated good discriminative capability for different brands and even different models of camera devices, but their performance is not ideal when distinguishing between individual devices of the same model, because cameras of the same model typically use the same optical lens, image sensor, and image processing algorithms, resulting in minimal overall differences. In this paper, we propose a camera forensics algorithm based on multi-scale feature fusion to address these issues. The proposed algorithm extracts different local features from feature maps of different scales and then fuses them to obtain a comprehensive feature representation, which is fed into a subsequent camera fingerprint classification network. Building upon the Swin-T network, we utilize Transformer blocks and Graph Convolutional Network (GCN) modules to fuse multi-scale features from different stages of the backbone network. Furthermore, we conduct experiments on established datasets to demonstrate the feasibility and effectiveness of the proposed approach.
Funding: National Natural Science Foundation of China (Grant No. 62172132); Public Welfare Technology Research Project of Zhejiang Province (Grant No. LGF21F020014); Opening Project of Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security of Zhejiang Police College (Grant No. 2021DSJSYS002).
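The sketch below shows the simplest form of multi-scale feature fusion, upsampling coarser backbone feature maps to the finest resolution and concatenating them before projection; the paper's Transformer-block and GCN fusion modules are considerably more elaborate, so treat this only as an assumed baseline shape of the idea.

```python
# Hedged sketch of multi-scale feature fusion: resize all stage outputs to
# the finest scale, concatenate, and project. Channel sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.proj = nn.Conv2d(sum(in_channels), out_channels, 1)

    def forward(self, feats):                       # feats: list, coarse to fine
        size = feats[-1].shape[-2:]                 # finest spatial resolution
        ups = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
               for f in feats]
        return self.proj(torch.cat(ups, dim=1))     # fused representation

# e.g. fusing three backbone stages with 256/512/1024 channels (assumed)
fusion = MultiScaleFusion([256, 512, 1024], 256)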
An ultrafast framing camera with a pulse-dilation device, a microchannel plate (MCP) imager, and an electronic imaging system is reported. The camera achieves a temporal resolution of 10 ps by using the pulse-dilation device and gated MCP imager, and a spatial resolution of 100 μm by using an electronic imaging system comprising combined magnetic lenses. The spatial resolution characteristics of the camera were studied both theoretically and experimentally. The results showed that the combined magnetic lenses reduce field curvature and yield a larger working area: applying four magnetic lenses to the camera produced a working area 53 mm in diameter. Furthermore, the camera was used to detect the X-rays produced by a laser-targeting device; the diagnostic results indicated an X-ray pulse width of approximately 18 ps.
Funding: National Natural Science Foundation of China (NSFC) (No. 11775147); Guangdong Basic and Applied Basic Research Foundation (Nos. 2019A1515110130 and 2024A1515011832); Shenzhen Key Laboratory of Photonics and Biophotonics (ZDSYS20210623092006020); Shenzhen Science and Technology Program (Nos. JCYJ20210324095007020, JCYJ20200109105201936 and JCYJ20230808105019039).
This paper introduces an intelligent computational approach for extracting salient objects from images and estimating their distance with PTZ (Pan-Tilt-Zoom) cameras. PTZ cameras have found wide application in numerous public places, serving purposes such as public security management, natural disaster monitoring, and crisis alarms, particularly with the rapid development of Artificial Intelligence and global infrastructure projects. In this paper, we combine Gaussian optics principles with the PTZ camera's capabilities of horizontal and pitch rotation, as well as optical zoom, to estimate the distance of the object. We present a novel monocular object distance estimation model based on the Focal Length-Target Pixel Size (FLTPS) relationship, achieving an accuracy rate of over 95% for objects within a 5 km range. Salient object extraction is achieved through a simplified convolution kernel and the object's RGB features, which offer significantly faster computation than Convolutional Neural Networks (CNNs). Additionally, we introduce the dark channel prior fog-removal algorithm, resulting in a 20 dB increase in image definition, which significantly benefits distance estimation. Our system offers stability and low device load, making it an asset for public security affairs and a reference point for future developments in surveillance hardware.
Funding: Social Development Project of Jiangsu Key R&D Program (BE2022680); National Natural Science Foundation of China (Nos. 62371253, 52278119).
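The geometric relationship underlying a focal-length/pixel-size distance model can be illustrated with the standard pinhole relation D = f · H / h; the function below is a hedged reading of the abstract, and the variable names, the simple form of the formula, and the numeric example are assumptions rather than the published FLTPS model.

```python
# Illustrative pinhole-style distance estimate from focal length (in pixels)
# and the target's pixel size; a sketch of the idea, not the FLTPS model.
def estimate_distance(focal_px, target_height_m, target_height_px):
    """Distance (m) to a target of known physical height.

    focal_px         -- focal length expressed in pixels (zoom-dependent)
    target_height_m  -- real-world target height in metres
    target_height_px -- target height measured in the image, in pixels
    """
    return focal_px * target_height_m / target_height_px

# e.g. f = 8000 px at long zoom, a 1.6 m pedestrian spanning 26 px (hypothetical)
print(estimate_distance(8000, 1.6, 26))   # ~492 m
```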
This paper aims to develop an automatic miscalibration detection and correction framework to maintain accurate LiDAR-camera calibration for autonomous vehicles after sensor drift. First, a monitoring algorithm that can continuously detect miscalibration in each frame is designed, leveraging the rotational motion each individual sensor observes. Then, as sensor drift occurs, the projection constraints between visual feature points and LiDAR 3D points are used to compute the scaled camera motion, which is further utilized to align the drifted LiDAR scan with the camera image. Finally, the proposed method is thoroughly compared with two representative approaches in online experiments with varying levels of random drift, and is further extended to an offline calibration experiment, where it is demonstrated by comparison with two existing benchmark methods.
Funding: National Natural Science Foundation of China (Grant Nos. 52025121, 52394263); National Key R&D Plan of China (Grant No. 2023YFD2000301).
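The rotation-consistency idea behind such monitoring can be sketched as follows: with a correct extrinsic rotation, per-frame LiDAR rotations conjugated into the camera frame should match the camera's own rotations, so a growing residual angle flags drift. The function names and the specific residual below are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch of a per-frame rotation-consistency check between a camera
# and a LiDAR given the extrinsic rotation R_cl (LiDAR frame -> camera frame).
import numpy as np
from scipy.spatial.transform import Rotation as R

def rotation_residual_deg(R_cam, R_lidar, R_cl):
    """Angle (deg) between the camera's observed rotation and the LiDAR
    rotation mapped into the camera frame; large values suggest drift."""
    R_pred = R_cl @ R_lidar @ R_cl.T        # LiDAR motion seen from the camera
    err = R_cam.T @ R_pred                  # residual rotation matrix
    return np.degrees(R.from_matrix(err).magnitude())
```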
It is well known that the accuracy of camera calibration is constrained by the size of the reference plate, and large reference plates are difficult to fabricate with high precision. Calibrating a camera with a large field of view (FOV) is therefore non-trivial. In this paper, a method is proposed to construct a virtual large reference plate with high precision. First, a high-precision datum plane is constructed with a laser interferometer and a one-dimensional air guideway, and the reference plate is positioned at different locations and orientations in the FOV of the camera. The feature points of the reference plate are projected onto the datum plane to obtain a high-precision virtual large reference plate. The camera is then moved to several positions to obtain different virtual reference plates, and the camera is calibrated with them. The experimental results show that the mean re-projection error of a camera calibrated with the proposed method is 0.062 pixels. The length of a scale bar with a standard length of 959.778 mm was measured with a vision system composed of two calibrated cameras, and the length measurement error was 0.389 mm.
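For clarity on the metric quoted above (0.062 pixels), mean re-projection error can be computed by projecting the known 3D feature points through the calibrated model and averaging the pixel distances to the detected points; the sketch below uses OpenCV and placeholder arrays, and is a generic illustration rather than the paper's evaluation code.

```python
# Mean re-projection error after calibration, computed with OpenCV.
# obj_pts: (N, 3) known 3D feature points; img_pts: (N, 2) detected points.
import cv2
import numpy as np

def mean_reprojection_error(obj_pts, img_pts, rvec, tvec, K, dist):
    """Average pixel distance between detected and re-projected points."""
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
    return float(np.mean(np.linalg.norm(img_pts - proj.reshape(-1, 2), axis=1)))
```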
Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems. The scene can change during an image sequence, and this plays a vital role in the localization performance of robotic applications in terms of accuracy and speed. This research proposes a real-time indoor camera localization system based on a recurrent neural network that detects scene changes during the image sequence. The proposed system is trained on an annotated image dataset and predicts the camera pose in real time. It improves the localization performance of indoor cameras mainly by predicting the camera pose more accurately; it also recognizes scene changes during the sequence and evaluates their effects. The system achieves high accuracy and real-time performance. Scene change detection is performed using visual rhythm together with the proposed recurrent deep architecture, which performs camera pose prediction and evaluates the impact of scene changes. Overall, this study proposes a novel real-time localization system for indoor cameras that detects scene changes and shows how they affect localization performance.
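A minimal recurrent pose regressor conveys the shape of such a system: an LSTM runs over per-frame feature vectors and emits a 7-D pose (translation plus unit quaternion) at each step. The feature dimension, hidden size, and output parameterization below are assumptions for illustration, not the paper's architecture.

```python
# Hedged PyTorch sketch of recurrent camera pose prediction over an image
# sequence; a per-frame feature extractor (e.g. a CNN) is assumed upstream.
import torch
import torch.nn as nn

class RecurrentPoseNet(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 7)      # xyz + quaternion

    def forward(self, seq):                   # seq: (batch, time, feat_dim)
        out, _ = self.lstm(seq)
        pose = self.head(out)
        # normalise the quaternion part so it encodes a valid rotation
        q = nn.functional.normalize(pose[..., 3:], dim=-1)
        return torch.cat([pose[..., :3], q], dim=-1)
```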