Funding: Supported in part by the Major Project for New Generation of AI (2018AAA0100400), the National Natural Science Foundation of China (61836014, U21B2042, 62072457, 62006231), and the InnoHK Program.
Abstract: Monocular 3D object detection is challenging due to the lack of accurate depth information. Some methods estimate pixel-wise depth maps with off-the-shelf depth estimators and then use them as an additional input to augment the RGB images. Depth-based methods either convert the estimated depth maps to pseudo-LiDAR and apply LiDAR-based object detectors, or focus on image and depth fusion learning. However, they show limited performance and efficiency because of depth inaccuracy and complex convolution-based fusion. Different from these approaches, our proposed depth-guided vision transformer with normalizing flows (NF-DVT) network uses normalizing flows to build priors in depth maps and thereby obtain more accurate depth information. We then develop a novel Swin-Transformer-based backbone with a fusion module that processes RGB image patches and depth map patches in two separate branches and fuses them with cross-attention so that the branches exchange information. Furthermore, using the pixel-wise relative depth values in the depth maps, we develop new relative position embeddings in the cross-attention mechanism to capture a more accurate sequence ordering of the input tokens. Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection. Experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of the proposed method and its superior performance over previous counterparts.
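To illustrate how a relative depth value can enter cross-attention as a position term, here is a minimal sketch in plain Python. It is not the NF-DVT implementation: Swin windows, multiple heads, and learned projections are omitted; the bucketing rule, the `max_delta` range, and the scalar `bias_table` (standing in for learned relative position embeddings) are all hypothetical choices. RGB-branch tokens act as queries and depth-branch tokens as keys and values, and each score is the usual scaled dot product plus a bias looked up from the quantized depth difference between the two tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def depth_bucket(delta, num_buckets, max_delta=4.0):
    """Map a signed relative depth (hypothetical unit: metres) to a bucket index.

    Differences are clipped to [-max_delta, max_delta] and quantized linearly,
    so a zero difference lands near the middle of the table.
    """
    clipped = max(-max_delta, min(max_delta, delta))
    return int((clipped + max_delta) / (2 * max_delta) * (num_buckets - 1) + 0.5)


def depth_guided_cross_attention(queries, keys, values, q_depths, k_depths, bias_table):
    """Scaled dot-product cross-attention with an additive relative-depth bias.

    queries: RGB-branch tokens; keys/values: depth-branch tokens (lists of floats).
    bias_table: one scalar per depth bucket (hypothetical stand-in for embeddings).
    """
    dim = len(queries[0])
    outputs = []
    for q, dq in zip(queries, q_depths):
        scores = []
        for k, dk in zip(keys, k_depths):
            content = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            position = bias_table[depth_bucket(dk - dq, len(bias_table))]
            scores.append(content + position)
        weights = softmax(scores)
        # Weighted sum of value vectors: one output token per query token.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

With identical content scores, a table that favours the zero-difference bucket makes each query attend mostly to keys at a similar depth, which is the kind of ordering cue the abstract attributes to depth-aware position embeddings.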
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2018YFB2001002).
Abstract: Based on the theory of first-order reaction kinetics, a thermal reaction kinetic model in integral form has been derived. To make the model more broadly applicable, the effects of time and the degree of conversion on the reaction rate parameters were considered. Two types of undetermined functions were used to compensate for the intrinsic variation of the reaction rate, and two correction methods are provided. The model was explained and verified using published experimental data from different polymer thermal reaction systems, which confirmed its effectiveness and wide adaptability. For the given kinetic model, only one parameter needs to be determined. The proposed empirical model is expected to be useful in numerical simulations of polymer thermal reaction processes.
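For reference, the standard first-order starting point can be written out and evaluated numerically. With an Arrhenius rate constant k(T) = A exp(-E/(RT)), the rate law d(alpha)/dt = k(T)(1 - alpha) integrates to -ln(1 - alpha) = integral of k(T(tau)) d(tau) from 0 to t. The sketch below assumes only this textbook form; the paper's time- and conversion-dependent correction functions are not given in the abstract and are not reproduced, and the function names and parameter values are illustrative.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def rate_constant(T, A, E):
    """Arrhenius rate constant k(T) = A * exp(-E / (R * T)); T in K, E in J/mol."""
    return A * math.exp(-E / (R * T))


def first_order_conversion(t_end, temperature, A, E, steps=10_000):
    """Conversion alpha(t_end) for d(alpha)/dt = k(T(t)) * (1 - alpha).

    Integral form: -ln(1 - alpha) = integral_0^t k(T(tau)) dtau,
    evaluated here with the trapezoidal rule over a temperature history T(t).
    """
    dt = t_end / steps
    integral = 0.0
    k_prev = rate_constant(temperature(0.0), A, E)
    for i in range(1, steps + 1):
        k_next = rate_constant(temperature(i * dt), A, E)
        integral += 0.5 * (k_prev + k_next) * dt
        k_prev = k_next
    return 1.0 - math.exp(-integral)


# Example: a 10 K/min linear heating ramp from 300 K (t in seconds).
ramp = lambda t: 300.0 + t / 6.0
alpha_10min = first_order_conversion(600.0, ramp, A=1e10, E=1e5)
```

Under isothermal conditions the integral reduces to k t, so the sketch reproduces alpha = 1 - exp(-k t) exactly; a correction function of time or conversion would enter as an extra factor on k inside the integral.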
Abstract: Metastatic carcinoma of the spleen (MCS) is a rare condition that is frequently misdiagnosed. Research progress on the prevalence, clinicopathological features, and diagnosis of MCS reported in the Chinese and English medical literature was reviewed to improve understanding of all aspects of MCS. It is hoped that a better understanding of MCS will raise the level of diagnosis and the rate of MCS detection.
Funding: This work was supported in part by the National Key R&D Program of China (No. 2022ZD0160102) and the National Natural Science Foundation of China (Grant Nos. 61836014, U21B2042, 62072457, 62006231).
Abstract: Data augmentation is widely recognized as an effective means of bolstering model robustness. However, when applied to monocular 3D object detection, non-geometric image augmentation neglects the critical link between the image and physical space, resulting in the semantic collapse of the augmented scene. To address this issue, we propose two geometric-level data augmentation operators named Geometric-Copy-Paste (Geo-CP) and Geometric-Crop-Shrink (Geo-CS). Both operators introduce geometric consistency based on the principle of perspective projection, complementing the data augmentation options available for monocular 3D detection. Specifically, Geo-CP replicates local patches and reorders object depths to mitigate perspective occlusion conflicts, and Geo-CS re-crops local patches and scales distance and size simultaneously to keep appearance and annotation consistent. These operations alleviate class imbalance in the monocular paradigm by increasing the number and broadening the distribution of geometrically consistent samples. Experiments demonstrate that our geometric-level augmentation operators effectively improve robustness and performance on the KITTI and Waymo monocular 3D detection benchmarks.
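The perspective-projection constraint behind both operators can be made concrete with a pinhole camera, where u - c_x = f X / Z. The sketch below, assuming fixed intrinsics and hypothetical helper names, shows two consequences: magnifying image content by a factor s about the principal point is equivalent to moving the object to depth Z / s (so a Geo-CS-style shrink with s < 1 must push the depth label farther away), and pasting copied objects from farthest to nearest (painter's order) lets nearer objects occlude farther ones. It does not reproduce how Geo-CP or Geo-CS select, crop, or blend patches.

```python
from dataclasses import dataclass


@dataclass
class ObjectLabel:
    u: float       # projected 3D centre, x (pixels)
    v: float       # projected 3D centre, y (pixels)
    width: float   # 2D box width (pixels)
    height: float  # 2D box height (pixels)
    depth: float   # distance along the optical axis (metres)


def project(X, Y, Z, f, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z) to pixels."""
    return f * X / Z + cx, f * Y / Z + cy


def rescale_label(label, s, cx, cy):
    """Scale image content about the principal point by s with fixed intrinsics.

    Since u - cx = f * X / Z, magnifying by s is equivalent to moving the
    object to depth Z / s: apparent size and depth label change together.
    """
    return ObjectLabel(
        u=cx + s * (label.u - cx),
        v=cy + s * (label.v - cy),
        width=s * label.width,
        height=s * label.height,
        depth=label.depth / s,
    )


def paste_order(labels):
    """Painter's order: paste the farthest objects first so nearer ones occlude them."""
    return sorted(labels, key=lambda lab: lab.depth, reverse=True)
```

Keeping the depth label tied to the scale factor is what prevents the "appearance says near, annotation says far" mismatch that a purely image-level resize would introduce.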
Abstract: Recently, deep learning has been widely used for object tracking. However, deep learning faces limitations in tasks such as Autonomous Aerial Refueling (AAR), where the target object can vary substantially in size and high-precision real-time performance is required on embedded systems. This paper presents a novel adaptive embedded single-object tracking framework based on an improved YOLOv4 detection approach and the n-fold Bernoulli probability theorem. First, an Asymmetric Convolutional Network (ACNet) and dense blocks are combined with the YOLOv4 architecture to detect small objects with high precision when similar objects are present in the background. Prior object information, such as the object's location in the previous frame and its speed, is used to track objects of various sizes adaptively. Moreover, based on the n-fold Bernoulli probability theorem, we develop a filter that uses statistical laws to reduce the false positive rate of object tracking. To evaluate the efficiency of our algorithm, a new AAR dataset is collected, and extensive AAR detection and tracking experiments are performed. The results demonstrate that our improved detection algorithm outperforms the original YOLOv4 algorithm on small and similar object detection tasks, and that our tracking algorithm outperforms state-of-the-art object tracking algorithms on refueling drogue tracking tasks.
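One common way to turn the n-fold Bernoulli (binomial) probability into a false-positive filter is a k-of-n rule: if a spurious detection appears independently in each frame with probability p, then the chance of at least k detections in n frames is P(X >= k) = sum over i from k to n of C(n, i) p^i (1 - p)^(n - i). The sketch below picks the smallest k that keeps this below a target rate and confirms a track only when it is met. The per-frame independence assumption, the parameter values, and the class name are hypothetical; the abstract does not specify the paper's exact filter.

```python
from collections import deque
from math import comb


def tail_probability(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k successes in n Bernoulli trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_hits(n, p_false, max_false_rate):
    """Smallest k such that a per-frame false detection rate p_false yields
    k-or-more detections in n frames with probability <= max_false_rate."""
    for k in range(1, n + 1):
        if tail_probability(n, k, p_false) <= max_false_rate:
            return k
    raise ValueError("no k <= n meets the requested false-positive rate")


class KOfNFilter:
    """Confirm a track only if it was detected in at least k of the last n frames."""

    def __init__(self, n, p_false, max_false_rate):
        self.k = min_hits(n, p_false, max_false_rate)
        self.window = deque(maxlen=n)  # sliding window of per-frame detection flags

    def update(self, detected):
        self.window.append(bool(detected))
        return sum(self.window) >= self.k


# Example: 5-frame window, 10% spurious detections per frame, at most 1% false confirmations.
flt = KOfNFilter(n=5, p_false=0.1, max_false_rate=0.01)  # flt.k == 3
```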
Funding: This work was supported in part by the National Key R&D Program of China (2018YFB1004600), the National Natural Science Foundation of China (Grant Nos. 61761146004, 61773375), the Beijing Municipal Natural Science Foundation (Z181100008918010), and the Chinese Academy of Sciences (153D31KYSB20160282).
Abstract: Visual information is highly advantageous for the evolutionary success of almost all animals. This information is likewise critical for many computing tasks, and visual computing has achieved tremendous success in numerous applications over the last 60 years or so. During that time, the development of visual computing has repeatedly drawn inspiration from biological mechanisms. In particular, deep neural networks were inspired by the hierarchical processing mechanisms in the visual cortex of primate brains (including ours), and have achieved major breakthroughs in many domain-specific visual tasks. To better understand biologically inspired visual computing, we present a survey of current work and hope to offer some new avenues for rethinking visual computing and designing novel neural network architectures.
Abstract: Automatic object classification in traffic scene videos is an important issue for intelligent visual surveillance, with great potential for all kinds of security applications. However, this problem is very challenging for the following reasons. First, regions of interest in videos are of low resolution and limited size due to the capacity of conventional surveillance cameras. Second, intra-class variations are very large due to changes in view angle, lighting conditions, and environment. Third, real applications always require algorithms with real-time performance. In this paper, we evaluate the performance of local feature descriptors for automatic object classification in traffic scenes. Image intensity or gradient information is used directly to construct effective feature vectors from regions of interest extracted via motion detection. This strategy has a great efficiency advantage over various complicated texture features. We not only analyze and evaluate the performance of different feature descriptors, but also fuse different scales and features to achieve better performance. Numerous experiments are conducted, and the results demonstrate the efficiency and effectiveness of this strategy, as well as its robustness to noise and to variations in view angle, lighting conditions, and environment.
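As an illustration of building a feature vector directly from gradient information and fusing scales, the sketch below computes a magnitude-weighted histogram of unsigned gradient orientations (a simplified, HOG-like descriptor without cells or blocks) on a grey-level patch, then concatenates histograms from the original and a 2x2-average-pooled copy. The choice of descriptor, the bin count, and concatenation as the fusion rule are assumptions for illustration; the abstract does not say which descriptors or fusion scheme the paper evaluates.

```python
import math


def gradient_histogram(patch, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations (L2-normalised).

    patch: 2D list of grey-level intensities; central differences are used,
    so border pixels are skipped.
    """
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # fold orientation into [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += magnitude
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


def downsample(patch):
    """Halve resolution with 2x2 average pooling (odd trailing rows/columns dropped)."""
    return [[(patch[y][x] + patch[y][x + 1] + patch[y + 1][x] + patch[y + 1][x + 1]) / 4.0
             for x in range(0, len(patch[0]) - 1, 2)]
            for y in range(0, len(patch) - 1, 2)]


def multiscale_descriptor(patch, levels=2, bins=8):
    """Concatenate per-scale histograms: a simple form of multi-scale fusion."""
    features = []
    current = patch
    for _ in range(levels):
        features.extend(gradient_histogram(current, bins))
        current = downsample(current)
    return features
```

Because each level only needs one pass of central differences, the cost is linear in the number of pixels, which is consistent with the efficiency argument made for intensity- and gradient-based features over heavier texture descriptors.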
Abstract: Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses the problem of structure modeling and representation from two aspects: visual features and the topology model. First, at the feature level, we propose the Local Structured Descriptor to capture an object's local structure effectively, and develop descriptors from shape and texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that our method achieves very competitive performance.
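The abstract does not detail its boosted feature selection scheme, so the sketch below shows only the generic idea under that name: discrete AdaBoost with single-feature decision stumps, where each boosting round picks the one feature (with a threshold and polarity) that minimizes the weighted error, so the features that appear in the final model are the selected ones. The function names, the toy data, and the number of rounds are hypothetical.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Decision stump on a single feature: returns +1 or -1."""
    return polarity if row[feature] >= threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) with least weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost_select(X, y, rounds=3):
    """Discrete AdaBoost; each round selects one feature via its best stump."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, f, thr, pol))
        # Misclassified samples (y * h = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = sum(a * stump_predict(row, f, thr, pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1


# The selected feature indices are the distinct f values in the model.
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 2.0]]
y = [-1, -1, 1, 1]
selected = sorted({f for _, f, _, _ in adaboost_select(X, y, rounds=2)})
```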
Funding: The work is supported by the National Key R&D Program of China (Grant No. 2018YFB2001001) and the National Natural Science Foundation of China (Grant Nos. 51575300 and 51735006).
Abstract: The mechanism by which soft polymers wear hard surfaces is not clearly understood. In this paper, a new hypothesis is proposed: under certain working conditions, the stress acting on the hard surface is the main cause of its wear by a soft polymer. The hypothesis was investigated by changing the contact form between tribo-pairs. To this end, friction tests were carried out between six polymer spheres and smooth, rough, and inclined monocrystalline silicon surfaces. The results show that, for the same tribo-pair, the silicon surface is not worn under some contact forms but is worn under others. We believe that the wear of a hard surface by a soft polymer results from the combined stress state acting on the hard surface.
Funding: This work was supported in part by the National Key R&D Program of China (2019QY1604), the Major Project for New Generation of AI (2018AAA0100400), the National Youth Talent Support Program, and the National Natural Science Foundation of China (Grant Nos. U21B2042, 62006231, and 62072457).
Abstract: Domain adaptation (DA) for semantic segmentation aims to reduce the annotation burden of the dense pixel-level prediction task. It focuses on tackling the domain gap problem and transferring knowledge learned from abundant source data to new target scenes. Although recent works have made rapid progress in this field, they still underperform fully supervised models by a large margin because no supervision is available in the target domain. Considering that few-shot labels are cheap to obtain in practical applications, we attempt to leverage them to narrow the performance gap between DA and fully supervised methods. The key to this problem is to use the few-shot labels effectively to learn robust domain-invariant predictions. To this end, we first design a data perturbation strategy to enhance the robustness of the representations. Furthermore, we propose a transferable prototype module that bridges the domain gap based on the source data and the few-shot target samples. With these methods, our approach can, to some extent, perform on par with fully supervised models. We conduct extensive experiments to demonstrate the effectiveness of the proposed methods and report state-of-the-art performance on two popular DA tasks, i.e., GTA5 to Cityscapes and SYNTHIA to Cityscapes.
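A prototype in this sense is usually a per-class mean feature vector, and classification assigns each feature to the most similar prototype. The sketch below assumes that common formulation and adds a hypothetical blending step: source prototypes are mixed with prototypes computed from the few labelled target samples using a weight `lam`, and classes without target shots keep their source prototype. This is only one plausible way to combine source data and few-shot targets; the abstract does not describe the actual transferable prototype module, and all names here are illustrative.

```python
import math


def class_prototypes(features, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for feat, c in zip(features, labels):
        if c not in sums:
            sums[c] = [0.0] * len(feat)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], feat)]
        counts[c] += 1
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def mix_prototypes(source, target, lam=0.5):
    """Blend source prototypes with few-shot target prototypes (lam is hypothetical).

    Classes without any target shots keep their source prototype.
    """
    mixed = {}
    for c, p in source.items():
        if c in target:
            mixed[c] = [(1 - lam) * a + lam * b for a, b in zip(p, target[c])]
        else:
            mixed[c] = p
    return mixed


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(feature, prototypes):
    """Assign the class whose prototype is most cosine-similar to the feature."""
    return max(prototypes, key=lambda c: cosine(feature, prototypes[c]))
```

In a segmentation setting the same nearest-prototype rule would be applied per pixel embedding; shifting the prototypes toward the few target samples is what lets the classifier follow the target distribution without full target annotation.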