As the pancreas occupies only a small region of a whole abdominal computed tomography (CT) scan and has high variability in shape, location and size, deep neural networks for automatic pancreas segmentation are easily confused by the complex and variable background. To alleviate these issues, this paper proposes a novel pancreas segmentation optimization based on a coarse-to-fine structure, in which the coarse stage increases the proportion of the target region in the input image through a minimum bounding box, and the fine stage improves the accuracy of pancreas segmentation by enhancing data diversity and introducing a new segmentation model, while reducing the running time by adding a total-weights constraint. This optimization is evaluated on the public pancreas segmentation dataset and achieves 87.87% average Dice-Sørensen coefficient (DSC) accuracy, 0.94% higher than the 86.93% reported by the state-of-the-art pancreas segmentation methods. Moreover, the method generalizes well and can easily be applied to other coarse-to-fine or one-step organ segmentation tasks.
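The coarse stage's minimum-bounding-box cropping can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a binary coarse-stage mask and a NumPy array volume, and the `margin` padding parameter is a hypothetical addition.

```python
import numpy as np

def min_bounding_box_crop(volume, coarse_mask, margin=8):
    """Crop `volume` to the minimum bounding box of a coarse mask,
    padded by `margin` voxels per side (an assumed safety margin)."""
    coords = np.argwhere(coarse_mask > 0)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, coarse_mask.shape)
    slices = tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))
    return volume[slices], slices
```

The fine-stage network would then be run on the cropped region, where the target occupies a much larger proportion of the input.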
The coarse-to-fine pyramid and the scale space are two important image structures in the realm of image matching. However, the advantage of the coarse-to-fine pyramid is neglected when the pyramid structure is constructed with the downsampling method of scale space. In addition, the importance of each lattice differs within a single image. Based on these observations, a new multi-pyramid (M-P) image spatial structure is constructed. First, a coarse-to-fine pyramid is built by partitioning the original image into increasingly finer lattices, and the number of interest points in each lattice is adopted as that lattice's non-normalized weight on each pyramid level. Second, the scale space of each lattice on each pyramid level is generated with the classic Gaussian kernel. Third, the descriptors of each lattice are generated by regarding the stability of the scale space as the description of the image. Moreover, a parallel version of the M-P algorithm is presented to accelerate computation. Finally, comprehensive experimental results reveal that this multi-pyramid structure, which combines a coarse-to-fine spatial pyramid with scale space, generates more effective features than the other related methods.
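The first step, partitioning into increasingly finer lattices weighted by interest-point counts, might look like the following sketch. The function name, the power-of-two lattice scheme, and the `(y, x)` keypoint convention are assumptions for illustration, not the paper's code.

```python
import numpy as np

def pyramid_lattices(image, keypoints, levels=3):
    """Partition `image` into increasingly finer lattices; level l has
    2^l x 2^l cells. Each cell's non-normalised weight is the number of
    interest points (given as (y, x) pairs) falling inside it."""
    h, w = image.shape[:2]
    pyramid = []
    for level in range(levels):
        n = 2 ** level
        cells = []
        for i in range(n):
            for j in range(n):
                y0, y1 = i * h // n, (i + 1) * h // n
                x0, x1 = j * w // n, (j + 1) * w // n
                weight = sum(1 for (y, x) in keypoints
                             if y0 <= y < y1 and x0 <= x < x1)
                cells.append((image[y0:y1, x0:x1], weight))
        pyramid.append(cells)
    return pyramid
```

Each cell would then get its own Gaussian scale space and descriptor, with the per-cell weight modulating its contribution to matching.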
An approach to the stereo correspondence problem is presented that uses genetic algorithms (GAs) to obtain a dense disparity map. Different from previous methods, this approach casts stereo matching as a multi-extrema optimization problem: finding the fittest solution among a set of potential disparity maps. Among a wide variety of optimization techniques, GAs have proven to be effective for global optimization problems with large search spaces. With this idea, each disparity map is viewed as an individual and the disparity values are encoded as chromosomes, so each individual carries many chromosomes. Several matching constraints are then formulated into an objective function, and GAs are used to search for the globally optimal solution. Furthermore, a coarse-to-fine strategy is embedded in the approach to reduce both matching ambiguity and time consumption. Finally, experimental results on synthetic and real images show the performance of the work.
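The encoding described above, one individual per disparity map with per-pixel disparities as chromosomes, can be sketched as a toy GA. The photometric-plus-smoothness objective, the population sizes, and the crossover/mutation rates below are illustrative assumptions, not the paper's actual constraint formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(disp, left, right, lam=0.1):
    # Negative cost: photometric error of the right image warped by `disp`,
    # plus a horizontal smoothness penalty (an assumed matching constraint).
    h, w = left.shape
    cols = np.clip(np.arange(w) - disp, 0, w - 1)
    warped = right[np.arange(h)[:, None], cols]
    return -(np.abs(left - warped).sum()
             + lam * np.abs(np.diff(disp, axis=1)).sum())

def evolve(left, right, max_disp=4, pop=20, gens=30):
    h, w = left.shape
    population = rng.integers(0, max_disp + 1, size=(pop, h, w))
    for _ in range(gens):
        scores = np.array([fitness(d, left, right) for d in population])
        parents = population[np.argsort(-scores)[:pop // 2]]
        # Uniform crossover between random parent pairs, then point mutation.
        a = parents[rng.integers(0, len(parents), pop)]
        b = parents[rng.integers(0, len(parents), pop)]
        children = np.where(rng.random((pop, h, w)) < 0.5, a, b)
        mut = rng.random((pop, h, w)) < 0.02
        children[mut] = rng.integers(0, max_disp + 1, mut.sum())
        children[0] = parents[0]  # elitism: keep the current best individual
        population = children
    scores = np.array([fitness(d, left, right) for d in population])
    return population[np.argmax(scores)]
```

A coarse-to-fine wrapper would run this first on downsampled images with a small disparity range, then refine on the full resolution.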
A coarse-to-fine strategy based on the minimal generalized time-bandwidth product is proposed, with one novel idea highlighted: adopting a coarse-to-fine strategy to speed up the searching process. Simulation results on synthetic and real signals show the validity of the proposed method.
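The coarse-to-fine search idea itself, a coarse grid to localize the optimum followed by a fine grid around it, can be sketched generically. The cost function and grid sizes here are illustrative; the paper's actual objective is the generalized time-bandwidth product.

```python
import numpy as np

def coarse_to_fine_search(cost, lo, hi, coarse_n=20, fine_n=50):
    """Minimise `cost` over [lo, hi]: evaluate a coarse grid first,
    then a fine grid within one coarse step of the coarse minimum."""
    xs = np.linspace(lo, hi, coarse_n)
    x0 = xs[np.argmin([cost(x) for x in xs])]
    step = (hi - lo) / (coarse_n - 1)
    xf = np.linspace(max(lo, x0 - step), min(hi, x0 + step), fine_n)
    return xf[np.argmin([cost(x) for x in xf])]
```

Compared with a single fine grid over the full range, this evaluates the cost far fewer times for the same final resolution, which is the source of the claimed speedup.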
Near-duplicate image detection is a necessary operation for refining image search results for efficient user exploration. The existence of large numbers of near duplicates requires fast and accurate automatic near-duplicate detection methods. We have designed a coarse-to-fine near-duplicate detection framework to speed up the process and a multi-modal integration scheme for accurate detection. Duplicate pairs are detected with both a global feature (partition-based color histogram) and local features (CPAM and a SIFT bag-of-words model). Experimental results on a large-scale dataset prove the effectiveness of the proposed design.
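The global feature used for the coarse pass, a partition-based histogram, can be sketched as follows. This is a simplified grayscale version with a chi-square distance; the grid size, bin count, and distance choice are assumptions for illustration.

```python
import numpy as np

def partition_histogram(image, grid=2, bins=8):
    """Concatenate per-partition grayscale histograms into one
    L1-normalised global feature vector."""
    h, w = image.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = image[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            feats.append(hist)
    f = np.concatenate(feats).astype(float)
    return f / f.sum()

def chi2(a, b, eps=1e-9):
    # Chi-square distance between two normalised histograms.
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))
```

Pairs whose global-feature distance falls below a threshold would then proceed to the finer local-feature (CPAM / SIFT bag-of-words) comparison.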
Humongous amounts of data bring various challenges to face image retrieval. This paper proposes an efficient method to address those challenges. Firstly, we use accurate facial landmark locations as shape features. Secondly, we utilise shape priors to provide discriminative texture features for convolutional neural networks. These shape and texture features are fused to make the learned representation more robust. Finally, to increase efficiency, a coarse-to-fine search mechanism is exploited to efficiently find similar objects. Extensive experiments on the CASIA-WebFace, MSRA-CFW, and LFW datasets illustrate the superiority of our method.
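A common way to realise such a coarse-to-fine retrieval mechanism is to filter with cheap binary codes first and re-rank the survivors with the full float features. The sketch below follows that pattern; the sign-binarisation, Hamming filter, and cosine re-ranking are generic assumptions, not necessarily the paper's exact pipeline.

```python
import numpy as np

def coarse_to_fine_retrieve(query, gallery, k_coarse=10, k_final=3):
    """Coarse: Hamming distance on sign-binarised features selects
    k_coarse candidates; fine: cosine similarity re-ranks them."""
    qb, gb = query > 0, gallery > 0
    ham = (qb != gb).sum(axis=1)
    cand = np.argsort(ham)[:k_coarse]
    g = gallery[cand]
    cos = g @ query / (np.linalg.norm(g, axis=1)
                       * np.linalg.norm(query) + 1e-9)
    return cand[np.argsort(-cos)[:k_final]]
```

The coarse pass touches every gallery item but only with bit operations; the expensive float comparison runs on a handful of candidates.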
The images from a monocular camera can be processed to detect depth information regarding obstacles in the blind-spot area captured by the side-view camera of a vehicle. The depth information is given as a classification result, "near" or "far", when two blocks in the image are compared with respect to their distances, and this information can be used for blind-spot area detection. In this paper, the proposed depth information is inferred from a combination of blur cues and texture cues, estimated by comparing the features of two image blocks selected within a single image. A preliminary experiment demonstrates that a convolutional neural network (CNN) model trained by deep learning with a set of relatively ideal images achieves good accuracy. The same CNN model is applied to distinguish near and far obstacles according to a specified threshold in the vehicle blind-spot area, and promising results are obtained. The proposed method uses a standard blind-spot camera and can improve safety without additional sensing devices; it thus has the potential to be applied in vehicular applications for detecting objects in the driver's blind spot.
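The paper compares two blocks with a trained CNN; purely to illustrate the blur cue it relies on, the sketch below substitutes a hand-crafted Laplacian sharpness measure and assumes the in-focus block is the nearer one. Both the measure and that assumption are illustrative stand-ins, not the paper's model.

```python
import numpy as np

def sharpness(block):
    """Variance of a discrete Laplacian response: a simple blur cue
    (sharper image content gives a higher value)."""
    lap = (-4 * block[1:-1, 1:-1] + block[:-2, 1:-1] + block[2:, 1:-1]
           + block[1:-1, :-2] + block[1:-1, 2:])
    return lap.var()

def compare_blocks(block_a, block_b):
    """Label block_a 'near' if it is sharper than block_b,
    under the assumed focused-foreground setup; else 'far'."""
    return "near" if sharpness(block_a) > sharpness(block_b) else "far"
```

A learned CNN replaces this hand-crafted cue precisely because blur alone is unreliable when the texture statistics of the two blocks differ.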
A shot presents a contiguous action recorded by an uninterrupted camera operation, and frames within a shot keep spatio-temporal coherence. Segmenting a serial video stream into meaningful shots is the first step in video analysis and content-based video understanding. In this paper, a novel scheme based on improved two-dimensional entropy is proposed to partition video into shots. Firstly, shot transition candidates are detected using a two-pass algorithm: a coarse searching pass and a fine searching pass. Secondly, using the characteristics of the two-dimensional entropy of the image, correctly detected transition candidates are further classified into different transition types, whereas falsely detected shot breaks are identified and removed. Finally, the boundary of a gradual transition can be precisely located by incorporating the two-dimensional entropy characteristics into the gradual-transition analysis. A large number of video sequences are used to test system performance, and promising results are obtained.
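The standard two-dimensional entropy of an image, computed from the joint histogram of pixel value and local-neighbourhood mean, can be sketched as follows; the bin count and 3x3 neighbourhood are conventional choices, not necessarily those of the paper's improved variant.

```python
import numpy as np

def two_dimensional_entropy(image, bins=16):
    """2-D entropy from the joint histogram of each pixel's gray value
    and its 3x3 neighbourhood mean (edge-padded)."""
    img = image.astype(float)
    h, w = img.shape
    pad = np.pad(img, 1, mode="edge")
    nbr = sum(pad[i:i + h, j:j + w]
              for i in range(3) for j in range(3)) / 9.0
    joint, _, _ = np.histogram2d(img.ravel(), nbr.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    p = joint / joint.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

Frame-to-frame changes in this quantity are what the scheme thresholds when verifying transition candidates and locating gradual-transition boundaries.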
In the domain of point cloud registration, the coarse-to-fine feature matching paradigm has received significant attention due to its impressive performance. This paradigm involves a two-step process: first, the extraction of multilevel features, and subsequently, the propagation of correspondences from coarse to fine levels. However, this approach faces two notable limitations. Firstly, the use of the dual-softmax operation may promote one-to-one correspondences between superpoints, inadvertently excluding valuable correspondences. Secondly, it is crucial to closely examine the overlapping areas between point clouds, as only correspondences within these regions decisively determine the actual transformation. Considering these issues, we propose OAAFormer to enhance correspondence quality. On the one hand, we introduce a soft matching mechanism to facilitate the propagation of potentially valuable correspondences from coarse to fine levels. On the other hand, we integrate an overlapping-region detection module to minimize mismatches as far as possible. Furthermore, we introduce a region-wise attention module with linear complexity during the fine-level matching phase, designed to enhance the discriminative capability of the extracted features. Tests on the challenging 3DLoMatch benchmark demonstrate that our approach yields a substantial increase of about 7% in the inlier ratio and an enhancement of 2%-4% in registration recall. Finally, to accelerate prediction, we replace the conventional Random Sample Consensus (RANSAC) algorithm with the selection of a limited yet representative set of high-confidence correspondences, resulting in a 100x speedup while maintaining comparable registration performance.
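The contrast between dual-softmax matching and a soft matching mechanism can be sketched as follows: dual-softmax confidences are computed as usual, but instead of keeping only mutual argmaxes, the top-k entries per row above a threshold are retained. The `k` and `thresh` parameters are illustrative assumptions, not OAAFormer's actual values.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_matches(feat_a, feat_b, k=2, thresh=0.1):
    """Dual-softmax confidence matrix over superpoint features, then
    keep the top-k candidates per row (not just mutual argmaxes),
    so potentially valuable correspondences survive to the fine level."""
    sim = feat_a @ feat_b.T
    conf = softmax(sim, axis=1) * softmax(sim, axis=0)
    pairs = []
    for i, row in enumerate(conf):
        for j in np.argsort(-row)[:k]:
            if row[j] > thresh:
                pairs.append((i, int(j), float(row[j])))
    return pairs
```

Strict mutual-argmax selection would discard a second-best column whose confidence is nearly as high; the soft variant keeps it and lets the fine level decide.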
Funding (pancreas segmentation): supported by the National Natural Science Foundation of China [61772242, 61976106, 61572239]; the China Postdoctoral Science Foundation [2017M611737]; the Six Talent Peaks Project in Jiangsu Province [DZXX-122]; the Jiangsu Province Emergency Management Science and Technology Project [YJGL-TG-2020-8]; the Key Research and Development Plan of Zhenjiang City [SH2020011]; and the Postgraduate Innovation Fund of Jiangsu Province [KYCX18_2257].
Funding (video shot segmentation): supported by the National Natural Science Foundation of China (Grant No. 60675017) and the National Basic Research Program of China (Grant No. 2006CB303103).
Funding (point cloud registration): supported by the National Natural Science Foundation of China under Grant Nos. 62272277, U23A20312, and 62072284; the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2022YFB3303200; and the Natural Science Foundation of Shandong Province of China under Grant No. ZR2020MF036.