The task of food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. Addressing these complexities, our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. Furthermore, it introduces a multi-stage local fusion (MSLF) module, fostering long-distance dependencies between feature maps at varying stages. This approach facilitates the assimilation of complementary features across scales, significantly bolstering the model’s capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures not only mark improvements of 1.04%, 3.42%, and 1.36% over the foundational ConvNeXt network but also surpass the performance of most contemporary food image recognition methods. Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition, setting a new benchmark for the field.
The demand for image retrieval with text manipulation exists in many fields, such as e-commerce and Internet search. Deep metric learning methods are used by most researchers to calculate the similarity between the query and the candidate image by fusing the global feature of the query image and the text feature. However, the text usually corresponds to the local feature of the query image rather than the global feature. Therefore, in this paper, we propose a framework of image retrieval with text manipulation by local feature modification (LFM-IR), which can focus on the related image regions and attributes and perform modification. A spatial attention module and a channel attention module are designed to realize the semantic mapping between image and text. We achieve excellent performance on three benchmark datasets, namely Color-Shape-Size (CSS), Massachusetts Institute of Technology (MIT) States, and Fashion200K (+8.3%, +0.7%, and +4.6% in R@1).
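The R@1 figures quoted above are recall-at-K scores. As a minimal sketch of how such a score is usually computed (the function name `recall_at_k` and the toy data are illustrative assumptions, not taken from the paper), a query counts as a hit when at least one correct target appears among its top K retrieved candidates:

```python
def recall_at_k(ranked_ids, relevant_ids, k=1):
    """Fraction of queries whose top-k ranked candidates contain a relevant item.

    ranked_ids:   one ranked candidate list per query (best match first)
    relevant_ids: one set of correct target ids per query
    """
    hits = 0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        if any(item in relevant for item in ranked[:k]):
            hits += 1
    return hits / len(ranked_ids)


# Hypothetical toy data: three queries with ranked candidates and ground truth.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant = [{"a"}, {"e"}, {"z"}]

r1 = recall_at_k(ranked, relevant, k=1)  # only the first query hits at rank 1
r2 = recall_at_k(ranked, relevant, k=2)  # the second query hits at rank 2
```

Reported gains such as "+8.3% in R@1" are differences between two such scores, one per method, on the same query set.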
The appearance of pedestrians can vary greatly from image to image, and different pedestrians may look similar in a given image. Such similarities and variabilities in the appearance and clothing of individuals make the task of pedestrian re-identification very challenging. Here, a pedestrian re-identification method based on the fusion of local features and gait energy image (GEI) features is proposed. In this method, the human body is divided into four regions according to joint points. The color and texture of each region of the human body are extracted as local features, and GEI features of the pedestrian gait are also obtained. The local features are then fused with the GEI features of the person. Independent distance measure learning using the cross-view quadratic discriminant analysis (XQDA) method is used to obtain the similarity of the metric function of the image pairs, and the final similarity is acquired by weight matching. Evaluation of experimental results by cumulative matching characteristic (CMC) curves reveals that, after fusion of local and GEI features, the pedestrian re-identification effect is improved compared with existing methods and is notably better than the recognition rate of pedestrian re-identification with a single feature.
Obtaining a 3D feature description with high descriptiveness and robustness under complicated nuisances is a significant and challenging task in 3D feature matching. This paper proposes a novel feature description consisting of a stable local reference frame (LRF) and a feature descriptor based on local spatial voxels. First, an improved LRF was designed by incorporating distance weights into the Z- and X-axis calculations. Subsequently, based on the LRF and voxel segmentation, a feature descriptor based on voxel homogenization was proposed. Moreover, uniform segmentation of cube voxels was performed, considering the eigenvalues of each voxel and its neighboring voxels, thereby enhancing the stability of the description. The performance of the descriptor was strictly tested and evaluated on three public datasets, on which it exhibited high descriptiveness, robustness, and superior performance compared with other current methods. Furthermore, the descriptor was applied to a 3D registration trial, and the results demonstrated the reliability of our approach.
To fully describe the structure information of the point cloud when the LIDAR-object distance is long, a joint global and local feature (JGLF) descriptor is constructed. Compared with five typical descriptors, the object recognition rate of JGLF is higher when the LIDAR-object distance changes. When the airborne LIDAR approaches the object, the particle filtering (PF) algorithm is used as the tracking framework. Particle weights are updated by comparing the difference between JGLFs to track the object. It is verified that the proposed algorithm performs 13.95% more accurately and stably than the basic PF algorithm.
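The weight update described above can be sketched as follows. The abstract does not say how the difference between JGLFs is turned into a weight, so the Gaussian likelihood on the Euclidean distance, the `sigma` parameter, and the function name `update_weights` are all assumptions made for illustration:

```python
import math


def update_weights(template, particle_descriptors, sigma=1.0):
    """Return normalized particle weights from descriptor differences.

    Assumption: weight is proportional to exp(-d^2 / (2 * sigma^2)), where d is
    the Euclidean distance between a particle's descriptor and the template.
    """
    weights = []
    for desc in particle_descriptors:
        d2 = sum((a - b) ** 2 for a, b in zip(template, desc))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every particle is far from the template: fall back to uniform weights.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Hypothetical descriptors: the first particle matches the template exactly.
template = [1.0, 0.0, 2.0]
particles = [[1.0, 0.0, 2.0], [1.5, 0.5, 2.0], [4.0, 3.0, 0.0]]
weights = update_weights(template, particles)
```

Particles whose descriptors are closer to the template receive larger weights, so resampling concentrates on them in the next tracking step.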
Object representation based on local features is a topical subject in the domain of image understanding and computer vision. We discuss the defects of global features in present methods and the advantages of local features in object recognition, and briefly explore state-of-the-art recognition methods using local features, especially the main approaches of local feature extraction and object representation. To clearly explain these methods, the problem of local feature extraction is divided into feature region detection, feature region description, and feature space optimization. The main components and merits of these steps are presented. Technologies for object representation are classified into three types: vector space, sliding window, and structure relationship models. Future development trends are discussed briefly.
3D model retrieval can benefit many downstream virtual reality applications. In this paper, we propose a new sketch-based 3D model retrieval framework by coupling local features and manifold ranking. On the technical front, we exploit spatial-pyramid-based local structures to facilitate the efficient construction of feature descriptors. Meanwhile, we propose an improved manifold ranking method, wherein all the categories between arbitrary model pairs are taken into account. Since smooth and detail-preserving line drawings of a 3D model are important for sketch-based 3D model retrieval, the Difference of Gaussians (DoG) method is employed to extract the line drawings from the projected depth images of the 3D model, and Bezier curves are then adopted to further optimize the extracted line drawings. On that basis, we develop a 3D model retrieval engine to verify our method. We have conducted extensive experiments over various public benchmarks and have made comprehensive comparisons with some state-of-the-art 3D retrieval methods. All the evaluation results based on the widely used indicators demonstrate the superiority of our method in accuracy, reliability, robustness, and versatility.
Background: The classification of Alzheimer's disease (AD) from magnetic resonance imaging (MRI) has been challenged by the lack of effective and reliable biomarkers due to inter-subject variability. This article presents a classification method for AD based on kernel density estimation (KDE) of local features. Methods: First, a large number of local features were extracted from stable image blobs to represent various anatomical patterns as potential effective biomarkers. Based on distinctive descriptors and locations, the local features were robustly clustered to identify correspondences of the same underlying patterns. Then, KDE was used to estimate distribution parameters of the correspondences by weighting contributions according to their distances. Thus, biomarkers could be reliably quantified by reducing the effects of more distant correspondences, which were more likely to be noise from inter-subject variability. Finally, the Bayes classifier was applied to the distribution parameters for the classification of AD. Results: Experiments were performed on different divisions of a publicly available database to investigate the accuracy and the effects of age and AD severity. Our method achieved an equal error classification rate of 0.85 for subjects aged 60-80 years exhibiting mild AD and outperformed a recent local feature-based work regardless of both effects. Conclusions: We proposed a volumetric brain MRI classification method for neurodegenerative disease based on statistics of local features using KDE. The method may be potentially useful for computer-aided diagnosis in clinical settings.
Simultaneous Localization and Mapping (SLAM) has been widely used in emergency response, self-driving, and city-scale 3D mapping and navigation. Recent deep-learning based feature point extractors have demonstrated superior performance in dealing with complex environmental challenges (e.g., extreme lighting) where traditional extractors struggle. In this paper, we have successfully improved the robustness and accuracy of a monocular visual SLAM system under various complex scenes by adding a deep learning based visual localization thread as an augmentation to the visual SLAM framework. In this thread, our feature extractor with an efficient lightweight deep neural network is used for absolute pose and scale estimation in real time, using the highly accurate georeferenced prior map database at 20 cm geometric accuracy created by our in-house, low-cost LiDAR and camera integrated device. The closed-loop error provided by our SLAM system with and without this enhancement is 1.03 m and 18.28 m, respectively. The scale estimation of the monocular visual SLAM is also significantly improved (0.01 versus 0.98). In addition, a novel camera-LiDAR calibration workflow is also provided for large-scale 3D mapping. This paper demonstrates the application and research potential of deep-learning based vision SLAM with image and LiDAR sensors.
This paper proposes a novel robust image watermarking scheme for digital images using local invariant features and Independent Component Analysis (ICA). Most present watermarking algorithms are unable to resist geometric distortions that desynchronize the location. The method we propose here is robust to geometric attacks. In order to resist geometric distortions, we use a local invariant feature of the image called the scale invariant feature transform, which is invariant to translation and scaling distortions. The watermark is inserted into the circular patches generated by the scale-invariant key point extractor. Rotation invariance is achieved using the translation property of the polar-mapped circular patches. Our method belongs to the blind watermark category, because we use Independent Component Analysis for detection, which does not need the original image. Experimental results show that our method is robust against geometric distortion attacks as well as signal-processing attacks.
In this paper, we present a tire defect detection algorithm based on sparse representation. The dictionary learned from reference images can efficiently represent the test image. As the representation coefficients of normal images have a specific distribution, the local feature can be estimated by comparing representation coefficient distributions. Meanwhile, a coding length is used to measure the global features of the representation coefficients. The tire defect is located using both these local and global features. Experimental results demonstrate that the proposed method can accurately detect and locate tire defects.
Local invariant algorithms applied in downward-looking image registration usually compute the camera's pose relative to visual landmarks. Generally, there are three requirements in the process of image registration when using these approaches. First, the algorithm should tolerate illumination changes, which readily affect it. Second, the algorithm should have low computational complexity. Third, the depth information of images needs to be estimated without other sensors. This paper investigates a well-known local invariant feature named speeded up robust feature (SURF) and proposes a high-speed and robust image registration and localization algorithm based on it. With support from feature tracking and pose estimation methods, the proposed algorithm can compute camera poses under different conditions of scale, viewpoint, and rotation so as to precisely localize an object's position. The study then carries out registration experiments with the scale invariant feature transform (SIFT), SURF, and the proposed algorithm, and designs a method to evaluate their performance. Furthermore, this study carries out an object retrieval test on remote sensing video. Because there is large deformation in remote sensing frames, the registration algorithm incorporates the Kanade-Lucas-Tomasi (KLT) 3-D coplanar calibration feature tracker methods, which can localize targets of interest precisely and efficiently. The experimental results show that the proposed method has a higher localization speed and a lower localization error rate than traditional visual simultaneous localization and mapping (vSLAM) over a period of time.
In this paper, we propose a product image retrieval method based on object contour corners, image texture, and color. A product image mainly highlights the object, and its background is very simple. According to these characteristics, we represent the object using its contour and detect the corners of the contour to reduce the number of pixels. Every corner is described using its approximate curvature based on distance. In addition, the Block Difference of Inverse Probabilities (BDIP) and Block Variation of Local Correlation (BVLC) texture features and the color moments are extracted from the image's HIS color space. Finally, the dynamic time warping method is used to match features of different lengths. In order to demonstrate the effect of the proposed method, we carry out experiments on the Microsoft product image database and compare it with other feature descriptors. The retrieval precision and recall curves show that our method is feasible.
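Dynamic time warping is what lets two contours with different numbers of corners be compared. The following is a minimal sketch of the textbook algorithm, assuming each contour is reduced to a sequence of scalar curvature values and using the absolute difference as the local cost; both choices, and the name `dtw_distance`, are illustrative assumptions rather than details from the paper:

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two scalar sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b element is reused
                cost[i][j - 1],      # seq_b advances, seq_a element is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical curvature sequences of different lengths.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # repeated corner, cost 0
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

A repeated value in one sequence is absorbed by the warping path at no cost, which is why sequences of different lengths can still match exactly.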
Person re-identification has emerged as a hotspot for computer vision research due to the growing demands of public safety and the rapid development of intelligent surveillance networks. Person re-identification (Re-ID) in video surveillance systems can track and identify suspicious people and support tracking and statistical analysis of persons. The purpose of person re-identification is to recognize the same person across different cameras. Deep learning-based person re-identification research has produced numerous remarkable outcomes as a result of deep learning's growing popularity. The purpose of this paper is to help researchers better understand where person re-identification research is at the moment and where it is headed. Firstly, this paper organizes the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep learning-based person re-identification techniques conducted in the last several years. Then, the commonly used techniques are discussed from four aspects: appearance features, metric learning, local features, and adversarial learning. Finally, future research directions in the field of person re-identification are outlined.
In pedestrian re-recognition, traditional methods are affected by changes in background, veil, clothing, and so on, which degrade the recognition effect. In order to reduce the impact of such changes on the recognition effect, this paper proposes a pedestrian re-recognition method based on the cycle-consistent generative adversarial network (CycleGAN) and multi-feature fusion. Pedestrian re-recognition is accomplished by comparing the measured distance between two pedestrians. Firstly, this paper uses CycleGAN to transform and expand the dataset, so as to reduce the influence of pedestrian posture changes as much as possible. The method consists of two branches: global feature extraction and local feature extraction. The global and local features are then fused. The fused features are used for comparison measurement learning, and the similarity scores are calculated to sort the samples. Extensive experimental results on the large datasets CUHK03 and VIPER show that this new method reduces the influence of background, veil, clothing, and other changes on the recognition effect.
Target detection of small samples with a complex background is always difficult in the classification of remote sensing images. We propose a new small sample target detection method combining local features and a convolutional neural network (LF-CNN) with the aim of detecting small numbers of unevenly distributed ground object targets in remote sensing images. The k-nearest neighbor method is used to construct the local neighborhood of each point, and the local neighborhoods of the features are extracted one by one from the convolution layer. All the local features are aggregated by maximum pooling to obtain a global feature representation. The classification probability of each category is then calculated and classified using the scaled exponential linear unit (SELU) function and the fully connected layer. The experimental results show that the proposed LF-CNN method achieves high accuracy of target detection and classification for hyperspectral imager remote sensing data under small-sample conditions. Despite drawbacks in both time and complexity, the proposed LF-CNN method integrates the local features of ground object samples more effectively than traditional target detection methods and improves the accuracy of target identification and detection in small samples of remote sensing images.
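The neighborhood construction and pooling steps above can be sketched in plain Python. This is not the LF-CNN network: the learned convolution applied to each neighborhood is replaced here by a simple element-wise maximum as a stand-in, and the function names and toy data are hypothetical.

```python
def knn_indices(points, query_idx, k):
    """Indices of the k points nearest to points[query_idx] (itself excluded)."""
    query = points[query_idx]
    dists = []
    for i, p in enumerate(points):
        if i == query_idx:
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(p, query))
        dists.append((d2, i))
    dists.sort()
    return [i for _, i in dists[:k]]


def max_pool(feature_vectors):
    """Element-wise maximum over a list of equal-length feature vectors."""
    return [max(column) for column in zip(*feature_vectors)]


def global_feature(points, features, k):
    """Pool each point's k-NN neighborhood, then pool all local features."""
    local_features = []
    for idx in range(len(points)):
        neighborhood = knn_indices(points, idx, k) + [idx]
        # Stand-in for the convolution layer: pool the neighborhood's features.
        local_features.append(max_pool([features[i] for i in neighborhood]))
    return max_pool(local_features)


# Hypothetical 2-D point positions and 2-channel per-point features.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
features = [[1, 5], [2, 0], [0, 3], [9, 1]]
neighbors_of_first = knn_indices(points, 0, 2)
pooled = global_feature(points, features, k=2)
```

Because maximum pooling is order-independent, the global feature does not depend on the order in which the points are listed.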
A two-stage object recognition algorithm that handles occlusion is presented for microassembly. Coarse localization determines whether the template is in the image and approximately where it is, and fine localization gives its accurate position. In coarse localization, a local feature that is invariant to translation, rotation, and occlusion is used to form signatures. By comparing the signature of the template with that of the image, an approximate transformation parameter from template to image is obtained, which is used as the initial parameter value for fine localization. An objective function of the transformation parameter is constructed in fine localization and minimized to achieve sub-pixel localization accuracy. The occluded pixels are not taken into account in the objective function, so the localization accuracy is not influenced by the occlusion.
The matching of local descriptors represents at this moment a key tool in computer vision, with a wide variety of methods designed for tasks such as image classification, object recognition and tracking, image stitching, or data mining relying on it. Local feature description techniques are usually developed so as to provide invariance to photometric variations specific to the acquisition of natural images, but are nonetheless used in association with biomedical imaging as well. It has been previously shown that the matching of gradient based descriptors is affected by image modifications specific to Confocal Scanning Laser Microscopy (CSLM). In this paper we extend our previous work in this direction and show how specific acquisition or post-processing methods alleviate or accentuate this problem.
A survey of the population densities of rice planthoppers is important for forecasting decisions and efficient control. Traditional manual surveying of rice planthoppers is time-consuming, fatiguing, and subjective. A new three-layer detection method was proposed to detect and identify white-backed planthoppers (WBPHs, Sogatella furcifera (Horvath)) and their developmental stages using image processing. In the first two detection layers, we used an AdaBoost classifier that was trained on histogram of oriented gradient (HOG) features and a support vector machine (SVM) classifier that was trained on Gabor and Local Binary Pattern (LBP) features to detect WBPHs and remove impurities. We achieved a detection rate of 85.6% and a false detection rate of 10.2%. In the third detection layer, an SVM classifier that was trained on the HOG features was used to identify the different developmental stages of the WBPHs, and we achieved an identification rate of 73.1%, a false identification rate of 23.3%, and a 5.6% false detection rate for the images without WBPHs. The proposed three-layer detection method is feasible and effective for the identification of different developmental stages of planthoppers on rice plants in paddy fields.
The quality of an egg is mainly influenced by the dirt adhering to its shell. Even with good farm-management practices and careful handling, a small percentage of dirty eggs will be produced. The purpose of this research was to detect egg stains using an image processing technique. Compared to the color values, the local texture was found to be much more adept at accurately segmenting the complex and miscellaneous dirt stains on the egg shell. Firstly, the global threshold of the image was obtained by the two-peak method. The irrelevant background was removed using the global threshold, and the region of interest was acquired. The local texture information extracted from the region of interest was taken as the input of fuzzy C-means clustering for segmentation of the dirt stains. According to the principle of projection, the area of dirt stains on the curved egg surface was accurately calculated. The validation experimental results showed that the proposed method for classifying eggs in terms of stains has a specificity of 91.4% for white eggs and 89.5% for brown eggs.
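The two-peak method mentioned above picks a global threshold at the valley between the two dominant peaks of the gray-level histogram. The abstract gives no implementation details, so the sketch below is one simple variant: the `min_separation` parameter, the function name, and the toy histogram are assumptions made for illustration.

```python
def two_peak_threshold(hist, min_separation=1):
    """Return the bin index of the valley between the two dominant peaks.

    hist:           list of pixel counts per gray level
    min_separation: minimum bin distance between the two peaks (assumed
                    parameter, used to avoid picking a neighbor of the first peak)
    """
    bins = range(len(hist))
    # First peak: the most populated bin.
    first = max(bins, key=lambda i: hist[i])
    # Second peak: the most populated bin far enough from the first one.
    candidates = [i for i in bins if abs(i - first) >= min_separation]
    second = max(candidates, key=lambda i: hist[i])
    lo, hi = sorted((first, second))
    # Threshold: the least populated bin between the two peaks.
    return min(range(lo, hi + 1), key=lambda i: hist[i])


# Hypothetical 10-bin histogram with a dark peak (bin 2) and a bright peak (bin 8).
hist = [0, 2, 9, 3, 1, 0, 1, 4, 8, 2]
threshold = two_peak_threshold(hist, min_separation=3)
```

Pixels on one side of the threshold would be treated as background and removed, leaving the region of interest for the texture-based clustering step.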
Funding (food image recognition paper): The support of this research by the Hubei Provincial Natural Science Foundation (2022CFB449) and the Science Research Foundation of the Education Department of Hubei Province (B2020061) is gratefully acknowledged.
Funding (text-manipulated image retrieval paper): Shanghai Sailing Program, China (No. 21YF1401300); Shanghai Science and Technology Innovation Action Plan, China (No. 19511101802); Fundamental Research Funds for the Central Universities, China (No. 2232021D-25).
Funding (pedestrian re-identification with local and GEI features paper): This research was funded by the Science and Technology Support Plan Project of Hebei Province (grant numbers 17210803D and 19273703D), the Science and Technology Spark Project of the Hebei Seismological Bureau (grant number DZ20180402056), the Education Department of Hebei Province (grant number QN2018095), and the Polytechnic College of Hebei University of Science and Technology.
Funding (3D feature description paper): National Natural Science Foundation of China (No. 51705469); Zhengzhou University Youth Talent Enterprise Cooperative Innovation Team Support Program Project (2021, 2022).
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 61271353 and 61871389), the Foundation of State Key Laboratory of Pulsed Power Laser Technology (No. SKL2018ZR09), and the Major Funding Projects of National University of Defense Technology (No. ZK18-01-02).
Abstract: To fully describe the structure information of the point cloud when the LIDAR-object distance is long, a joint global and local feature (JGLF) descriptor is constructed. Compared with five typical descriptors, the object recognition rate of JGLF is higher when the LIDAR-object distance changes. For the situation in which airborne LIDAR is approaching the object, the particle filtering (PF) algorithm is used as the tracking framework. Particle weights are updated by comparing the difference between JGLFs to track the object. It is verified that the proposed algorithm performs 13.95% more accurately and stably than the basic PF algorithm.
Funding: Supported by the National Basic Research Program (973) of China (No. 2012CB821206), the National Natural Science Foundation of China (No. 71201004), the Scientific Research Common Program of Beijing Municipal Commission of Education (No. KM201310011009), and the Funding Project for Innovation on Science, Technology and Graduate Education in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (Nos. PXM2012_014213_000037 and PXM2012_014213_000079).
Abstract: Object representation based on local features is a topical subject in the domain of image understanding and computer vision. We discuss the defects of global features in present methods and the advantages of local features in object recognition, and briefly explore state-of-the-art recognition methods using local features, especially the main approaches of local feature extraction and object representation. To clearly explain these methods, the problem of local feature extraction is divided into feature region detection, feature region description, and feature space optimization. The main components and merits of these steps are presented. Technologies for object representation are classified into three types: vector space, sliding window, and structure relationship models. Future development trends are discussed briefly.
Funding: The authors would like to thank Zhang Dongdong for his great help in experiments. This work was supported by the National Natural Science Foundation of China (Grant No. 61602324), the Scientific Research Project of Beijing Educational Committee (KM201710028018), the open funding project of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (BUAA-VR-17KF-12) and Beijing Advanced Innovation Center for Imaging Technology (BAlCIT-2016004).
Abstract: 3D model retrieval can benefit many downstream virtual reality applications. In this paper, we propose a new sketch-based 3D model retrieval framework by coupling local features and manifold ranking. On the technical front, we exploit spatial-pyramid-based local structures to facilitate the efficient construction of feature descriptors. Meanwhile, we propose an improved manifold ranking method, wherein all the categories between arbitrary model pairs are taken into account. Since smooth and detail-preserving line drawings of 3D models are important for sketch-based 3D model retrieval, the Difference of Gaussians (DoG) method is employed to extract the line drawings from the projected depth images of the 3D model, and Bezier curves are then adopted to further optimize the extracted line drawings. On that basis, we develop a 3D model retrieval engine to verify our method. We have conducted extensive experiments over various public benchmarks, and have made comprehensive comparisons with some state-of-the-art 3D retrieval methods. The evaluation results based on widely used indicators show the advantages of our method in accuracy, reliability, robustness, and versatility.
Funding: Grants from the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China, the Beijing Nova Program, the National Science and Technology Major Project of China, the Beijing Natural Science Foundation, and the Major Project of the National Social Science Foundation.
Abstract: Background: The classification of Alzheimer's disease (AD) from magnetic resonance imaging (MRI) has been challenged by the lack of effective and reliable biomarkers due to inter-subject variability. This article presents a classification method for AD based on kernel density estimation (KDE) of local features. Methods: First, a large number of local features were extracted from stable image blobs to represent various anatomical patterns as potential effective biomarkers. Based on distinctive descriptors and locations, the local features were robustly clustered to identify correspondences of the same underlying patterns. Then, KDE was used to estimate distribution parameters of the correspondences by weighting contributions according to their distances. Thus, biomarkers could be reliably quantified by reducing the effects of more distant correspondences, which were more likely to be noise from inter-subject variability. Finally, a Bayes classifier was applied to the distribution parameters for the classification of AD. Results: Experiments were performed on different divisions of a publicly available database to investigate the accuracy and the effects of age and AD severity. Our method achieved an equal error classification rate of 0.85 for subjects aged 60-80 years exhibiting mild AD and outperformed a recent local feature-based work regardless of both effects. Conclusions: We proposed a volumetric brain MRI classification method for neurodegenerative disease based on statistics of local features using KDE. The method may be useful for computer-aided diagnosis in clinical settings.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2019YFC1511304) and by the Pilot Fund of Frontier Science and Disruptive Technology of Aerospace Information Research Institute, Chinese Academy of Sciences (Grant No. E0Z21101).
Abstract: Simultaneous Localization and Mapping (SLAM) has been widely used in emergency response, self-driving, and city-scale 3D mapping and navigation. Recent deep-learning-based feature point extractors have demonstrated superior performance in dealing with complex environmental challenges (e.g. extreme lighting) with which traditional extractors struggle. In this paper, we improve the robustness and accuracy of a monocular visual SLAM system under various complex scenes by adding a deep-learning-based visual localization thread as an augmentation to the visual SLAM framework. In this thread, our feature extractor, built on an efficient lightweight deep neural network, is used for absolute pose and scale estimation in real time using a highly accurate georeferenced prior map database with 20 cm geometric accuracy created by our in-house, low-cost integrated LiDAR and camera device. The closed-loop error of our SLAM system with and without this enhancement is 1.03 m and 18.28 m, respectively. The scale estimation of the monocular visual SLAM is also significantly improved (0.01 versus 0.98). In addition, a novel camera-LiDAR calibration workflow is provided for large-scale 3D mapping. This paper demonstrates the application and research potential of deep-learning-based visual SLAM with image and LiDAR sensors.
Funding: Supported by the National Natural Science Foundation of China (60373062, 60573045).
Abstract: This paper proposes a novel robust watermarking scheme for digital images using local invariant features and Independent Component Analysis (ICA). Most present watermarking algorithms are unable to resist geometric distortions that desynchronize the location. The method we propose here is robust to geometric attacks. In order to resist geometric distortions, we use a local invariant feature of the image called the scale invariant feature transform, which is invariant to translation and scaling distortions. The watermark is inserted into the circular patches generated by the scale-invariant key point extractor. Rotation invariance is achieved using the translation property of the polar-mapped circular patches. Our method belongs to the blind watermark category, because we use Independent Component Analysis for detection, which does not need the original image. Experimental results show that our method is robust against geometric distortion attacks as well as signal-processing attacks.
Funding: Supported by the Project of Shandong Province Higher Educational Science and Technology Program (No. J11LG77).
Abstract: In this paper, we present a tire defect detection algorithm based on sparse representation. The dictionary learned from reference images can efficiently represent the test image. As the representation coefficients of normal images have a specific distribution, the local feature can be estimated by comparing representation coefficient distributions. Meanwhile, a coding length is used to measure the global features of the representation coefficients. The tire defect is located using both these local and global features. Experimental results demonstrate that the proposed method can accurately detect and locate tire defects.
Funding: Supported by the National Natural Science Foundation of China (60802043) and the National Basic Research Program of China (973 Program) (2010CB327900).
Abstract: Local invariant algorithms applied in downward-looking image registration usually compute the camera's pose relative to visual landmarks. Generally, there are three requirements in the process of image registration when using these approaches. First, the algorithm must cope with illumination changes, to which it is easily susceptible. Second, the algorithm should have low computational complexity. Third, the depth information of images needs to be estimated without other sensors. This paper investigates a well-known local invariant feature, the speeded up robust feature (SURF), and proposes a high-speed and robust image registration and localization algorithm based on it. With support from feature tracking and pose estimation methods, the proposed algorithm can compute camera poses under different conditions of scale, viewpoint and rotation so as to precisely localize an object's position. Finally, the study performs registration experiments with the scale invariant feature transform (SIFT), SURF and the proposed algorithm, and designs a method to evaluate their performance. Furthermore, this study performs an object retrieval test on remote sensing video. Because remote sensing frames exhibit large deformation, the registration algorithm incorporates the Kanade-Lucas-Tomasi (KLT) 3-D coplanar calibration feature tracker methods, which can localize targets of interest precisely and efficiently. The experimental results show that the proposed method has a higher localization speed and a lower localization error rate than traditional visual simultaneous localization and mapping (vSLAM) over a period of time.
Funding: Supported by the Major Program of National Natural Science Foundation of China (No. 70890080 and No. 70890083).
Abstract: In this paper, we propose a product image retrieval method based on object contour corners, image texture and color. A product image mainly highlights the object, and its background is very simple. According to these characteristics, we represent the object using its contour, and detect the corners of the contour to reduce the number of pixels. Every corner is described using its approximate curvature based on distance. In addition, the Block Difference of Inverse Probabilities (BDIP) and Block Variation of Local Correlation (BVLC) texture features and color moments are extracted from the image's HIS color space. Finally, the dynamic time warping method is used to match features of different lengths. In order to demonstrate the effect of the proposed method, we carry out experiments on the Microsoft product image database, and compare it with other feature descriptors. The retrieval precision and recall curves show that our method is feasible.
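To make the matching step concrete: dynamic time warping (DTW) aligns two sequences of different lengths by allowing elements to be repeated along a monotonic alignment path. The sketch below is the classic dynamic-programming form with an absolute-difference cost on scalar values; treating each contour as a sequence of scalar corner curvatures, and the function name `dtw_distance`, are assumptions for illustration rather than details given in the abstract.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences of any lengths."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical curvature sequences from two contours with 3 and 4 corners.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
different_shape = dtw_distance([0, 0], [1, 1])
```

A repeated corner value costs nothing under DTW, so the first pair aligns perfectly even though the sequences differ in length, while the second pair accumulates a cost at every aligned step.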
Abstract: Person re-identification has emerged as a hotspot for computer vision research due to the growing demands of public safety and the rapid development of intelligent surveillance networks. Person re-identification (Re-ID) in video surveillance systems can track and identify suspicious people, and track and statistically analyze persons. The purpose of person re-identification is to recognize the same person across different cameras. Deep-learning-based person re-identification research has produced numerous remarkable outcomes as a result of deep learning's growing popularity. The purpose of this paper is to help researchers better understand where person re-identification research stands at the moment and where it is headed. Firstly, this paper organizes the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep-learning-based person re-identification techniques conducted in the last several years. Then, the commonly used techniques are discussed from four aspects: appearance features, metric learning, local features, and adversarial learning. Finally, future research directions in the field of person re-identification are outlined.
Abstract: In pedestrian re-recognition, traditional methods are affected by changes of background, veil, clothing and so on, which degrade the recognition effect. In order to reduce the impact of background, veil, clothing and other changes on the recognition effect, this paper proposes a pedestrian re-recognition method based on the cycle-consistent generative adversarial network and multifeature fusion. Pedestrian re-recognition is accomplished by comparing the measured distance between two pedestrians. Firstly, this paper uses CycleGAN to transform and expand the data set, so as to reduce the influence of pedestrian posture changes as much as possible. The method consists of two branches: global feature extraction and local feature extraction. Then the global feature and local feature are fused. The fused features are used for comparison measurement learning, and the similarity scores are calculated to sort the samples. A large number of experimental results on the large data sets CUHK03 and VIPER show that this new method reduces the influence of background, veil, clothing and other changes on the recognition effect.
Funding: This work was partially supported by the Key Laboratory for Digital Land and Resources of Jiangxi Province, East China University of Technology (DLLJ202103), the Science and Technology Commission Shanghai Municipality (No. 19142201600), and the Graduate Innovation and Entrepreneurship Program in Shanghai University in China (No. 2019GY04).
Abstract: Target detection of small samples with a complex background is always difficult in the classification of remote sensing images. We propose a new small sample target detection method combining local features and a convolutional neural network (LF-CNN) with the aim of detecting small numbers of unevenly distributed ground object targets in remote sensing images. The k-nearest neighbor method is used to construct the local neighborhood of each point, and the local neighborhoods of the features are extracted one by one from the convolution layer. All the local features are aggregated by maximum pooling to obtain a global feature representation. The classification probability of each category is then calculated and classified using the scaled expected linear units function and the full connection layer. The experimental results show that the proposed LF-CNN method has a high accuracy of target detection and classification for hyperspectral imager remote sensing data under the condition of small samples. Despite drawbacks in both time and complexity, the proposed LF-CNN method can more effectively integrate the local features of ground object samples and improve the accuracy of target identification and detection in small samples of remote sensing images than traditional target detection methods.
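The neighborhood construction and pooling steps described in this abstract can be sketched in a few lines. This is only a toy illustration under stated assumptions: the learned convolution layers are omitted entirely, the per-point feature vectors are placeholder numbers, and the helper names `k_nearest` and `max_pool` are hypothetical.

```python
import math


def k_nearest(points, index, k):
    """Indices of the k points closest (Euclidean) to points[index]."""
    distances = sorted(
        (math.dist(points[index], p), j)
        for j, p in enumerate(points)
        if j != index
    )
    return [j for _, j in distances[:k]]


def max_pool(features):
    """Element-wise maximum over a list of equal-length feature vectors."""
    return [max(column) for column in zip(*features)]


# Placeholder coordinates and per-point features (not real network outputs).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
features = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5], [0.8, 0.1]]

neighbors = k_nearest(points, 0, 2)            # local neighborhood of point 0
local_feature = max_pool([features[j] for j in neighbors])
global_feature = max_pool(features)            # aggregate over all points
```

Max pooling is order-independent, so the aggregated vector does not depend on how the points or neighbors happen to be listed.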
Funding: This project is supported by the National Natural Science Foundation of China (No. 50275078).
Abstract: A two-stage object recognition algorithm that works in the presence of occlusion is presented for microassembly. Coarse localization determines whether the template is in the image and approximately where it is, and fine localization gives its accurate position. In coarse localization, a local feature that is invariant to translation, rotation and occlusion is used to form signatures. By comparing the signature of the template with that of the image, an approximate transformation parameter from template to image is obtained, which is used as the initial parameter value for fine localization. An objective function of the transformation parameter is constructed in fine localization and minimized to achieve sub-pixel localization accuracy. The occluded pixels are not taken into account in the objective function, so the localization accuracy is not influenced by the occlusion.
Funding: The UEFISCDI PN-II-PT-PCCA-2011-3.2-1162 Research Grant; the CRUS SCIEX NMS-CH Fellowship nr. 12.135.
Abstract: The matching of local descriptors is currently a key tool in computer vision, with a wide variety of methods designed for tasks such as image classification, object recognition and tracking, image stitching, or data mining relying on it. Local feature description techniques are usually developed so as to provide invariance to photometric variations specific to the acquisition of natural images, but are nonetheless used in association with biomedical imaging as well. It has been previously shown that the matching of gradient-based descriptors is affected by image modifications specific to Confocal Scanning Laser Microscopy (CSLM). In this paper we extend our previous work in this direction and show how specific acquisition or post-processing methods alleviate or accentuate this problem.
Funding: Financially supported by the National High Technology Research and Development Program of China (863 Program, 2013AA102402), the 521 Talent Project of Zhejiang Sci-Tech University, China, and the Key Research and Development Program of Zhejiang Province, China (2015C03023).
Abstract: A survey of the population densities of rice planthoppers is important for forecasting decisions and efficient control. Traditional manual surveying of rice planthoppers is time-consuming, fatiguing, and subjective. A new three-layer detection method was proposed to detect and identify white-backed planthoppers (WBPHs, Sogatella furcifera (Horvath)) and their developmental stages using image processing. In the first two detection layers, we used an AdaBoost classifier that was trained on histogram of oriented gradient (HOG) features and a support vector machine (SVM) classifier that was trained on Gabor and Local Binary Pattern (LBP) features to detect WBPHs and remove impurities. We achieved a detection rate of 85.6% and a false detection rate of 10.2%. In the third detection layer, an SVM classifier that was trained on the HOG features was used to identify the different developmental stages of the WBPHs, and we achieved an identification rate of 73.1%, a false identification rate of 23.3%, and a 5.6% false detection rate for the images without WBPHs. The proposed three-layer detection method is feasible and effective for the identification of different developmental stages of planthoppers on rice plants in paddy fields.
Funding: The authors gratefully acknowledge the financial support of the National Science & Technology Pillar Program (2015BAD19B05).
Abstract: The quality of an egg is mainly influenced by the dirt adhering to its shell. Even with good farm-management practices and careful handling, a small percentage of dirty eggs will be produced. The purpose of this research was to detect egg stains using image processing techniques. Compared with color values, the local texture was found to be much better at accurately segmenting the complex and miscellaneous dirt stains on the egg shell. Firstly, the global threshold of the image was obtained by the two-peak method. The irrelevant background was removed using the global threshold, and the region of interest was acquired. The local texture information extracted from the region of interest was taken as the input of fuzzy C-means clustering for segmentation of the dirt stains. According to the principle of projection, the area of dirt stains on the curved egg surface was accurately calculated. The validation experimental results showed that the proposed method for classifying eggs in terms of stains has a specificity of 91.4% for white eggs and 89.5% for brown eggs.
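For readers unfamiliar with the two-peak method mentioned in this abstract, the general idea is to find the two dominant modes of the gray-level histogram (here, background and egg) and place the global threshold at the valley between them. The sketch below is one simple variant under stated assumptions: the `min_separation` parameter, the function name `two_peak_threshold`, and the toy histogram are hypothetical, and the paper may locate the peaks differently.

```python
def two_peak_threshold(histogram, min_separation=10):
    """Return the bin index of the valley between the two dominant peaks.

    `histogram` is a list of pixel counts per gray level. The second peak
    must lie at least `min_separation` bins away from the first so that a
    shoulder of the first peak is not mistaken for a separate mode.
    """
    bins = range(len(histogram))
    first = max(bins, key=lambda i: histogram[i])
    candidates = [i for i in bins if abs(i - first) >= min_separation]
    if not candidates:
        raise ValueError("histogram too narrow for the requested separation")
    second = max(candidates, key=lambda i: histogram[i])
    lo, hi = sorted((first, second))
    # The threshold is the least-populated gray level between the two peaks.
    return min(range(lo, hi + 1), key=lambda i: histogram[i])


# Toy 8-bin histogram: dark background peak at bin 1, bright egg peak at bin 5.
toy_histogram = [1, 9, 2, 0, 1, 7, 3, 0]
threshold = two_peak_threshold(toy_histogram, min_separation=2)
```

Pixels on one side of the returned bin would then be discarded as background, leaving the region of interest for the later texture and clustering steps.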