Road traffic monitoring is an important topic widely discussed among researchers. Systems used to monitor traffic frequently rely on cameras mounted on bridges or roadsides. However, aerial images provide the flexibility to use mobile platforms to detect the location and motion of vehicles over a larger area. To this end, different models have shown the ability to recognize and track vehicles. However, these methods are not mature enough to produce accurate results in complex road scenes. Therefore, this paper presents an algorithm that combines state-of-the-art techniques for identifying and tracking vehicles in conjunction with image bursts. The extracted frames were converted to grayscale, followed by the application of a georeferencing algorithm to embed coordinate information into the images. A masking technique eliminated irrelevant data and reduced the computational cost of the overall monitoring system. Next, Sobel edge detection combined with Canny edge detection and the Hough line transform was applied for noise reduction. After preprocessing, a blob detection algorithm was used to detect the vehicles, and vehicles of varying sizes were detected by implementing a dynamic thresholding scheme. Detection was performed on the first image of every burst. Then, to track vehicles, a model of each vehicle was created and its matches in the succeeding images were found using the template matching algorithm. To further improve the tracking accuracy by incorporating motion information, Scale Invariant Feature Transform (SIFT) features were used to find the best possible match among multiple candidates. An accuracy rate of 87% for detection and 80% for tracking was achieved on the A1 Motorway Netherlands dataset. For the Vehicle Aerial Imaging from Drone (VAID) dataset, an accuracy rate of 86% for detection and 78% for tracking was achieved.
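As a rough illustration of the tracking step only, the sketch below matches a vehicle template cut from the first frame of a burst against a later frame with OpenCV template matching; the file names, bounding box, and confidence threshold are assumptions, not values from the paper.

```python
import cv2

# Hypothetical inputs: first frame of a burst and a later frame (grayscale).
frame0 = cv2.imread("burst_frame_0.png", cv2.IMREAD_GRAYSCALE)
frame1 = cv2.imread("burst_frame_1.png", cv2.IMREAD_GRAYSCALE)

# Assume a vehicle was detected in frame0 as a bounding box (x, y, w, h).
x, y, w, h = 120, 240, 48, 32                      # placeholder detection
template = frame0[y:y + h, x:x + w]

# Normalized cross-correlation between the template and the next frame.
response = cv2.matchTemplate(frame1, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)

if max_val > 0.7:                                  # assumed confidence threshold
    print("vehicle tracked to", max_loc, "score", round(max_val, 3))
else:
    print("no confident match in this frame")
```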
In minimally invasive surgery, endoscopes or laparoscopes equipped with miniature cameras and tools are used to enter the human body for therapeutic purposes through small incisions or natural cavities. However, in clinical operating environments, endoscopic images often suffer from challenges such as low texture, uneven illumination, and non-rigid structures, which affect feature observation and extraction. The resulting missing feature points in endoscopic images can severely impact surgical navigation or clinical diagnosis, leading to treatment and postoperative recovery issues for patients. To address these challenges, this paper introduces, for the first time, a Cross-Channel Multi-Modal Adaptive Spatial Feature Fusion (ASFF) module based on the lightweight EfficientViT architecture, together with a novel lightweight attention-based feature extraction and matching network. The network dynamically adjusts attention weights for cross-modal information from grayscale images and optical flow images through a dual-branch Siamese network. It extracts static and dynamic features ranging from low-level to high-level and from local to global, ensuring robust feature extraction across different widths, noise levels, and blur scenarios. Global and local matching are performed through a multi-level cascaded attention mechanism, with cross-channel attention introduced to extract low-level and high-level features simultaneously. Extensive ablation experiments and comparative studies are conducted on the HyperKvasir, EAD, M2caiSeg, CVC-ClinicDB, and UCL synthetic datasets. Experimental results demonstrate that the proposed network improves upon the baseline EfficientViT-B3 model by 75.4% in accuracy (Acc) while also enhancing runtime performance and storage efficiency. Compared with the complex DenseDescriptor feature extraction network, the difference in Acc is less than 7.22%, and IoU results on specific datasets outperform complex dense models. Furthermore, this method increases the F1 score by 33.2% and accelerates runtime by 70.2%. Notably, the speed of CMMCAN surpasses that of comparable lightweight models, with feature extraction and matching performance comparable to existing complex models but with faster speed and higher cost-effectiveness.
Augmented solar images were used to study the adaptability of four representative image feature extraction and matching algorithms in the space weather domain: the scale-invariant feature transform (SIFT) algorithm, the speeded-up robust features (SURF) algorithm, the binary robust invariant scalable keypoints (BRISK) algorithm, and the oriented FAST and rotated BRIEF (ORB) algorithm. The performance of these algorithms was evaluated in terms of matching accuracy, feature point richness, and running time. The experimental results showed that no algorithm achieved high accuracy while keeping the running time low, and that none of the algorithms is well suited to feature extraction and matching of augmented solar images. To solve this problem, an improved method was proposed that uses two-frame matching to combine the accuracy advantage of the scale-invariant feature transform algorithm with the speed advantage of the oriented FAST and rotated BRIEF algorithm. Furthermore, our method and the four representative algorithms were applied to augmented solar images. The application experiments showed that our method achieved a recognition rate similar to that of the scale-invariant feature transform algorithm, which is significantly higher than the other algorithms, while keeping a running time similar to that of the oriented FAST and rotated BRIEF algorithm, which is significantly lower than the other algorithms.
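The abstract does not spell out the two-frame scheme in detail; the hedged sketch below shows one plausible reading under stated assumptions: SIFT is run on an accurate key frame pair, while the cheaper ORB detector would handle the remaining pairs. The image paths are placeholders.

```python
import cv2

def match_pair(img1, img2, use_sift):
    """Match one frame pair: SIFT for the accurate key pair, ORB for fast pairs
    (one plausible reading of the two-frame scheme, assumed here)."""
    if use_sift:
        detector, norm = cv2.SIFT_create(), cv2.NORM_L2
    else:
        detector, norm = cv2.ORB_create(nfeatures=2000), cv2.NORM_HAMMING
    kp1, d1 = detector.detectAndCompute(img1, None)
    kp2, d2 = detector.detectAndCompute(img2, None)
    pairs = cv2.BFMatcher(norm).knnMatch(d1, d2, k=2)
    # Lowe's ratio test to discard ambiguous matches.
    good = [m for m, n in (p for p in pairs if len(p) == 2)
            if m.distance < 0.75 * n.distance]
    return kp1, kp2, good

# Placeholder frames: SIFT on the key pair, ORB would handle the later pairs.
frame_a = cv2.imread("solar_000.png", cv2.IMREAD_GRAYSCALE)
frame_b = cv2.imread("solar_001.png", cv2.IMREAD_GRAYSCALE)
print(len(match_pair(frame_a, frame_b, use_sift=True)[2]), "matches on the key pair")
```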
Background: Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping. The accuracy of the matching significantly affects subsequent studies. When image pairs contain comparable patterns but the feature pairs are positioned differently, their local similarity can cause incorrect recognition if global motion consistency is disregarded. Methods: This study proposes an image-matching filtering algorithm based on global motion consistency. It can be used as a subsequent matching filter for the initial matching results generated by other matching algorithms, based on the principle of motion smoothness. A particular matching algorithm is first used to perform the initial matching; then, the rotation and movement information of the global feature vectors is combined to effectively identify outlier matches. The principle is that if the matching result is accurate, the feature vectors formed by any matched points should have similar rotation angles and moving distances. Thus, global motion direction and global motion distance consistencies are used to reject outliers caused by similar patterns in different locations. Results: Four datasets were used to test the effectiveness of the proposed method. Three datasets with similar patterns in different locations were used to test the results on similar images that other algorithms could easily match incorrectly, and one commonly used dataset was used to test the results on the general image-matching problem. The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set. Conclusions: The proposed outlier rejection matching method can significantly improve the matching accuracy for similar images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.
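As a hedged illustration of the motion-consistency principle (not the paper's exact filter), the snippet below computes the displacement vector of each putative match and rejects matches whose direction or length deviates strongly from the median motion; the tolerance values are assumptions.

```python
import numpy as np

def reject_motion_outliers(pts1, pts2, angle_tol=np.deg2rad(15), dist_tol=0.3):
    """Keep matches whose motion agrees with the dominant global motion.

    pts1, pts2: (N, 2) arrays of matched coordinates in image 1 and image 2.
    """
    motion = pts2 - pts1
    angles = np.arctan2(motion[:, 1], motion[:, 0])
    dists = np.linalg.norm(motion, axis=1)

    med_angle = np.median(angles)
    med_dist = np.median(dists)

    # Angular difference wrapped to [-pi, pi]; relative distance difference.
    d_angle = np.abs(np.arctan2(np.sin(angles - med_angle), np.cos(angles - med_angle)))
    d_dist = np.abs(dists - med_dist) / (med_dist + 1e-9)
    return (d_angle < angle_tol) & (d_dist < dist_tol)

# Toy usage with synthetic matches: a common translation plus one outlier.
p1 = np.array([[0, 0], [10, 5], [20, 8], [30, 1]], float)
p2 = p1 + np.array([5, 2])
p2[3] = [90, 40]                                   # last match is wrong
print(reject_motion_outliers(p1, p2))              # -> [ True  True  True False]
```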
In scene matching based on an inertial navigation system, the influences of the excursion of the inertial navigation system and the measurement error of the wireless pressure altimeter on the rotation and scale of the real image are quantitatively analyzed. The log-polar transform (LPT) is utilized, and an anti-rotation, anti-scale image matching algorithm based on image edge feature point extraction is proposed. In the algorithm, the center point is combined with its four-neighbor points, and the corresponding computing process is presented. Simulation results show that, within the range of image rotation and scale variation resulting from the navigation system error and the measurement error of the wireless pressure altimeter, the proposed image matching algorithm can satisfy the accuracy demands of the scene-aided navigation system and provide location error-correction information for the system.
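For intuition only, the sketch below uses OpenCV's log-polar warp, under which rotation and scaling of the input become translations that phase correlation can recover; it is a generic LPT illustration with a placeholder image path, not the paper's edge-feature-based algorithm.

```python
import cv2
import numpy as np

def logpolar(img):
    """Log-polar warp about the image center (rotation/scale -> translation)."""
    h, w = img.shape
    center = (w / 2.0, h / 2.0)
    max_r = np.hypot(w / 2.0, h / 2.0)
    return cv2.warpPolar(img.astype(np.float32), (w, h), center, max_r,
                         cv2.INTER_LINEAR + cv2.WARP_POLAR_LOG)

# Synthetic test: rotate and scale an image, then estimate the shift in LPT space.
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)           # placeholder path
M = cv2.getRotationMatrix2D((img.shape[1] / 2, img.shape[0] / 2), 10, 1.05)
warped = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

shift, _ = cv2.phaseCorrelate(logpolar(img), logpolar(warped))
print("shift along (log-radius, angle) axes:", shift)         # encodes scale and rotation
```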
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, show remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image description, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, respectively, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Medical imaging plays a key role within modern hospital management systems for diagnostic purposes. Compression methodologies are extensively employed to mitigate storage demands and enhance transmission speed, all while upholding image quality. Moreover, an increasing number of hospitals are embracing cloud computing for patient data storage, necessitating careful scrutiny of server security and privacy protocols. Nevertheless, given the widespread availability of multimedia tools, preserving the integrity of digital data matters even more than compression alone. In response to this concern, we propose a secure storage and transmission solution for compressed medical image sequences, such as ultrasound images, utilizing a motion vector watermarking scheme. The watermark is generated employing an error-correcting code known as Bose-Chaudhuri-Hocquenghem (BCH) and is subsequently embedded into the compressed sequence via block-based motion vectors. In the embedding process, motion vectors are selected based on their magnitude and phase angle; no specific spatial area, such as a region of interest (ROI), is used, and the embedding of watermark bits depends only on the motion vectors. Although reversible watermarking allows the restoration of the original image sequences, we use an irreversible watermarking method, because restoring the original data or images may call ownership or other legal claims into question. The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) serve as metrics for evaluating the watermarked image quality. Across all images, the PSNR value exceeds 46 dB and the SSIM value exceeds 0.92. Experimental results substantiate the efficacy of the proposed technique in preserving data integrity.
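To make the reported quality metrics concrete, the hedged sketch below computes PSNR with NumPy and SSIM with scikit-image for a watermarked frame against its original; it evaluates any two frames on synthetic data and is not tied to the paper's BCH/motion-vector embedding.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(original, watermarked, max_val=255.0):
    """Peak signal-to-noise ratio between two 8-bit frames."""
    mse = np.mean((original.astype(np.float64) - watermarked.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# Toy frames: the "watermarked" frame perturbs a few pixels of the original.
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (256, 256), dtype=np.uint8)
wm = orig.copy()
wm[::64, ::64] ^= 1                                # tiny, watermark-like perturbation

print("PSNR:", round(psnr(orig, wm), 2), "dB")
print("SSIM:", round(structural_similarity(orig, wm, data_range=255), 4))
```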
The task of indoor visual localization, which uses camera visual information to compute the user's pose, is a core component of Augmented Reality (AR) and Simultaneous Localization and Mapping (SLAM). Existing indoor localization technologies generally use scene-specific 3D representations or are trained on specific datasets, making it challenging to balance accuracy and cost when they are applied to new scenes. Addressing this issue, this paper proposes a universal indoor visual localization method based on efficient image retrieval. Initially, a Multi-Layer Perceptron (MLP) is employed to aggregate features from intermediate layers of a convolutional neural network, obtaining a global representation of the image and ensuring accurate and rapid retrieval of reference images. Subsequently, a new mechanism using Random Sample Consensus (RANSAC) is designed to resolve the relative pose ambiguity caused by the essential matrix decomposition based on the five-point method. Finally, the absolute pose of the queried user image is computed, thereby achieving indoor user pose estimation. The proposed indoor localization method is characterized by its simplicity, flexibility, and excellent cross-scene generalization. Experimental results demonstrate a positioning error of 0.09 m and 2.14° on the 7Scenes dataset, and 0.15 m and 6.37° on the 12Scenes dataset, convincingly illustrating the outstanding performance of the proposed method.
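As background for the five-point relative pose step, the sketch below estimates an essential matrix with RANSAC on synthetic two-view correspondences and uses OpenCV's cheirality check to resolve the decomposition ambiguity; the intrinsics and scene are placeholders, and this is not the paper's specific RANSAC mechanism.

```python
import cv2
import numpy as np

# Synthetic two-view geometry so the sketch is self-contained.
rng = np.random.default_rng(1)
K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1.0]])
X = np.c_[rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200), rng.uniform(4, 8, 200)]

R_true, _ = cv2.Rodrigues(np.array([0.0, 0.05, 0.0]))     # small yaw between views
t_true = np.array([[0.2], [0.0], [0.0]])                  # baseline along x

def project(points, R, t):
    """Pinhole projection of 3D points into pixel coordinates."""
    cam = (R @ points.T + t).T
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:]

pts1 = project(X, np.eye(3), np.zeros((3, 1)))
pts2 = project(X, R_true, t_true)

# Five-point essential matrix with RANSAC; recoverPose tests the four (R, t)
# decompositions and keeps the one with points in front of both cameras.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
n_good, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print("cheirality inliers:", n_good, "\nestimated R:\n", np.round(R, 3))
```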
Fusion methods based on multi-scale transforms have become the mainstream of pixel-level image fusion. However, most of these methods cannot fully exploit the spatial-domain information of the source images, which leads to image degradation. This paper presents a fusion framework based on block-matching and 3D (BM3D) multi-scale transform. The algorithm first divides the image into different blocks and groups these 2D image blocks into 3D arrays by their similarity. It then applies a 3D transform, consisting of a 2D multi-scale transform and a 1D transform, to convert the arrays into transform coefficients, and the resulting low- and high-frequency coefficients are fused by different fusion rules. The final fused image is obtained from a series of fused 3D image block groups after the inverse transform, using an aggregation process. In the experimental part, we comparatively analyze several existing algorithms and the use of different transforms, e.g., the non-subsampled contourlet transform (NSCT) and the non-subsampled shearlet transform (NSST), in the 3D transform step. Experimental results show that the proposed fusion framework can not only improve the subjective visual effect but also obtain better objective evaluation criteria than state-of-the-art methods.
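As a much-simplified, hedged stand-in for the coefficient fusion step (a single-level 2D wavelet instead of the grouped 3D transform), the sketch below averages low-frequency coefficients and keeps the larger-magnitude high-frequency coefficients from the two source images.

```python
import numpy as np
import pywt

def fuse_wavelet(img_a, img_b, wavelet="db2"):
    """Average the approximation band, max-abs select the detail bands."""
    cA_a, details_a = pywt.dwt2(img_a.astype(np.float64), wavelet)
    cA_b, details_b = pywt.dwt2(img_b.astype(np.float64), wavelet)

    cA = 0.5 * (cA_a + cA_b)                               # low-frequency rule
    fused_details = tuple(
        np.where(np.abs(da) >= np.abs(db), da, db)         # high-frequency rule
        for da, db in zip(details_a, details_b)
    )
    return pywt.idwt2((cA, fused_details), wavelet)

# Toy sources: each image carries detail in a different half.
a = np.zeros((128, 128)); a[:, :64] = np.random.rand(128, 64) * 255
b = np.zeros((128, 128)); b[:, 64:] = np.random.rand(128, 64) * 255
fused = fuse_wavelet(a, b)
print("fused image shape:", fused.shape)
```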
A novel algorithm is presented to make the results of image matching based on SIFT (Scale Invariant Feature Transform) more reliable and accurate. SIFT has been identified as the matching algorithm most resistant to common image deformations; however, if there are similar regions in the images, SIFT still generates some analogous descriptors and produces many mismatches. This paper examines the local image descriptor used by SIFT and presents a new algorithm that integrates SIFT with two-dimensional moment invariants and the disparity gradient to improve the matching results. In the new algorithm, a decision tree is used, and the whole matching process is divided into three levels with different primitives. Matching points are considered correct only when they satisfy all three similarity measurements. Experimental results demonstrate that the new approach is more reliable and accurate.
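The snippet below is a hedged two-stage illustration in the spirit of such cascaded checks: SIFT matches are filtered by Lowe's ratio test and then verified by comparing Hu moment invariants of small patches around the matched keypoints. The image paths, patch size, and thresholds are assumptions, and the paper's third (disparity gradient) level is omitted.

```python
import cv2
import numpy as np

def hu_similar(img1, img2, kp1, kp2, m, half=16, tol=0.5):
    """Second-level check: compare Hu moments of patches around a match."""
    def patch(img, kp):
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        return img[max(0, y - half):y + half, max(0, x - half):x + half]
    p1, p2 = patch(img1, kp1[m.queryIdx]), patch(img2, kp2[m.trainIdx])
    if p1.size == 0 or p2.size == 0:
        return False
    h1 = cv2.HuMoments(cv2.moments(p1)).ravel()
    h2 = cv2.HuMoments(cv2.moments(p2)).ravel()
    return np.linalg.norm(h1 - h2) < tol

img1 = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)        # placeholder paths
img2 = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, d1 = sift.detectAndCompute(img1, None)
kp2, d2 = sift.detectAndCompute(img2, None)

pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2)
ratio_ok = [m for m, n in (p for p in pairs if len(p) == 2)
            if m.distance < 0.75 * n.distance]
verified = [m for m in ratio_ok if hu_similar(img1, img2, kp1, kp2, m)]
print(len(ratio_ok), "ratio-test matches,", len(verified), "after the Hu-moment check")
```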
Image matching technology is theoretically significant and practically promising in the field of autonomous navigation. Addressing the shortcomings of existing image matching navigation technologies, the concept of the high-dimensional combined feature is presented based on sequence image matching navigation. To balance the distribution of high-dimensional combined features against the shortcomings of using geometric relations alone, we propose a method based on Delaunay triangulation to improve the feature, adding the regional characteristics of the features alongside their geometric characteristics. Finally, the k-nearest neighbor (KNN) algorithm is adopted to optimize the search process. Simulation results show that matching can be realized at rotation angles of -8° to 8° and scale factors of 0.9 to 1.1, and that when the image size is 160 pixel × 160 pixel, the matching time is less than 0.5 s. Therefore, the proposed algorithm can substantially reduce computational complexity, improve the matching speed, and exhibit robustness to rotation and scale changes.
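As a hedged sketch of the geometric side only (the regional characteristics of the paper's combined feature are not specified here), the code below builds a Delaunay triangulation over feature points, describes each point by simple triangulation statistics, and uses a KD-tree for nearest-neighbor search.

```python
import numpy as np
from scipy.spatial import Delaunay, cKDTree

def delaunay_descriptor(points):
    """Per-point descriptor: incident-triangle count and mean incident edge length."""
    tri = Delaunay(points)
    tri_count = np.zeros(len(points))
    edge_len = [[] for _ in range(len(points))]
    for simplex in tri.simplices:                  # each triangle's 3 vertices
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            d = np.linalg.norm(points[a] - points[b])
            tri_count[a] += 1
            edge_len[a].append(d)
            edge_len[b].append(d)
    mean_len = np.array([np.mean(e) if e else 0.0 for e in edge_len])
    return np.c_[tri_count, mean_len]

rng = np.random.default_rng(0)
ref_pts = rng.uniform(0, 160, (60, 2))             # reference feature points
qry_pts = ref_pts + rng.normal(0, 0.5, ref_pts.shape)  # slightly perturbed query

ref_desc = delaunay_descriptor(ref_pts)
qry_desc = delaunay_descriptor(qry_pts)
_, idx = cKDTree(ref_desc).query(qry_desc, k=1)    # KNN search over descriptors
print("nearest reference indices for the first five query points:", idx[:5])
```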
In view of the fact that the traditional Hausdorff image matching algorithm is very sensitive to image size and exhibits unsatisfactory real-time performance in practical applications, an image matching algorithm that incorporates Yolov3 is proposed. Firstly, features of the reference image are selected for pretraining, and the training results are then used to extract features from the real images, after which the coordinates of the center points of the feature areas are used to complete the coarse matching. Finally, the Hausdorff algorithm is used to complete the fine image matching. Experiments show that the proposed algorithm significantly improves the speed and accuracy of image matching and is robust to rotation changes.
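For the fine-matching stage, the hedged sketch below computes a symmetric Hausdorff distance between two edge point sets with SciPy; the Canny edge extraction and the input crops are placeholders, not the paper's Yolov3 coarse-matching pipeline.

```python
import cv2
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def edge_points(img):
    """Return an (N, 2) array of (x, y) edge coordinates from a Canny edge map."""
    edges = cv2.Canny(img, 100, 200)
    ys, xs = np.nonzero(edges)
    return np.c_[xs, ys].astype(np.float64)

ref = cv2.imread("reference_patch.png", cv2.IMREAD_GRAYSCALE)   # placeholder
real = cv2.imread("real_patch.png", cv2.IMREAD_GRAYSCALE)       # e.g. a coarse-match crop

a, b = edge_points(ref), edge_points(real)
# Symmetric Hausdorff distance: the larger of the two directed distances.
h = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
print("Hausdorff distance between edge sets:", h)
```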
Histogram of collinear gradient-enhanced coding (HCGEC), a robust key point descriptor for multi-spectral image matching, is proposed. The HCGEC mainly encodes rough structures within an image and suppresses detailed textural information, which is desirable in multi-spectral image matching. Experiments on two multi-spectral data sets demonstrate that the proposed descriptor yields significantly better results than some state-of-the-art descriptors.
For the purpose of identifying the stern of a SWATH (Small Waterplane Area Twin Hull) ship reliably and improving the detection of the SWATH ship's performance, this paper presents a novel bidirectional image registration strategy and mosaicing technique based on the scale invariant feature transform (SIFT) algorithm. The proposed method allows the stern to be observed from a wide visual angle for analyzing the performance of the control fins of the SWATH. SIFT is one of the most effective local features, being invariant to scale, rotation, and illumination; however, it still produces some false matches. In underwater machine vision, only with an accurate match rate can an underwater robot be found rapidly and the location of the object identified. Therefore, this paper first puts forward a principle for selecting the match ratio; secondly, the advantages of the bidirectional registration algorithm are derived by analyzing the characteristics of the unidirectional matching method. Finally, an automatic underwater image splicing method based on fixed dimensions is proposed, and the edges of the images' overlapping sections are merged by a principal components analysis algorithm. The experimental results achieve better registration and a smooth mosaicing effect, demonstrating that the proposed method is effective.
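A minimal sketch of the bidirectional idea, assuming it corresponds to mutual (cross-checked) nearest-neighbor matching: OpenCV's brute-force matcher with crossCheck=True keeps only pairs that are each other's nearest neighbor in both directions. The image paths are placeholders.

```python
import cv2

img1 = cv2.imread("stern_view_1.png", cv2.IMREAD_GRAYSCALE)   # placeholder images
img2 = cv2.imread("stern_view_2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, d1 = sift.detectAndCompute(img1, None)
kp2, d2 = sift.detectAndCompute(img2, None)

# One-directional matching: nearest neighbour from image 1 into image 2 only.
one_way = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False).match(d1, d2)

# Bidirectional matching: keep a pair only if it is mutual in both directions,
# which removes many false matches that a single direction lets through.
two_way = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)

print(len(one_way), "one-way matches vs", len(two_way), "cross-checked matches")
```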
Image matching based on the scale invariant feature transform (SIFT) is one of the most popular image matching algorithms and exhibits high robustness and accuracy. Grayscale images rather than color images are generally used to compute SIFT descriptors in order to reduce the complexity. In this case, regions with a similar grayscale level but different hues tend to produce wrong matching results, so the loss of color information may decrease the matching ratio. An image matching algorithm based on SIFT is proposed that adds a color offset and an exposure offset when converting color images to grayscale images in order to enhance the matching ratio. Experimental results show that the proposed algorithm can effectively differentiate regions with different colors but a similar grayscale level and increases the matching ratio of SIFT-based image matching. Furthermore, it does not introduce much more complexity than the traditional SIFT.
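The exact offset definitions are not given in the abstract; the sketch below illustrates the general idea under stated assumptions, adding a hue-dependent color offset and a brightness-dependent exposure offset to the standard luminance conversion before SIFT is applied. The gains and image path are placeholders.

```python
import cv2
import numpy as np

def gray_with_offsets(bgr, color_gain=0.15, exposure_gain=0.1):
    """Luminance plus assumed hue/exposure offsets, so different hues with the
    same luminance no longer collapse to identical gray values."""
    b, g, r = cv2.split(bgr.astype(np.float32))
    luma = 0.114 * b + 0.587 * g + 0.299 * r
    color_offset = color_gain * (r - b)                     # crude hue cue
    exposure_offset = exposure_gain * (luma - luma.mean())  # crude exposure cue
    out = luma + color_offset + exposure_offset
    return np.clip(out, 0, 255).astype(np.uint8)

img = cv2.imread("scene_color.png")                         # placeholder path
gray = gray_with_offsets(img)
kp, desc = cv2.SIFT_create().detectAndCompute(gray, None)
print(len(kp), "SIFT keypoints on the offset-adjusted grayscale image")
```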
To improve the performance of the scale invariant feature transform (SIFT), a modified SIFT (M-SIFT) descriptor is proposed to realize fast and robust key-point extraction and matching. In descriptor generation, 3 rotation-invariant concentric-ring grids around the key-point location are used instead of the 16 square grids used in the original SIFT. Then, 10 orientations are accumulated for each grid, which results in a 30-dimension descriptor. In descriptor matching, rough rejection of mismatches is proposed based on the difference in gray information between matching points. The performance of the proposed method is tested for image mosaicing on simulated and real-world images. Experimental results show that the M-SIFT descriptor inherits SIFT's invariance to image scale and rotation, illumination change, and affine distortion, while the time cost of feature extraction is reduced by 50% compared with the original SIFT, and the rough rejection step can reject at least 70% of the mismatches. The results also demonstrate that the proposed M-SIFT method is superior to other improved SIFT methods in speed and robustness.
A simple and effective greedy algorithm for image approximation is proposed. Based on the matching pursuit approach, it is characterized by a reduced computational complexity benefiting from two major modifications. First, it iteratively finds an approximation by selecting M atoms at a time instead of one. Second, the inner product computations are confined to only a fraction of the dictionary atoms at each iteration. The modifications are implemented very efficiently thanks to the spatial incoherence of the dictionary. Experimental results show that, compared with full-search matching pursuit, the proposed algorithm achieves a speed-up of 14.4-36.7 times while maintaining the approximation quality.
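A minimal NumPy sketch of the multi-atom idea, under the assumption of a generic unit-norm dictionary: at every iteration the M atoms most correlated with the residual are selected together, and a least-squares fit over all selected atoms updates the approximation. This is closer to orthogonal matching pursuit than to the paper's exact scheme and omits the restricted inner-product search.

```python
import numpy as np

def multi_atom_pursuit(signal, dictionary, m_per_iter=4, n_iter=10):
    """Greedy sparse approximation selecting several atoms per iteration.

    dictionary: (n_samples, n_atoms) with unit-norm columns.
    """
    residual = signal.copy()
    selected = []
    for _ in range(n_iter):
        corr = np.abs(dictionary.T @ residual)
        corr[selected] = 0.0                       # do not re-pick chosen atoms
        new = np.argsort(corr)[-m_per_iter:]
        selected.extend(new.tolist())
        # Least-squares coefficients over all atoms chosen so far.
        coeffs, *_ = np.linalg.lstsq(dictionary[:, selected], signal, rcond=None)
        residual = signal - dictionary[:, selected] @ coeffs
    return selected, residual

# Toy problem: random unit-norm dictionary, signal built from a few atoms.
rng = np.random.default_rng(0)
D = rng.normal(size=(256, 512))
D /= np.linalg.norm(D, axis=0)
x = D[:, [3, 70, 400]] @ np.array([1.0, -2.0, 0.5])
atoms, res = multi_atom_pursuit(x, D, m_per_iter=2, n_iter=3)
print("selected atoms:", sorted(atoms), "residual norm:", round(np.linalg.norm(res), 4))
```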
Due to the requirements of digital image research, image matching is considered a key, essential, and complicated problem, especially for machine learning. Owing to its convenience and simplicity, the most widely applied algorithm for image feature point extraction and matching is Speeded-Up Robust Features (SURF); as an enhancement of the scale invariant feature transform (SIFT) algorithm, it improves effectiveness and makes application in real-time computer vision systems feasible. In this work, the SURF algorithm is used to extract image features, and the RANSAC algorithm is incorporated to filter the matching points. The images were compared and verified in experiments using relevant image enhancement methods. The idea of merging enhancement technology with the SURF algorithm is put forward to obtain better-quality feature point matching, and appropriate image enhancement methods are adopted for different feature images, which are compared and verified by experiments. Results illustrating the effects of lighting on underexposed and overexposed images are also presented.
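The sketch below pairs a feature detector with RANSAC-based homography filtering in OpenCV. Note that SURF is patented and only available in an opencv-contrib build with the non-free modules enabled, so the sketch falls back to ORB when it is missing; the image paths and thresholds are assumptions.

```python
import cv2
import numpy as np

img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)        # placeholder images
img2 = cv2.imread("train.png", cv2.IMREAD_GRAYSCALE)

try:
    detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # contrib/non-free only
    norm = cv2.NORM_L2
except (AttributeError, cv2.error):
    detector = cv2.ORB_create(nfeatures=2000)                # fallback detector
    norm = cv2.NORM_HAMMING

kp1, d1 = detector.detectAndCompute(img1, None)
kp2, d2 = detector.detectAndCompute(img2, None)
matches = sorted(cv2.BFMatcher(norm, crossCheck=True).match(d1, d2),
                 key=lambda m: m.distance)

# RANSAC homography estimation marks geometrically consistent matches as inliers.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(int(mask.sum()), "RANSAC inliers out of", len(matches), "matches")
```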
To obtain a sparse decomposition and flexible representation of traffic images, this paper proposes a fast matching pursuit for traffic images using differential evolution. According to the structural features of traffic images, the introduced algorithm selects image atoms in a fast and flexible way from an over-complete image dictionary to adaptively match the local structures of traffic images and thereby implement the sparse decomposition. Extensive experiments comparing the method with the traditional method and with a genetic-algorithm-based matching pursuit show that differential evolution achieves much higher traffic image quality with much less computational time, which indicates the effectiveness of the proposed algorithm.
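As a hedged, generic illustration of evolutionary atom selection (not the paper's dictionary or parameterization), the sketch below uses SciPy's differential evolution to search the continuous parameters of a Gabor-like atom that best correlates with the current residual of a matching pursuit step.

```python
import numpy as np
from scipy.optimize import differential_evolution

SIZE = 32
yy, xx = np.mgrid[0:SIZE, 0:SIZE]

def gabor_atom(params):
    """Gabor-like atom parameterized by position, scale, frequency, orientation."""
    x0, y0, sigma, freq, theta = params
    xr = (xx - x0) * np.cos(theta) + (yy - y0) * np.sin(theta)
    yr = -(xx - x0) * np.sin(theta) + (yy - y0) * np.cos(theta)
    atom = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * xr)
    return atom / (np.linalg.norm(atom) + 1e-12)

# Residual of one matching pursuit iteration; here a synthetic oriented patch.
residual = gabor_atom([16, 16, 4.0, 0.15, 0.6]) \
    + 0.05 * np.random.default_rng(0).normal(size=(SIZE, SIZE))

def objective(params):
    return -abs(np.sum(gabor_atom(params) * residual))   # maximize |<atom, residual>|

bounds = [(0, SIZE), (0, SIZE), (1.0, 10.0), (0.02, 0.5), (0.0, np.pi)]
result = differential_evolution(objective, bounds, maxiter=50, seed=1, tol=1e-6)
print("best atom parameters:", np.round(result.x, 3), "correlation:", round(-result.fun, 3))
```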
Three-dimensional (3D) reconstruction based on aerial images has broad prospects, and feature matching is an important step in it. However, for high-resolution aerial images, traditional algorithms usually suffer from problems such as long matching time, mismatches, and sparse feature pairs. Therefore, an algorithm is proposed to realize fast, accurate, and dense feature matching. The algorithm consists of four steps. Firstly, we achieve a balance between the feature matching time and the number of matching pairs by appropriately reducing the image resolution. Secondly, to further screen out mismatches, a feature screening algorithm based on similarity judgment or local optimization is proposed. Thirdly, to make the algorithm more widely applicable, we combine the results of different algorithms to obtain dense results. Finally, all matching feature pairs in the low-resolution images are restored to the original images. Comparisons between the original algorithms and ours show that the proposed algorithm can effectively reduce the matching time, screen out mismatches, and increase the number of matches.
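A minimal sketch of the first and last steps, assuming a simple uniform downscale factor: features are detected and matched on reduced-resolution copies, and the matched coordinates are then mapped back to the original full-resolution images. The paths, scale factor, and ratio threshold are placeholders.

```python
import cv2
import numpy as np

SCALE = 0.25                                               # assumed reduction factor

img1 = cv2.imread("aerial_1.png", cv2.IMREAD_GRAYSCALE)    # placeholder images
img2 = cv2.imread("aerial_2.png", cv2.IMREAD_GRAYSCALE)
small1 = cv2.resize(img1, None, fx=SCALE, fy=SCALE, interpolation=cv2.INTER_AREA)
small2 = cv2.resize(img2, None, fx=SCALE, fy=SCALE, interpolation=cv2.INTER_AREA)

sift = cv2.SIFT_create()
kp1, d1 = sift.detectAndCompute(small1, None)
kp2, d2 = sift.detectAndCompute(small2, None)
pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2)
good = [m for m, n in (p for p in pairs if len(p) == 2) if m.distance < 0.75 * n.distance]

# Map matched coordinates from the reduced images back to full resolution.
pts1_full = np.float32([kp1[m.queryIdx].pt for m in good]) / SCALE
pts2_full = np.float32([kp2[m.trainIdx].pt for m in good]) / SCALE
print(len(good), "matches restored to original-resolution coordinates")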