Funding: This work was supported by the Science and Technology Cooperation Special Project of Shijiazhuang (SJZZXA23005).
Abstract: In minimally invasive surgery, endoscopes or laparoscopes equipped with miniature cameras and tools enter the human body through small incisions or natural cavities for therapeutic purposes. In clinical operating environments, however, endoscopic images often suffer from low texture, uneven illumination, and non-rigid structures, which hinder feature observation and extraction. The resulting loss of feature points in endoscopic images can severely impact surgical navigation or clinical diagnosis, leading to treatment and postoperative recovery issues for patients. To address these challenges, this paper introduces, for the first time, a Cross-Channel Multi-Modal Adaptive Spatial Feature Fusion (ASFF) module built on the lightweight EfficientViT architecture, together with a novel lightweight attention-based feature extraction and matching network. The network dynamically adjusts attention weights for cross-modal information from grayscale images and optical-flow images through a dual-branch Siamese network. It extracts static and dynamic features ranging from low level to high level and from local to global, ensuring robust feature extraction across different widths, noise levels, and blur scenarios. Global and local matching are performed through a multi-level cascaded attention mechanism, with cross-channel attention introduced to extract low-level and high-level features simultaneously. Extensive ablation and comparative experiments are conducted on the HyperKvasir, EAD, M2caiSeg, CVC-ClinicDB, and UCL synthetic datasets. The results demonstrate that the proposed network improves upon the baseline EfficientViT-B3 model by 75.4% in accuracy (Acc) while also improving runtime and storage efficiency. Compared with the complex DenseDescriptor feature extraction network, the difference in Acc is less than 7.22%, and the IoU results on specific datasets outperform complex dense models. Furthermore, the method increases the F1 score by 33.2% and accelerates runtime by 70.2%. Notably, CMMCAN is faster than comparative lightweight models, with feature extraction and matching performance comparable to existing complex models at higher speed and better cost-effectiveness.
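No code accompanies these abstracts, so the sketch below only illustrates the dual-branch, per-pixel adaptive fusion idea in PyTorch. The module name, channel counts, and the simple two-layer encoders are invented for illustration; this is not the authors' CMMCAN/ASFF design.

```python
import torch
import torch.nn as nn

class DualBranchASFF(nn.Module):
    """Toy dual-branch fusion: a grayscale branch and an optical-flow branch
    produce feature maps, and per-pixel softmax weights decide how much each
    modality contributes at every spatial location."""
    def __init__(self, channels: int = 32):
        super().__init__()
        # Siamese-style encoders with identical architecture, one per modality.
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
        self.gray_enc = encoder(1)   # grayscale image: 1 channel
        self.flow_enc = encoder(2)   # dense optical flow: (dx, dy) channels
        # 1x1 conv predicts two fusion logits (one per branch) per location.
        self.weight_head = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, gray, flow):
        f_gray = self.gray_enc(gray)
        f_flow = self.flow_enc(flow)
        logits = self.weight_head(torch.cat([f_gray, f_flow], dim=1))
        w = torch.softmax(logits, dim=1)                 # (B, 2, H, W)
        return w[:, 0:1] * f_gray + w[:, 1:2] * f_flow   # fused features

# Usage: fuse one grayscale frame with its dense flow field.
gray = torch.randn(1, 1, 64, 64)
flow = torch.randn(1, 2, 64, 64)
print(DualBranchASFF()(gray, flow).shape)  # torch.Size([1, 32, 64, 64])
```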
Funding: Supported by the Key Research Program of the Chinese Academy of Sciences (ZDRE-KT-2021-3).
Abstract: Augmented solar images were used to study the adaptability of four representative image feature extraction and matching algorithms in the space-weather domain: the scale-invariant feature transform (SIFT) algorithm, the speeded-up robust features (SURF) algorithm, the binary robust invariant scalable keypoints (BRISK) algorithm, and the oriented FAST and rotated BRIEF (ORB) algorithm. The performance of these algorithms was evaluated in terms of matching accuracy, feature-point richness, and running time. The experiments showed that no algorithm achieved high accuracy while keeping running time low, and that none of them is suitable for feature extraction and matching on augmented solar images. To solve this problem, an improved method was proposed that uses two-frame matching to combine the accuracy advantage of SIFT with the speed advantage of ORB. Our method and the four representative algorithms were then applied to augmented solar images. The application experiments showed that our method achieved a recognition rate similar to that of SIFT, significantly higher than the other algorithms, while obtaining a running time similar to that of ORB, significantly lower than the other algorithms.
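The abstract does not spell out the two-frame scheme, so the sketch below shows one plausible SIFT/ORB hybrid in OpenCV: ORB's fast detector supplies keypoints and SIFT's descriptor supplies accuracy. The hybrid itself and the file names are illustrative assumptions, not the paper's exact method.

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)  # fast keypoint detection
sift = cv2.SIFT_create()              # accurate 128-d description

def detect_and_describe(img_gray):
    kps = orb.detect(img_gray, None)
    kps, desc = sift.compute(img_gray, kps)  # SIFT descriptors at ORB points
    return kps, desc

def match_two_frames(img1, img2, ratio=0.75):
    kp1, d1 = detect_and_describe(img1)
    kp2, d2 = detect_and_describe(img2)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    # Lowe's ratio test filters ambiguous matches between the two frames.
    for pair in matcher.knnMatch(d1, d2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return kp1, kp2, good

# Assumed input files: two consecutive augmented solar frames.
img1 = cv2.imread("solar_t0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("solar_t1.png", cv2.IMREAD_GRAYSCALE)
kp1, kp2, matches = match_two_frames(img1, img2)
print(f"{len(matches)} putative matches between the two frames")
```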
Funding: The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this work through Project Number 375213500.
Abstract: Human activity recognition (HAR) can play a vital role in monitoring human activities, particularly for healthcare-conscious individuals. The accuracy of HAR systems relies entirely on the extraction of prominent features, and existing methods find it very challenging to extract optimal features due to the dynamic nature of activities, which reduces recognition performance. In this paper, we propose a robust feature extraction method for HAR systems based on template matching. Essentially, the method matches a template of an activity frame or sub-frame comprising the corresponding silhouette. The template is placed over the frame pixels, and the number of template pixels agreeing with those in the frame is counted at each position. This process is repeated over the whole frame, and the position with the best count is taken as the location where the silhouette (provided by the template) appears inside the frame. In this way, the feature vector is generated. After feature-vector generation, a hidden Markov model (HMM) is used to label the incoming activity. We used several publicly available standard datasets for the experiments, and the proposed method achieved the best accuracy against existing state-of-the-art systems.
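A minimal OpenCV sketch of the sliding-template step is given below, assuming binarized silhouette frames. The similarity measure, the top-k feature construction, and the file names are illustrative choices, not the paper's exact configuration.

```python
import cv2
import numpy as np

# Assumed inputs: a binarized activity frame and a silhouette template.
frame = cv2.imread("activity_frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("silhouette_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the frame; each score measures how well the
# silhouette pixels agree with the frame at that position.
scores = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(scores)
print(f"best match {best_score:.3f} at top-left corner {best_loc}")

# Toy per-frame feature vector: the top-k match scores, which a downstream
# HMM could consume as (quantized) observations.
k = 5
feature_vector = np.sort(scores.ravel())[-k:]
```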
Funding: Supported by the National Natural Science Foundation of China (60905012).
Abstract: To improve the performance of the scale-invariant feature transform (SIFT), a modified SIFT (M-SIFT) descriptor is proposed to realize fast and robust keypoint extraction and matching. In descriptor generation, 3 rotation-invariant concentric-ring grids around the keypoint location are used instead of the 16 square grids of the original SIFT. Then, 10 orientations are accumulated for each grid, resulting in a 30-dimension descriptor. In descriptor matching, rough rejection of mismatches is proposed based on the difference in grey-level information between matching points. The performance of the proposed method is tested for image mosaicking on simulated and real-world images. Experimental results show that the M-SIFT descriptor inherits SIFT's invariance to image scale and rotation, illumination change, and affine distortion, while the time cost of feature extraction is reduced by 50% compared with the original SIFT, and the rough mismatch rejection can reject at least 70% of mismatches. The results also demonstrate that the proposed M-SIFT method is superior to other improved SIFT methods in speed and robustness.
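The 3 rings × 10 orientations = 30 dimensions structure can be sketched directly in NumPy. The ring radii, the magnitude weighting, and the patch size below are assumptions; dominant-orientation alignment, which M-SIFT would need for full rotation invariance, is omitted for brevity.

```python
import numpy as np

def msift_descriptor(patch, radii=(4, 8, 12), n_bins=10):
    """Sketch of the M-SIFT layout: accumulate a gradient-orientation
    histogram (n_bins orientations) inside each of len(radii) concentric
    rings around the patch center -> len(radii) * n_bins dimensions."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientation in [0, 2*pi)
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    desc = np.zeros(len(radii) * n_bins)
    inner = 0.0
    for i, outer in enumerate(radii):
        ring = (dist >= inner) & (dist < outer)   # pixels in this ring
        bins = np.minimum((ang[ring] / (2 * np.pi) * n_bins).astype(int),
                          n_bins - 1)
        np.add.at(desc, i * n_bins + bins, mag[ring])  # magnitude-weighted
        inner = outer
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc            # 30-d, L2-normalized

patch = np.random.rand(25, 25)                    # toy 25x25 patch
print(msift_descriptor(patch).shape)              # (30,)
```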
Funding: The Innovation Fund Projects of Cooperation among Industries, Universities & Research Institutes of Jiangsu Province, China (Nos. BY2015019-11, BY2015019-20); the Fundamental Research Funds for the Central Universities, China (Nos. JUSRP51404A, JUSRP211A38); and the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), China (No. [2014].37).
Abstract: An Android-based lace image retrieval system built on the content-based image retrieval (CBIR) technique is presented. The system uses shape and texture features of lace images and proposes a hierarchical multi-feature scheme to facilitate coarse-to-fine matching for efficient lace image retrieval in a large database. Experimental results demonstrate the feasibility and effectiveness of the proposed system and show that it meets real-time requirements.
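A coarse-to-fine retrieval pass might look like the sketch below: a cheap shape feature prunes the database, then a costlier texture feature re-ranks the survivors. Both features (Hu moments and a gradient-orientation histogram) are stand-ins chosen here for illustration; the paper does not specify its descriptors.

```python
import cv2
import numpy as np

def shape_feature(img):
    # 7-d Hu-moment shape code: cheap, good for coarse pruning.
    return cv2.HuMoments(cv2.moments(img)).ravel()

def texture_feature(img):
    # Orientation histogram of gradients as a simple texture signature.
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    hist, _ = np.histogram(np.arctan2(gy, gx), bins=36, range=(-np.pi, np.pi))
    return hist / (hist.sum() + 1e-9)

def retrieve(query, database, coarse_k=50, top_k=5):
    q_shape = shape_feature(query)
    # Coarse stage: keep the coarse_k nearest images by shape distance.
    coarse = sorted(database, key=lambda img: np.linalg.norm(
        shape_feature(img) - q_shape))[:coarse_k]
    # Fine stage: re-rank the survivors by texture distance.
    q_tex = texture_feature(query)
    return sorted(coarse, key=lambda img: np.linalg.norm(
        texture_feature(img) - q_tex))[:top_k]
```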
Abstract: Automatic image classification is the first step toward semantic understanding of an object in computer vision. The key challenge for accurate object recognition is the ability to extract robust features from images taken from various viewpoints and to rapidly calculate the similarity between features in an image database or video stream. To solve these problems, an effective and rapid image classification method for object recognition based on video learning is presented. Optical flow and the RANSAC algorithm are used to acquire scene images from each video sequence. After scene-image selection, local maxima on object corners within a local area are found using the Harris corner detection algorithm, and several attributes of the local block around each feature point are computed with the scale-invariant feature transform (SIFT) to extract a local descriptor. Finally, the extracted local descriptors are learned with a three-dimensional pyramid match kernel. Experimental results show that the method can extract features from various multi-viewpoint images in a query video and calculate the similarity between a query image and images in the database.
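The corner-then-describe step can be sketched in OpenCV as below: Harris picks corner points, and SIFT describes the local block around each. The Harris parameters, the keypoint size, the threshold, and the file name are illustrative assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("scene_frame.png", cv2.IMREAD_GRAYSCALE)  # assumed file

# Harris response map: args are (src, blockSize, ksize, k).
harris = cv2.cornerHarris(np.float32(img), 2, 3, 0.04)
ys, xs = np.where(harris > 0.01 * harris.max())   # strong corner responses

# Wrap each Harris corner as a keypoint so SIFT can describe its local block.
kps = [cv2.KeyPoint(float(x), float(y), 16) for y, x in zip(ys, xs)]
sift = cv2.SIFT_create()
kps, descriptors = sift.compute(img, kps)
print(f"{len(kps)} corners described with 128-d SIFT vectors")
```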
Abstract: Purpose–Precise vehicle localization is a basic and critical technique for various intelligent transportation system (ITS) applications, and it must adapt to complex road environments in real time. The global positioning system and the strap-down inertial navigation system are two common techniques in the field of vehicle localization. However, their localization accuracy, reliability, and real-time performance cannot satisfy the requirements of some critical ITS applications such as collision avoidance, vision enhancement, and automatic parking. To address these problems, this paper proposes a precise vehicle ego-localization method based on image matching. Design/methodology/approach–The study comprises three steps. Step 1, feature-point extraction: local features in the pavement images are extracted using an improved speeded-up robust features (SURF) algorithm. Step 2, mismatch elimination: a random sample consensus (RANSAC) algorithm removes mismatched points between road images, making the match-point pairs more robust. Step 3, feature-point matching and trajectory generation. Findings–Through matching and validation of the extracted local feature points, the relative translation and rotation offsets between two consecutive pavement images are calculated and, eventually, the trajectory of the vehicle is generated. Originality/value–The experimental results show that the studied algorithm achieves decimeter-level accuracy and fully meets the demand of lane-level positioning in critical ITS applications.
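A sketch of the per-frame ego-motion step follows: match features between two consecutive pavement images, reject outliers with RANSAC, and read the translation and rotation off a rigid transform. ORB stands in for the paper's improved SURF (SURF requires opencv-contrib), and the file names are assumptions.

```python
import cv2
import numpy as np

prev = cv2.imread("pavement_t0.png", cv2.IMREAD_GRAYSCALE)  # assumed files
curr = cv2.imread("pavement_t1.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1500)
kp1, d1 = orb.detectAndCompute(prev, None)
kp2, d2 = orb.detectAndCompute(curr, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Partial affine = rotation + translation (+ uniform scale); RANSAC drops
# the mismatches that survive descriptor matching.
M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
theta = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
tx, ty = M[0, 2], M[1, 2]
print(f"rotation {theta:.2f} deg, translation ({tx:.1f}, {ty:.1f}) px")
```

Accumulating these per-frame offsets over time yields the vehicle trajectory described in the Findings.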
Funding: Supported by the Future Network Scientific Research Fund Project of Jiangsu Province (No. FNSRFP2021YB26), the Jiangsu Key R&D Fund on Social Development (No. BE2022789), and the Science Foundation of Nanjing Institute of Technology (No. ZKJ202003).
Abstract: Facial expression recognition (FER) in video has attracted increasing interest, and many approaches have been proposed. The crucial problem in classifying a given video sequence into several basic emotions is how to fuse the facial features of individual frames. In this paper, a frame-level attention module is integrated into an improved VGG-based framework, and a lightweight facial expression recognition method is proposed. The proposed network takes a sub-video cut from an experimental video sequence as its input and generates a fixed-dimension representation. The VGG-based network with an enhanced branch embeds face images into feature vectors. The frame-level attention module learns weights that are used to adaptively aggregate the feature vectors into a single discriminative video representation. Finally, a regression module outputs the classification results. Experimental results on the CK+ and AFEW databases show that the recognition rates of the proposed method achieve state-of-the-art performance.
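The frame-level attention mechanism reduces to a simple pattern: score each frame embedding, softmax the scores over time, and return the weighted sum as one video vector. The PyTorch sketch below assumes a 512-d embedding and a linear scoring head, both illustrative choices rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class FrameAttention(nn.Module):
    """Toy frame-level attention over a sequence of frame embeddings."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one attention logit per frame

    def forward(self, frame_feats):      # (B, T, dim) frame embeddings
        logits = self.score(frame_feats)              # (B, T, 1)
        weights = torch.softmax(logits, dim=1)        # attend over frames
        return (weights * frame_feats).sum(dim=1)     # (B, dim) video vector

# Usage: aggregate 16 frame embeddings (e.g., from a VGG-style backbone).
feats = torch.randn(2, 16, 512)
print(FrameAttention()(feats).shape)   # torch.Size([2, 512])
```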