As society develops, people's requirements for clothing matching keep increasing, which places growing demands on clothing recommendation systems. The algorithms that interpret clothing images must therefore be efficient and robust. To capture the details of clothing images, we accurately detect the keypoints of garments. Because the joint points of a garment resemble those of the human body, this paper applies the cascaded pyramid network (CPN), a deep neural network designed for human pose estimation, to keypoint detection in clothing. We first introduce the structure and characteristics of the network as used for keypoint detection. We then evaluate the experimental results and verify the effectiveness of CPN for clothing keypoint detection, with a normalized error of about 5% to 7%. Finally, we analyze how different backbones affect keypoint detection in this network.
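To make the normalized error figure concrete, the following is a minimal sketch of one common way such a metric can be computed. The abstract does not define its metric, so the function name `normalized_error`, the visibility mask, and the choice of normalization distance are assumptions for illustration only.

```python
import math


def normalized_error(pred, truth, visible, norm_dist):
    """Mean Euclidean keypoint error over visible points, divided by norm_dist.

    pred, truth: lists of (x, y) keypoints in the same order.
    visible:     list of booleans; invisible keypoints are skipped.
    norm_dist:   normalization length (hypothetical choice, e.g. a
                 garment-specific reference distance).
    """
    total, count = 0.0, 0
    for (px, py), (tx, ty), is_visible in zip(pred, truth, visible):
        if not is_visible:
            continue
        total += math.hypot(px - tx, py - ty) / norm_dist
        count += 1
    if count == 0:
        raise ValueError("no visible keypoints to evaluate")
    return total / count


# Two keypoints: one exact, one off by 5 pixels, normalized by 100 pixels.
example = normalized_error([(0, 0), (3, 4)], [(0, 0), (0, 0)], [True, True], 100.0)
```

Under this definition, a value of 0.05 would mean the average keypoint lies 5% of the normalization distance away from its annotation.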
Big data is a comprehensive result of the development of the Internet of Things and information systems, and computer vision research relies on large amounts of data. Because skeleton data adapts well to dynamic environments and complex backgrounds, it is used in action recognition tasks, and skeleton-based action recognition has received increasing attention in computer vision in recent years. The keypoints of the human skeleton are therefore essential for estimating human pose and recognizing human actions. This paper proposes a skeleton point extraction method combined with object detection, which focuses on the extraction of skeleton keypoints. Extensive experiments show that the model can be combined with object detection for skeleton point extraction and that detection efficiency improves.
Copy-move forgery is widely used to conceal or hide information in a digital image for a specific purpose: a portion of the genuine image is duplicated and pasted elsewhere in the same image. Detecting copy-move forgery is therefore a significant problem and an active research area in image authentication. This paper proposes a copy-move forgery detection system with two stages: a detection stage and a refined detection stage. The detection stage uses Speeded-Up Robust Features (SURF) and Binary Robust Invariant Scalable Keypoints (BRISK) for feature detection; the refined detection stage uses image registration with a non-linear transformation to improve detection efficiency. First, the genuine image is selected, and SURF and BRISK feature extraction are run in parallel to detect interest keypoints. This yields a suitable number of interest points and makes it likely that most of the manipulated regions are found. RANSAC is then used to find the best group of matches and distinguish the manipulated parts. Next, a non-linear transformation between the best-matched sets from both feature extractors is used as an optimization to obtain the best-matched set and detect the copied regions. Numerical experiments were performed on several benchmark datasets, including CASIA v2.0, MICC-220, MICC-F600, and MICC-F2000. The proposed algorithm obtains an overall average detection accuracy of 95.33% across these datasets. Forgery detection achieved a true positive rate of 97.4% for tampered images with object translation, different degrees of rotation, and enlargement. The results on the different datasets show that the proposed algorithm can identify the altered areas with high reliability and can handle multiple cloning.
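The RANSAC step can be illustrated with a deliberately simplified sketch. It assumes keypoint matches have already been found (for example by SURF or BRISK) and fits only a pure translation between a copied region and its paste location; the paper itself uses a non-linear transformation, so this is not the authors' method. The function name `ransac_translation` and its parameters are hypothetical.

```python
import random


def ransac_translation(matches, tol=2.0, iters=100, seed=0):
    """Return the largest set of matches consistent with a single translation.

    matches: list of ((x1, y1), (x2, y2)) keypoint pairs from the same image.
    tol:     per-axis tolerance in pixels for a match to count as an inlier.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        # One match is enough to hypothesize a translation (dx, dy).
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [
            m for m in matches
            if abs((m[1][0] - m[0][0]) - dx) <= tol
            and abs((m[1][1] - m[0][1]) - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Five matches shifted by (50, 20) plus two spurious matches.
cloned = [((x, y), (x + 50, y + 20)) for x, y in [(1, 1), (4, 2), (7, 9), (3, 8), (6, 5)]]
spurious = [((0, 0), (5, 90)), ((10, 10), (200, 3))]
inliers = ransac_translation(cloned + spurious)
```

The surviving inliers mark the matched pairs that move together, which is the signature of a cloned region; rotation and scaling would need a richer model than the translation assumed here.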
In the modern era of a growing population, it is arduous for humans to monitor every aspect of sports, the events occurring around us, and related scenarios or conditions. Recognition of different types of sports and events increasingly relies on machine learning and artificial intelligence. This research focuses on detecting and recognizing events in sequential photos characterized by several factors, including the size, location, and position of people's body parts in those pictures and the influence of the surroundings of those people. Feature descriptors such as MSER (Maximally Stable Extremal Regions), SIFT (Scale-Invariant Feature Transform), and the DOF (degree of freedom) between joint points are applied to the skeleton points. For the same purpose, other features such as BRISK (Binary Robust Invariant Scalable Keypoints), ORB (Oriented FAST and Rotated BRIEF), and HOG (Histogram of Oriented Gradients) are applied to the full body or silhouettes. Integrating these techniques makes the features retrieved for event identification more discriminative, improving the efficiency and reliability of the entire procedure. The extracted features are passed to early fusion and DBSCAN for feature fusion and optimization, and a deep belief network is then employed for recognition. Experimental results show average recognition rates of 87% on the HMDB51 video database and 89% on the YouTube database, outperforming current methods in sports and event identification.
This research aims to improve the performance of image recognition methods that describe an image as a set of keypoint descriptors. The main focus is on speeding up the computation of relevance between object and etalon (reference) descriptions while maintaining the required level of classification efficiency. The class to be recognized is represented by an infinite set of images obtained from the etalon by applying arbitrary geometric transformations. The descriptions in the etalon database are reduced by selecting the most significant descriptor components according to an information content criterion. The informativeness of an etalon descriptor is estimated as the difference between its closest distances to its own description and to the other descriptions. The developed method determines the relevance between the full description of the recognized object and the reduced descriptions of the etalons. Several practical classifier models with different options for establishing the correspondence between object descriptors and etalons are considered. Results of experimental modeling are presented for a database of museum jewelry images. The test sample consists of images from the etalon database and images outside the database, with geometric transformations of scale and rotation in the field of view applied. The practical problem of determining the vote-count threshold on which the classification decision is based is investigated. Modeling shows that descriptions can be reduced tenfold with full preservation of classification accuracy, while a twentyfold reduction slightly decreases accuracy. The speed of analysis increases in proportion to the degree of reduction. Reduction by the informativeness criterion confirmed that the most significant subset of features for classification can be obtained, which guarantees a decent level of accuracy.
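The informativeness criterion can be sketched as follows. The abstract does not give the exact formula, so this example makes several assumptions: descriptors are binary and stored as Python integers, distance is the Hamming distance, and the score is the nearest distance to other etalons minus the nearest distance within the descriptor's own etalon. All function names are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")


def informativeness(idx, own, others):
    """Score descriptor own[idx]: far from other etalons and close to its own is good.

    own:    descriptors of this etalon (needs at least two entries).
    others: list of descriptor lists, one per other etalon.
    """
    d = own[idx]
    d_own = min(hamming(d, x) for i, x in enumerate(own) if i != idx)
    d_other = min(hamming(d, x) for desc in others for x in desc)
    return d_other - d_own


def reduce_description(own, others, keep):
    """Keep the `keep` most informative descriptors, preserving original order."""
    ranked = sorted(range(len(own)),
                    key=lambda i: informativeness(i, own, others),
                    reverse=True)
    return [own[i] for i in sorted(ranked[:keep])]


own = [0b0000, 0b0001, 0b1111]
others = [[0b1110, 0b1100]]
reduced = reduce_description(own, others, keep=2)  # drops 0b1111, which sits close to the other etalon
```

A reduced description built this way is what would then be matched against the full description of the object being recognized.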
Human pose estimation aims to localize body joints in image or video data. With the development of deep learning, pose estimation has become a hot research topic in computer vision, and in recent years it has achieved great success in fields such as animation and sports. However, to obtain accurate positioning results, existing methods may suffer from large model sizes, a high number of parameters, and increased complexity, leading to high computing costs. In this paper, we propose a new lightweight feature encoder to construct a high-resolution network that reduces the number of parameters and lowers the computing cost. We also introduce a semantic enhancement module that improves global feature extraction and network performance by combining the channel and spatial dimensions. Furthermore, we propose a densely connected spatial pyramid pooling module to compensate for the decrease in image resolution and the information loss in the network. Together, these components reduce the number of parameters and the complexity while maintaining high performance. Extensive experiments show that our method achieves competitive performance while dramatically reducing the number of parameters and the operational complexity. Specifically, our method obtains an AP score of 89.9% on MPII val, while the number of parameters and the complexity of operations are reduced by 41% and 36%, respectively.
Human pose estimation is a critical research area in computer vision, playing a significant role in applications such as human-computer interaction, behavior analysis, and action recognition. In this paper, we propose a U-shaped keypoint detection network (DAUNet) based on an improved ResNet subsampling structure and a spatial grouping mechanism. The network addresses key challenges in traditional methods, such as information loss, large network redundancy, and insufficient sensitivity to low-resolution features. DAUNet is composed of three main components. First, we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss. Second, after upsampling, the network eliminates redundant features, improving overall efficiency. Finally, a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map, allowing better restoration of the original image size and higher accuracy. Experimental results demonstrate that DAUNet achieves superior accuracy compared with most existing keypoint detection models, with a mean PCKh@0.5 score of 91.6% on the MPII dataset and an AP of 76.1% on the COCO dataset. Moreover, real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments, highlighting its potential for broader applications.
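For readers unfamiliar with the PCKh@0.5 score quoted above, here is a minimal sketch of the metric for a single person: a predicted joint counts as correct when it lies within 0.5 times the head segment length of the ground truth. The function name `pckh` and the single-person interface are assumptions for illustration; benchmark implementations also handle invisible joints and average over a dataset.

```python
import math


def pckh(pred, truth, head_size, alpha=0.5):
    """Fraction of joints whose error is at most alpha * head_size.

    pred, truth: lists of (x, y) joint positions in the same order.
    head_size:   head segment length of this person, in pixels.
    """
    threshold = alpha * head_size
    correct = sum(
        1 for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return correct / len(truth)


# Head size 40 gives a 20-pixel threshold: errors of 0 and 10 pass, 30 fails.
score = pckh([(0, 0), (10, 0), (0, 30)], [(0, 0), (0, 0), (0, 0)], head_size=40)
```

Normalizing by head size makes the score comparable across people who appear at different scales in the image.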
Traditional keypoint detection methods are susceptible to complex backgrounds and local image similarity, which cause inaccurate descriptor matching and bias in visual localization. To address this problem, keypoints and descriptors based on cross-modality fusion are proposed and applied to camera motion estimation. A convolutional neural network detects the positions of keypoints and generates the corresponding descriptors, and pyramid convolution is used to extract multi-scale features in the network. The problem of local image similarity is solved by capturing local and global feature information and fusing the geometric position information of keypoints to generate descriptors. According to our experiments, the repeatability of our method improves by 3.7% and the homography estimation improves by 1.6%. To demonstrate the practicability of the method, the visual odometry part of a simultaneous localization and mapping system is constructed, and our method achieves 35% higher positioning accuracy than the traditional method.
Digital media tampering and fake information are a severe problem in modern information systems. As imaging technology improves, image forgery, whether by the photographer or through image manipulation, advances in parallel. Much research has focused on identifying manipulated media manually or automatically, to counter sophisticated forgery methods that rely on readily available, technologically advanced tools. However, the developed methods suffer from high complexity, and resolving the speed-accuracy trade-off remains difficult. To tackle these challenges, this article puts forward a fast and effective Copy-Move Forgery Detection (CMFD) system using a novel Quad-sort Moth Flame (QMF) Light Gradient Boosting Machine (QMF-Light GBM). The input images are first pre-processed by removing noise with a Borel Transform (BT)-based Wiener Filter (BWF) and by resizing. The pre-processed images are then partitioned into grids and segmented using Orientation Preserving Simple Linear Iterative Clustering (OPSLIC). Next, significant features are extracted from the segmented images, and the distances between features are calculated and matched against the input images. False positive matches that arise during matching are eliminated using the Union Topological Measure of Pattern Diversity (UTMOPD) method. Finally, QMF-Light GBM is used to visualize forged and non-forged images. Extensive experiments show that, in terms of detection accuracy, the proposed system is highly precise compared with several top-performing approaches.
This paper proposes a novel object detection method in which a set of local features inside superpixels is extracted from an image acquired by a 3D visual sensor. To increase segmentation accuracy, the proposed method first segments the image using the Simple Linear Iterative Clustering (SLIC) superpixel method. Next, the keypoints inside each superpixel are estimated using Speeded-Up Robust Features (SURF). These keypoints are then used to match every detected keypoint of a scene inside the estimated superpixels. In addition, a probability map is introduced to describe the accuracy of the object detection results. Experimental results show that the proposed approach provides fairly good object detection and confirm the superior performance of the proposed scheme compared with other recently proposed methods, such as the scheme proposed by Mae et al.
This article presents a method for describing keypoints using simple statistics over regions controlled by neighboring keypoints, to remedy a gap in existing descriptors. Existing descriptors such as speeded up robust features (SURF), Kaze, binary robust invariant scalable keypoints (BRISK), features from accelerated segment test (FAST), and oriented FAST and rotated BRIEF (ORB) can competently detect, describe, and match images in the presence of artifacts such as blur, compression, and illumination changes. However, their performance and reliability decrease under imaging variations such as changes in point of view, zoom (scale), and rotation. The introduced description method improves image matching under such distortions. It uses a contourlet-based detector to detect the strongest keypoints within a specified window size. The selected keypoints and their neighbors control the size and orientation of the surrounding regions, which are mapped onto rectangular shapes using a polar transformation. The resulting rectangular matrices are subjected to two-directional statistical operations that calculate the mean and standard deviation. The resulting descriptor is invariant to translation, rotation, and scale because of two techniques used in this paper: the region extraction and the polar transformation. The description method is tested against well-established descriptors, such as SURF, Kaze, BRISK, FAST, and ORB, using the standard OXFORD dataset. The presented methodology improves the matching of distorted images compared with other descriptors in the literature.
This article presents a new fast method for classifying images using a structural approach. The method is based on a system of hierarchical features derived from the bitwise data distribution of the set of descriptors that describes an image. The article also proposes the use of a spatial data processing apparatus, which simplifies and accelerates the classification process. Experiments show that computing the relevance of two descriptions from their distributions takes about 1000 times less time than the traditional voting procedure, in which the sets of descriptors are compared. Introducing the system of hierarchical features further reduces the calculation time by a factor of 2 to 3 while ensuring high classification efficiency. The immunity of the method to additive noise has been studied experimentally. According to the results, the marginal degree of the feature hierarchy for reliable classification with a noise standard deviation below 30 is the 8-bit distribution. Computing costs increase proportionally with decreasing bit distribution. The method can be used in applications where object identification time is critical.
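One plausible reading of the bitwise data distribution is sketched below; the abstract does not define it, so this is an assumption rather than the authors' construction. Each description (a set of binary descriptors stored as integers) is summarized by the fraction of descriptors with each bit set, and two descriptions are compared by the Manhattan distance between these summaries. The names `bit_distribution` and `distribution_distance` are hypothetical.

```python
def bit_distribution(descriptors, n_bits):
    """Fraction of descriptors that have each of the n_bits bit positions set."""
    counts = [0] * n_bits
    for d in descriptors:
        for b in range(n_bits):
            counts[b] += (d >> b) & 1
    return [c / len(descriptors) for c in counts]


def distribution_distance(p, q):
    """Manhattan distance between two bit distributions (smaller is more similar)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Index 0 of each distribution corresponds to the least significant bit.
object_dist = bit_distribution([0b01, 0b11], n_bits=2)   # [1.0, 0.5]
etalon_dist = bit_distribution([0b10, 0b00], n_bits=2)   # [0.0, 0.5]
distance = distribution_distance(object_dist, etalon_dist)
```

Comparing two fixed-length distributions costs time proportional to the number of bits, whereas voting compares every object descriptor with every etalon descriptor, which is consistent with the large speedup the abstract reports.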
The repeatability rate is an important measure for evaluating and comparing the performance of keypoint detectors, and several repeatability rate measurements have been used in the literature for this purpose. Although these rates are calculated for pairs of images, the general assumption is that the reference image is known and unchanging relative to the other images in the same dataset. The rates are therefore asymmetrical, because they are calculated in only one direction. In addition, the image domain in which the computations take place substantially affects their values. The presented scatter plots illustrate how these directional repeatability rates vary with the size of the neighboring region in each pair of images. Both directional repeatability rates for the same image pair must therefore be included when comparing keypoint detectors. This paper first examines several commonly used repeatability rate measures for keypoint detector evaluation. It then suggests computing a two-fold repeatability rate to assess keypoint detector performance on images of similar scenes. Next, the symmetric mean repeatability rate metric is computed from the two-fold repeatability rates. Finally, these measurements are validated using well-known keypoint detectors on different image groups with various geometric and photometric attributes.
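The two-fold and symmetric rates can be illustrated with a short sketch. It assumes that the keypoints of both images have already been mapped into a common coordinate frame, that a keypoint is "repeated" when some keypoint of the other image lies within a radius `eps`, and that the symmetric rate is the arithmetic mean of the two directional rates. The abstract specifies none of these details, and the function names are hypothetical.

```python
import math


def directional_repeatability(src, dst, eps):
    """Fraction of src keypoints that have a dst keypoint within eps pixels."""
    if not src:
        return 0.0
    hits = sum(
        1 for (sx, sy) in src
        if any(math.hypot(sx - dx, sy - dy) <= eps for (dx, dy) in dst)
    )
    return hits / len(src)


def symmetric_mean_repeatability(a, b, eps):
    """Return (rate a->b, rate b->a, arithmetic mean of the two)."""
    r_ab = directional_repeatability(a, b, eps)
    r_ba = directional_repeatability(b, a, eps)
    return r_ab, r_ba, (r_ab + r_ba) / 2


# Image A has two keypoints, image B only one: the two directions disagree.
rates = symmetric_mean_repeatability([(0, 0), (10, 10)], [(0, 1)], eps=1.5)
```

In this example the rate from A to B is 0.5 while the rate from B to A is 1.0, which shows why reporting a single direction can misrepresent a detector.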
This paper studies the problem of image recognition in computer vision systems. It presents efficient classification methods, designed with processing speed in mind, that analyze a segment representation of the structural description given as a set of descriptors. We propose three versions of the classifier, following the principles “object-etalon”, “object descriptor-etalon”, and “vector description of the object-etalon”, which differ in the level of integration of the analyzed data. We implement options for constructing clusters over the whole set of descriptions in the etalon database and separately for each etalon, as well as an optimal method for comparing sets of segment centers for the etalons and the object. The efficiency of the resulting classifiers is evaluated experimentally in terms of productivity, processing time, and classification quality. The proposed methods classify the set of etalons without error. We draw conclusions about the efficiency of classification approaches based on segment centers. Image processing time with the developed methods is hundreds of times lower than with the traditional method, with no reduction in accuracy.
Image keypoint detection and description is a popular way to find pixel-level connections between images and is a basic, critical step in many computer vision tasks. Existing methods are far from optimal in keypoint positioning accuracy and in generating robust, discriminative descriptors. This paper proposes a new end-to-end, self-supervised deep learning network. The network uses a backbone feature encoder to extract multi-level feature maps and then performs joint keypoint detection and description in a single forward pass. To enhance the localization accuracy of keypoints and restore the local shape structure, the detector detects keypoints on feature maps of the same resolution as the original image. To improve the perception of local shape details, the network uses multi-level features to generate robust descriptors with rich local shape information. A detailed comparison on HPatches with the traditional feature-based methods Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), and with deep learning methods, demonstrates the effectiveness and robustness of the proposed method.
Funding (clothing keypoint detection with CPN): National Key Research and Development Program, China (No. 2019YFC1521300).
Funding (skeleton point extraction combined with object detection): Supported by the Hainan Provincial Key Research and Development Program (No. ZDYF2020018), the Hainan Provincial Natural Science Foundation of China (No. 2019RC100), and the Haikou Key Research and Development Program (No. 2020-049).
Funding (sports and event recognition): The MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) Program (IITP-2024-RS-2022-00156326), the IITP (Institute of Information & Communications Technology Planning & Evaluation); Princess Nourah bint Abdulrahman University Researchers Supporting Project number PNURSP2024R440, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia; and the Deanship of Scientific Research at Najran University, under the Research Group Funding program, grant code NU/RG/SERC/13/30.
基金This research was funded by Prince Sattam bin Abdulaziz University(Project Number PSAU/2023/01/25387).
Abstract: The research aims to improve the performance of image recognition methods based on a description in the form of a set of keypoint descriptors. The main focus is on increasing the speed of establishing the relevance of object and etalon descriptions while maintaining the required level of classification efficiency. The class to be recognized is represented by an infinite set of images obtained from the etalon by applying arbitrary geometric transformations. It is proposed to reduce the descriptions for the etalon database by selecting the most significant descriptor components according to the information content criterion. The informativeness of an etalon descriptor is estimated by the difference of the closest distances to its own and other descriptions. The developed method determines the relevance of the full description of the recognized object with the reduced description of the etalons. Several practical models of the classifier with different options for establishing the correspondence between object descriptors and etalons are considered. The results of the experimental modeling of the proposed methods for a database including images of museum jewelry are presented. The test sample is formed as a set of images from the etalon database and from outside the database, with geometric transformations of scale and rotation applied in the field of view. The practical problems of determining the threshold for the number of votes, based on which a classification decision is made, have been researched. Modeling has revealed the practical possibility of reducing descriptions tenfold with full preservation of classification accuracy. Reducing the descriptions by twenty times in the experiment leads to slightly decreased accuracy. The speed of the analysis increases in proportion to the degree of reduction. The use of reduction by the informativeness criterion confirmed the possibility of obtaining the most significant subset of features for classification, which guarantees a decent level of accuracy.
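The informativeness criterion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean metric, the sign convention (distance to the nearest foreign descriptor minus distance to the nearest descriptor of the same etalon), and the function names `informativeness` and `reduce_description` are all assumptions made for illustration.

```python
import numpy as np

def informativeness(own, others):
    """Score each descriptor of one etalon (assumed criterion, see lead-in).

    own    : (n, d) array, descriptors of the etalon being reduced (n >= 2)
    others : (m, d) array, descriptors of all other etalons
    """
    # Pairwise distances inside the etalon's own description.
    d_own = np.linalg.norm(own[:, None, :] - own[None, :, :], axis=2)
    np.fill_diagonal(d_own, np.inf)  # ignore the zero distance to itself
    nearest_own = d_own.min(axis=1)

    # Distances from each own descriptor to the other etalons' descriptors.
    d_other = np.linalg.norm(own[:, None, :] - others[None, :, :], axis=2)
    nearest_other = d_other.min(axis=1)

    # Large score: far from foreign descriptors, close to its own description.
    return nearest_other - nearest_own

def reduce_description(own, others, keep):
    """Keep the `keep` highest-scoring descriptors of the etalon."""
    scores = informativeness(own, others)
    top = np.argsort(scores)[::-1][:keep]
    return own[top]
```

With `keep` set to a tenth of the original descriptor count, this mirrors the tenfold reduction mentioned in the abstract; the matching cost then falls roughly in proportion, since fewer etalon descriptors are compared against each object descriptor.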
Funding: National Natural Science Foundation of China (Grant Number 62076246).
Abstract: Human pose estimation aims to localize the body joints from image or video data. With the development of deep learning, pose estimation has become a hot research topic in the field of computer vision. In recent years, human pose estimation has achieved great success in multiple fields such as animation and sports. However, to obtain accurate positioning results, existing methods may suffer from large model sizes, a high number of parameters, and increased complexity, leading to high computing costs. In this paper, we propose a new lightweight feature encoder to construct a high-resolution network that reduces the number of parameters and lowers the computing cost. We also introduce a semantic enhancement module that improves global feature extraction and network performance by combining channel and spatial dimensions. Furthermore, we propose a densely connected spatial pyramid pooling module to compensate for the decrease in image resolution and the information loss in the network. As a result, our method effectively reduces the number of parameters and the complexity while ensuring high performance. Extensive experiments show that our method achieves competitive performance while dramatically reducing the number of parameters and the operational complexity. Specifically, our method obtains an 89.9% AP score on MPII VAL, while the number of parameters and the complexity of operations are reduced by 41% and 36%, respectively.
Funding: Supported by the Natural Science Foundation of Hubei Province of China under grant number 2022CFB536, the National Natural Science Foundation of China under grant number 62367006, and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology under grant number CX2023579.
Abstract: Human pose estimation is a critical research area in the field of computer vision, playing a significant role in applications such as human-computer interaction, behavior analysis, and action recognition. In this paper, we propose a U-shaped keypoint detection network (DAUNet) based on an improved ResNet subsampling structure and a spatial grouping mechanism. This network addresses key challenges in traditional methods, such as information loss, large network redundancy, and insufficient sensitivity to low-resolution features. DAUNet is composed of three main components. First, we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss. Second, after upsampling, the network eliminates redundant features, improving the overall efficiency. Finally, a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map, allowing for better restoration of the original image size and higher accuracy. Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models, with a mean PCKh@0.5 score of 91.6% on the MPII dataset and an AP of 76.1% on the COCO dataset. Moreover, real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments, highlighting its potential for broader applications.
Funding: Supported by the National Natural Science Foundation of China (61802253).
Abstract: Traditional keypoint detection methods are susceptible to complex backgrounds and local image similarity, which leads to inaccurate descriptor matching and bias in visual localization. To address this problem, keypoints and descriptors based on cross-modality fusion are proposed and applied to camera motion estimation. A convolutional neural network is used to detect the positions of keypoints and generate the corresponding descriptors, and pyramid convolution is used to extract multi-scale features in the network. The problem of local image similarity is solved by capturing local and global feature information and fusing the geometric position information of keypoints to generate descriptors. According to our experiments, the repeatability of our method is improved by 3.7%, and the homography estimation is improved by 1.6%. To demonstrate the practicability of the method, the visual odometry part of simultaneous localization and mapping is constructed, and our method achieves 35% higher positioning accuracy than the traditional method.
Abstract: Digital media tampering, together with fake information, is a severe problem in modern information systems. As imaging technology has advanced, image forgery, whether by the photographer or through later image manipulation, has advanced in parallel. Numerous studies have concentrated on identifying such manipulated media or information, both manually and automatically, in order to counter sophisticated forgery methods built on readily available, technologically advanced tools. However, the developed methods suffer from high complexity, and the speed-accuracy trade-off remains difficult to resolve. To tackle these challenges, this article puts forward a quick and effective Copy-Move Forgery Detection (CMFD) system utilizing a novel Quad-sort Moth Flame (QMF) Light Gradient Boosting Machine (QMF-Light GBM). In the proposed system, the input images are first pre-processed by resizing and by removing noise with a Borel Transform (BT)-based Wiener Filter (BWF). The pre-processed images are then partitioned into a number of grids and segmented using Orientation Preserving Simple Linear Iterative Clustering (OPSLIC). Next, significant features are extracted from the segmented images, and the feature distances are calculated and matched with the input images. The false positive matches produced during the matching process are then eliminated using the Union Topological Measure of Pattern Diversity (UTMOPD) method. Finally, the forged and non-forged images are visualized using the QMF-Light GBM. Extensive experiments revealed that, in terms of detection accuracy, the proposed system is highly precise compared with several state-of-the-art approaches.
Abstract: This paper proposes a novel object detection method in which a set of local features inside superpixels is extracted from an image acquired by a 3D visual sensor. To increase the segmentation accuracy, the proposed method first segments the image under analysis using the Simple Linear Iterative Clustering (SLIC) superpixel method. Next, the keypoints inside each superpixel are estimated using Speeded-Up Robust Features (SURF). These keypoints are then used to carry out the matching task for every detected keypoint of a scene inside the estimated superpixels. In addition, a probability map is introduced to describe the accuracy of the object detection results. Experimental results show that the proposed approach provides fairly good object detection and confirm the superior performance of the proposed scheme compared with other recently proposed methods, such as the scheme proposed by Mae et al.
Abstract: This article presents a method for describing keypoints using simple statistics for regions controlled by neighboring keypoints, to remedy a gap in existing descriptors. Existing descriptors such as speeded up robust features (SURF), KAZE, binary robust invariant scalable keypoints (BRISK), features from accelerated segment test (FAST), and oriented FAST and rotated BRIEF (ORB) can competently detect, describe, and match images in the presence of some artifacts such as blur, compression, and illumination changes. However, the performance and reliability of these descriptors decrease for some imaging variations such as point of view, zoom (scale), and rotation. The introduced description method improves image matching in the event of such distortions. It utilizes a contourlet-based detector to detect the strongest keypoints within a specified window size. The selected keypoints and their neighbors control the size and orientation of the surrounding regions, which are mapped onto rectangular shapes using a polar transformation. The resulting rectangular matrices are subjected to two-directional statistical operations that involve calculating the mean and standard deviation. Consequently, the descriptor obtained is invariant to translation, rotation, and scale because of the two techniques used in this paper: the region extraction and the polar transformation. The description method introduced in this article is tested against well-established descriptors, such as SURF, KAZE, BRISK, FAST, and ORB, using the standard OXFORD dataset. The presented methodology demonstrated its ability to improve the matching of distorted images compared to other descriptors in the literature.
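The polar mapping followed by two-directional mean and standard deviation can be sketched as below. This is only an illustration under stated assumptions, not the article's method: the contourlet detector and the neighbor-controlled region size are omitted, sampling is nearest-neighbor, and the names `polar_patch` and `stat_descriptor` and the grid sizes `n_r` and `n_theta` are hypothetical.

```python
import numpy as np

def polar_patch(image, center, radius, n_r=16, n_theta=32):
    """Map a circular region onto an (n_r, n_theta) rectangular matrix.

    Rows index the radius, columns index the angle. Nearest-neighbor
    sampling is used for brevity (an assumption of this sketch).
    """
    cy, cx = center
    r = np.linspace(0.0, radius, n_r)
    t = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(r, t, indexing="ij")
    ys = np.clip(np.rint(cy + rr * np.sin(tt)).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.rint(cx + rr * np.cos(tt)).astype(int), 0, image.shape[1] - 1)
    return image[ys, xs].astype(float)

def stat_descriptor(patch):
    """Concatenate mean and standard deviation taken along both directions."""
    return np.concatenate([
        patch.mean(axis=1), patch.std(axis=1),  # per radius, across angles
        patch.mean(axis=0), patch.std(axis=0),  # per angle, across radii
    ])
```

In the polar matrix a rotation of the region becomes a circular shift along the angle axis, so the per-radius statistics (the first `2 * n_r` entries) do not change under that shift. The per-angle statistics are merely permuted, which is why a reference orientation, here left out, would still be needed for them.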
Abstract: The results of the development of a new high-speed method for classifying images using a structural approach are presented. The method is based on a system of hierarchical features built on the bitwise data distribution for the set of descriptors of the image description. The article also proposes the use of the spatial data processing apparatus, which simplifies and accelerates the classification process. Experiments have shown that the time needed to calculate the relevance of two descriptions from their distributions is about 1000 times less than for the traditional voting procedure, in which the sets of descriptors are compared. Introducing the system of hierarchical features further reduces the calculation time by 2–3 times while ensuring high classification efficiency. The immunity of the method to additive noise has been studied experimentally. According to the results of the research, the marginal degree of the hierarchy of features for reliable classification with a noise standard deviation of less than 30 is the 8-bit distribution. Computing costs increase proportionally with decreasing bit distribution. The method can be used for applied tasks where object identification time is critical.
Abstract: The repeatability rate is an important measure for evaluating and comparing the performance of keypoint detectors. Several repeatability rate measurements have been used in the literature to assess the effectiveness of keypoint detectors. While these repeatability rates are calculated for pairs of images, the general assumption is that the reference image is known and unchanging compared to the other images in the same dataset. These rates are therefore asymmetrical, as they require calculations in only one direction. In addition, the image domain in which these computations take place substantially affects their values. The presented scatter plots illustrate how these directional repeatability rates vary in relation to the size of the neighboring region in each pair of images. Therefore, both directional repeatability rates for the same image pair must be included when comparing different keypoint detectors. This paper first examines several commonly used repeatability rate measures for keypoint detector evaluation. The researcher then suggests computing a two-fold repeatability rate to assess keypoint detector performance on images of similar scenes. Next, the symmetric mean repeatability rate metric is computed from the two-fold repeatability rates. Finally, these measurements are validated using well-known keypoint detectors on different image groups with various geometric and photometric attributes.
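The idea of computing both directional rates and combining them can be sketched as follows. The abstract does not give the formulas, so several choices here are assumptions: a keypoint counts as repeated when its projection lies within `eps` pixels of a keypoint in the other image, the two images are related by a known homography `H_ab`, and the symmetric value is taken as the arithmetic mean of the two directions. The function names are hypothetical.

```python
import numpy as np

def project(points, H):
    """Apply a 3x3 homography to an (n, 2) array of (x, y) points."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def directional_rate(src, dst, H, eps=2.0):
    """Fraction of `src` keypoints that reappear in `dst` after projection."""
    if len(src) == 0 or len(dst) == 0:
        return 0.0
    proj = project(src, H)
    dists = np.linalg.norm(proj[:, None, :] - dst[None, :, :], axis=2)
    return float((dists.min(axis=1) <= eps).mean())

def symmetric_mean_rate(kp_a, kp_b, H_ab, eps=2.0):
    """Return (A->B rate, B->A rate, their arithmetic mean)."""
    forward = directional_rate(kp_a, kp_b, H_ab, eps)
    backward = directional_rate(kp_b, kp_a, np.linalg.inv(H_ab), eps)
    return forward, backward, 0.5 * (forward + backward)
```

When one image yields many more keypoints than the other, the two directional rates can differ sharply, which is the asymmetry the abstract points out; reporting both alongside their mean makes that visible.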
Funding: The authors received specific funding for this research (Project Number IF-PSAU-2021/01/18487).
Abstract: The problem of image recognition in computer vision systems is studied. The paper presents the results of developing classification methods that are efficient with respect to processing speed and are based on analyzing the segment representation of a structural description given as a set of descriptors. We propose three versions of the classifier according to the following principles: "object-etalon", "object descriptor-etalon", and "vector description of the object-etalon", which differ in the level of integration of the analyzed data. Options for constructing clusters over the whole set of descriptions of the etalon database and separately for each of the etalons, as well as the optimal method for comparing sets of segment centers for the etalons and the object, are implemented. An experimental evaluation of the created classifiers in terms of productivity, processing time, and classification quality has been carried out. The proposed methods classify the set of etalons without error. We draw conclusions about the efficiency of classification approaches based on segment centers. The image processing time of the developed methods is hundreds of times less than that of the traditional one, without reducing the accuracy.
Funding: This work was supported by the National Natural Science Foundation of China (61871046, SM, http://www.nsfc.gov.cn/).
Abstract: Image keypoint detection and description is a popular method for finding pixel-level connections between images, which is a basic and critical step in many computer vision tasks. The existing methods are far from optimal in terms of keypoint positioning accuracy and the generation of robust and discriminative descriptors. This paper proposes a new end-to-end self-supervised deep learning network. The network uses a backbone feature encoder to extract multi-level feature maps, then performs joint image keypoint detection and description in a forward pass. On the one hand, to enhance the localization accuracy of keypoints and restore the local shape structure, the detector detects keypoints on feature maps of the same resolution as the original image. On the other hand, to enhance the ability to perceive local shape details, the network utilizes multi-level features to generate robust feature descriptors with rich local shape information. A detailed comparison on HPatches with the traditional feature-based methods Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) and with deep learning methods demonstrates the effectiveness and robustness of the method proposed in this paper.