Transformer-based models have facilitated significant advances in object detection. However, their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle (UAV) imagery. Addressing these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, leading to an accurate and efficient model. Firstly, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies to enhance detection performance without incurring additional inference computation. The query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression. Adversarial denoising learning is a novel enhancement method inspired by adversarial learning, which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, achieving a significant improvement in accuracy with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on the VisDrone and UAVDT datasets, respectively, and has a faster inference speed, making it a competitive model in UAV image object detection.
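The query-filter idea above, suppressing near-duplicate queries for densely clustered targets, can be illustrated with a plain IoU-based non-maximum suppression over predicted query boxes. This is a minimal sketch, not the H-DETR implementation; the box and score arrays are hypothetical inputs.

```python
import numpy as np

def query_filter_nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.7):
    """Suppress near-duplicate queries by IoU, keeping the higher-scoring one.

    boxes:  (N, 4) array of [x1, y1, x2, y2] query boxes (hypothetical input).
    scores: (N,) classification confidences of the queries.
    Returns indices of the queries that survive filtering.
    """
    order = scores.argsort()[::-1]          # process high-confidence queries first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the kept query with all queries still in play
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        order = order[1:][iou <= iou_thr]   # drop queries overlapping the kept one
    return np.array(keep, dtype=np.int64)
```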
Increasingly advanced image processing technology has made digital image editing easier and easier. With image processing software at one's fingertips, one can easily alter the content of an image, and the altered image is so realistic that the tampering is imperceptible to the naked eye. These tampered images have posed a serious threat to personal privacy, social order, and national security. Therefore, detecting and locating tampered areas in images has important practical significance and has become an important research topic in the field of multimedia information security. In recent years, deep learning technology has been widely used in image tampering localization, and the achieved performance has significantly surpassed that of traditional tampering forensics methods. This paper sorts out the relevant knowledge and latest methods in the field of deep learning-based image tampering detection. According to the two types of deep learning-based tampering detection, the detection tasks of each method are detailed separately, and the problems and future prospects in this field are discussed. This survey differs from existing work in three respects: (1) it focuses on the problem of image tampering detection, so it does not elaborate on other forensic methods; (2) it concentrates on deep learning-based detection methods for image tampering; (3) it is driven by the needs of tampering targets, so it pays more attention to sorting out methods for different tampering detection tasks.
Diagnosing various diseases such as glaucoma, age-related macular degeneration, cardiovascular conditions, and diabetic retinopathy involves segmenting retinal blood vessels. The task is particularly challenging when dealing with color fundus images due to issues like non-uniform illumination, low contrast, and variations in vessel appearance, especially in the presence of different pathologies. Furthermore, the speed of the retinal vessel segmentation system is of utmost importance. With the surge in available big data, the speed of the algorithm becomes increasingly important, carrying almost equal weight to its accuracy. To address these challenges, we present a novel approach for retinal vessel segmentation, leveraging efficient and robust techniques based on multiscale line detection and mathematical morphology. Our algorithm's performance is evaluated on two publicly available datasets, namely the Digital Retinal Images for Vessel Extraction (DRIVE) dataset and the Structure Analysis of Retina (STARE) dataset. The experimental results demonstrate the effectiveness of our method, with mean accuracy values of 0.9467 for DRIVE and 0.9535 for STARE, as well as sensitivity values of 0.6952 for DRIVE and 0.6809 for STARE. Notably, our algorithm exhibits competitive performance with state-of-the-art methods. Importantly, it operates at an average speed of 3.73 s per image for DRIVE and 3.75 s for STARE. It is worth noting that these results were achieved using Matlab scripts containing multiple loops, which suggests that the processing time can be further reduced by replacing loops with vectorization. Thus, the proposed algorithm can be deployed in real-time applications. In summary, our proposed system strikes a fine balance between swift computation and accuracy that is on par with the best available methods in the field.
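The multiscale line detection underlying this method can be sketched as oriented line-average filters applied at several lengths, followed by morphological cleanup. The sketch below assumes OpenCV and dark vessels on the green channel; the kernel lengths, thresholds, and file name are illustrative, not the paper's settings.

```python
import cv2
import numpy as np

def line_kernel(length: int, angle_deg: float) -> np.ndarray:
    """A unit-sum kernel that averages pixels along a line of the given length/angle."""
    k = np.zeros((length, length), dtype=np.float32)
    k[length // 2, :] = 1.0                      # horizontal line through the centre
    M = cv2.getRotationMatrix2D((length / 2 - 0.5, length / 2 - 0.5), angle_deg, 1.0)
    k = cv2.warpAffine(k, M, (length, length))
    return k / max(k.sum(), 1e-6)

def multiscale_line_response(green: np.ndarray, lengths=(5, 9, 13, 15), n_angles=12):
    """Max line response over scales/orientations minus the local mean (vessels are dark)."""
    img = green.astype(np.float32)
    local_mean = cv2.blur(img, (15, 15))
    response = np.full_like(img, -np.inf)
    for L in lengths:
        for a in np.arange(0, 180, 180 / n_angles):
            line_avg = cv2.filter2D(img, -1, line_kernel(L, a))
            response = np.maximum(response, local_mean - line_avg)
    return response

# Usage sketch: threshold the response and clean it up with morphology.
# fundus = cv2.imread("example_fundus.png")      # hypothetical file
# resp = multiscale_line_response(fundus[:, :, 1])
# vessels = (resp > resp.mean() + resp.std()).astype(np.uint8) * 255
# vessels = cv2.morphologyEx(vessels, cv2.MORPH_OPEN,
#                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
```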
In clinical practice, the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications. Measuring the amount of each type of urine sediment allows for screening, diagnosis, and evaluation of kidney and urinary tract disease, providing insight into the specific type and severity. However, manual urine sediment examination is labor-intensive, time-consuming, and subjective. Traditional machine learning-based object detection methods require hand-crafted features for localization and classification, which have poor generalization capabilities and make it difficult to quickly and accurately detect the number of urine sediments. Deep learning-based object detection methods have the potential to address these challenges, but they require access to large urine sediment image datasets. Unfortunately, only a limited number of urine sediment datasets are publicly available. To alleviate the lack of urine sediment datasets in medical image analysis, we propose a new dataset named UriSed2K, which contains 2465 high-quality images annotated with expert guidance. Two main challenges are associated with our dataset: a large number of small objects and the occlusion between these small objects. Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges it presents. Specifically, our goal is to improve the accuracy and efficiency of the detection algorithm and, in doing so, provide medical professionals with an automatic detector that saves time and effort. We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO. The proposed algorithm comprises a local context attention module and a global background suppression module, which aid the detector in distinguishing urine sediment features in the image. The local context attention module captures context information beyond the object region, while the global background suppression module emphasizes objects in uninformative backgrounds. We comprehensively evaluate our method on the UriSed2K dataset, which includes seven categories of urine sediments, namely erythrocytes (red blood cells), leukocytes (white blood cells), epithelial cells, crystals, mycetes, broken erythrocytes, and broken leukocytes, achieving a best average precision (AP) of 95.3% while taking only 10 ms per image. The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.
A novel image encryption scheme based on parallel compressive sensing and edge detection embedding technology is proposed to improve visual security. Firstly, the plain image is sparsely represented using the discrete wavelet transform. Then, the coefficient matrix is scrambled and compressed to obtain a size-reduced image using the Fisher–Yates shuffle and parallel compressive sensing. Subsequently, to increase the security of the proposed algorithm, the compressed image is re-encrypted through permutation and diffusion to obtain a noise-like secret image. Finally, an adaptive embedding method based on edge detection for different carrier images is proposed to generate a visually meaningful cipher image. To improve the plaintext sensitivity of the algorithm, the counter mode is combined with the hash function to generate keys for chaotic systems. Additionally, an effective permutation method is designed to scramble the pixels of the compressed image in the re-encryption stage. The simulation results and analyses demonstrate that the proposed algorithm performs well in terms of visual security and decryption quality.
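The key-generation and scrambling steps described above can be illustrated as follows: a hash of the plaintext combined with a counter seeds a chaotic map, whose output drives a Fisher–Yates permutation of the coefficient matrix. This is a hedged sketch of the general construction, not the authors' exact scheme; the SHA-256 choice, logistic map, and parameters are assumptions.

```python
import hashlib
import numpy as np

def derive_seed(plain: np.ndarray, counter: int) -> float:
    """Hash the plaintext together with a counter to seed a chaotic map (illustrative)."""
    digest = hashlib.sha256(plain.tobytes() + counter.to_bytes(8, "big")).digest()
    # Map the first 8 bytes of the digest into (0, 1) as the logistic-map initial value.
    return (int.from_bytes(digest[:8], "big") / 2**64) * 0.998 + 0.001

def logistic_sequence(x0: float, n: int, mu: float = 3.99) -> np.ndarray:
    """Generate n values of the logistic map x <- mu*x*(1-x)."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = mu * x * (1.0 - x)
        seq[i] = x
    return seq

def fisher_yates_permutation(n: int, chaos: np.ndarray) -> np.ndarray:
    """Fisher-Yates shuffle of indices 0..n-1 driven by a chaotic sequence instead of a RNG."""
    perm = np.arange(n)
    for i in range(n - 1, 0, -1):
        j = int(chaos[i] * (i + 1))          # pseudo-random index in [0, i]
        perm[i], perm[j] = perm[j], perm[i]
    return perm

# Usage sketch: scramble flattened DWT coefficients with the chaos-driven permutation.
# coeffs = ...                               # flattened wavelet coefficient matrix
# x0 = derive_seed(plain_image, counter=0)
# perm = fisher_yates_permutation(coeffs.size, logistic_sequence(x0, coeffs.size))
# scrambled = coeffs[perm]
```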
Object detection in unmanned aerial vehicle (UAV) aerial images has become increasingly important in military and civil applications. General object detection models are not robust enough against the interclass similarity and intraclass variability of small objects, or against UAV-specific nuisances such as uncontrolled weather conditions. Unlike previous approaches focusing on high-level semantic information, we report the importance of underlying features for improving detection accuracy and robustness from the information-theoretic perspective. Specifically, we propose a robust and discriminative feature learning approach through mutual information maximization (RD-MIM), which can be integrated into numerous object detection methods for aerial images. Firstly, we present a rank sample mining method to reduce underlying feature differences between the natural image domain and the aerial image domain. Then, we design a momentum contrast learning strategy to make object features similar within the same category and dissimilar across different categories. Finally, we construct a transformer-based global attention mechanism to boost object location semantics by leveraging the high interrelation of different receptive fields. We conduct extensive experiments on the VisDrone and Unmanned Aerial Vehicle Benchmark Object Detection and Tracking (UAVDT) datasets to prove the effectiveness of the proposed method. The experimental results show that our approach brings considerable robustness gains to basic detectors and advanced detection methods, achieving relative growth rates of 51.0% and 39.4% in corruption robustness, respectively. Our code is available at https://github.com/cq100/RD-MIM (accessed on 2 August 2024).
The intelligent detection technology driven by X-ray images and deep learning represents the forefront of advanced techniques and development trends in flaw detection and automated evaluation of light alloy castings. However, the efficacy of deep learning models hinges upon a substantial abundance of flaw samples. Existing research on X-ray image augmentation for flaw detection suffers from shortcomings such as poor diversity of flaw samples and low reliability of quality evaluation. To this end, a novel approach was put forward, which involves the creation of the Interpolation-Deep Convolutional Generative Adversarial Network (I-DCGAN) for flaw detection image generation and a comprehensive evaluation algorithm named TOPSIS-IFP. I-DCGAN enables the generation of high-resolution, diverse simulated images with multiple appearances, improving sample diversity and quality while maintaining relatively low computational complexity. TOPSIS-IFP facilitates multi-dimensional quality evaluation, covering aspects such as diversity, authenticity, image distribution difference, and image distortion degree. The results indicate that the X-ray radiographic images of magnesium and aluminum alloy castings achieve optimal performance when trained up to the 800th and 600th epochs, respectively, with TOPSIS-IFP values reaching 78.7% and 73.8% similarity to the ideal solution. Compared to single-index evaluation, the TOPSIS-IFP algorithm yields higher-quality simulated images at the optimal training epoch, thereby mitigating the unreliable quality assessment associated with single-index evaluation. The image generation and comprehensive quality evaluation method developed in this paper provides a novel approach for image augmentation in flaw recognition, which is significant for enhancing the robustness of subsequent flaw recognition networks.
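The reported "similarity to the ideal solution" is the standard TOPSIS closeness coefficient. A minimal sketch of that computation is shown below; the indicator matrix, weights, and benefit/cost flags are hypothetical and do not reproduce the IFP-specific indicators of the paper.

```python
import numpy as np

def topsis_closeness(matrix: np.ndarray, weights: np.ndarray, benefit: np.ndarray) -> np.ndarray:
    """Closeness of each alternative (row) to the ideal solution, in [0, 1].

    matrix:  (m, n) decision matrix, one row per candidate epoch, one column per indicator.
    weights: (n,) indicator weights summing to 1.
    benefit: (n,) booleans, True if larger is better for that indicator.
    """
    # Vector-normalize each column, then apply the weights.
    norm = matrix / np.linalg.norm(matrix, axis=0, keepdims=True)
    v = norm * weights
    # Ideal and anti-ideal solutions per indicator.
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg + 1e-12)

# Hypothetical example: three training epochs scored on diversity, authenticity, distortion.
scores = np.array([[0.62, 0.70, 0.20],
                   [0.75, 0.81, 0.15],
                   [0.68, 0.77, 0.18]])
closeness = topsis_closeness(scores,
                             weights=np.array([0.4, 0.4, 0.2]),
                             benefit=np.array([True, True, False]))
print(closeness)   # higher = closer to the ideal solution
```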
Multi-label image classification is recognized as an important task within the field of computer vision, a discipline that has experienced a significant escalation in research endeavors in recent years. The widespread adoption of convolutional neural networks (CNNs) has catalyzed the remarkable success of architectures such as ResNet-101 within the domain of image classification. However, in multi-label image classification tasks, it is crucial to consider the correlation between labels. To improve the accuracy and performance of multi-label classification and to fully combine visual and semantic features, many existing studies use graph convolutional networks (GCN) for modeling. Object detection and multi-label image classification exhibit a degree of conceptual overlap; however, the integration of these two tasks within a unified framework has been relatively underexplored in the existing literature. In this paper, we propose the Object-GCN framework, a model combining the object detection network YOLOv5 with a graph convolutional network, and we carry out a thorough experimental analysis using a range of well-established public datasets. Object-GCN achieves significantly better performance than existing studies on the public datasets COCO2014, VOC2007, and VOC2012, with final results of 86.9%, 96.7%, and 96.3% mean Average Precision (mAP) across the three datasets, respectively.
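Label-correlation modeling with a GCN typically amounts to propagating label embeddings through a normalized co-occurrence graph, H' = ReLU(A_norm H W). The NumPy sketch below follows that standard formulation; the toy graph, embedding sizes, and random data are assumptions, not the Object-GCN configuration.

```python
import numpy as np

def normalize_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetrically normalize a label co-occurrence graph: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(H: np.ndarray, A_norm: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution layer over label embeddings: ReLU(A_norm @ H @ W)."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Hypothetical setup: 80 labels, 300-d word embeddings, two GCN layers producing
# 2048-d label classifiers that are applied to a pooled image feature by a dot product.
rng = np.random.default_rng(0)
num_labels, emb_dim, feat_dim = 80, 300, 2048
A = (rng.random((num_labels, num_labels)) > 0.9).astype(float)   # toy co-occurrence graph
A = np.maximum(A, A.T)
H0 = rng.standard_normal((num_labels, emb_dim))                   # label word embeddings
W1 = rng.standard_normal((emb_dim, 1024)) * 0.01
W2 = rng.standard_normal((1024, feat_dim)) * 0.01
A_norm = normalize_adjacency(A)
classifiers = gcn_layer(gcn_layer(H0, A_norm, W1), A_norm, W2)     # (80, 2048)
image_feature = rng.standard_normal(feat_dim)                      # pooled detector feature
logits = classifiers @ image_feature                                # one score per label
```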
We propose a novel image segmentation algorithm to tackle the challenge of limited recognition and segmentation performance in identifying welding seam images during robotic intelligent operations. Initially, to enhance the capability of deep neural networks in extracting geometric attributes from depth images, we developed a novel deep geometric convolution operator (DGConv). DGConv is utilized to construct a deep local geometric feature extraction module, facilitating a more comprehensive exploration of the intrinsic geometric information within depth images. Secondly, we integrate the newly proposed deep geometric feature module with the Fully Convolutional Network (FCN8) to establish a high-performance deep neural network algorithm tailored for depth image segmentation. Concurrently, we enhance the FCN8 detection head by separating the segmentation and classification processes. This enhancement significantly boosts the network's overall detection capability. Thirdly, for a comprehensive assessment of our proposed algorithm and its applicability in real-world industrial settings, we curated a line-scan image dataset featuring weld seams. This dataset, named the Standardized Linear Depth Profile (SLDP) dataset, was collected from actual industrial sites where autonomous robots are in operation. Ultimately, we conducted experiments utilizing the SLDP dataset, achieving an average accuracy of 92.7%. Our proposed approach exhibited a remarkable performance improvement over the prior method on the identical dataset. Moreover, we have successfully deployed the proposed algorithm in genuine industrial environments, fulfilling the prerequisites of unmanned robot operations.
Breast cancer is a significant threat not only to women but to the global population at large. With recent advancements in digital pathology, eosin and hematoxylin images provide enhanced clarity in examining microscopic features of breast tissues based on their staining properties. Early cancer detection quickens the therapeutic process, thereby increasing survival rates. The analysis made by medical professionals, especially pathologists, is time-consuming and challenging, so there arises a need for automated breast cancer detection systems. Emerging artificial intelligence platforms, especially deep learning models, play an important role in image diagnosis and prediction. Initially, the histopathology biopsy images are taken from standard data sources. Further, the gathered images are given as input to a Multi-Scale Dilated Vision Transformer, where the essential features are acquired. Subsequently, the features are passed to a Bidirectional Long Short-Term Memory (Bi-LSTM) network for classifying the breast cancer disorder. The efficacy of the model is evaluated using divergent metrics. When compared with other methods, the proposed work offers impressive detection results.
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: Therefore, we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods, achieving state-of-the-art results in terms of evaluation metrics such as recall and F1-score. The code is available at https://github.com/YongZ-Lee/TD-DCCAM.
Unmanned aerial vehicles (UAVs) have been widely used in military, medical, wireless communications, aerial surveillance, and other applications. One key topic involving UAVs is pose estimation in autonomous navigation. A standard procedure for this process is to combine inertial navigation system sensor information with the global navigation satellite system (GNSS) signal. However, some factors can interfere with the GNSS signal, such as ionospheric scintillation, jamming, or spoofing. One alternative method that avoids the GNSS signal is an image processing approach that matches UAV images with georeferenced images, but it requires considerable effort for image edge extraction. Here, a support vector regression (SVR) model is proposed to reduce this computational load and processing time. Dynamic partial reconfiguration (DPR) of part of the SVR datapath is implemented to accelerate the process, reduce the area, and analyze its granularity by increasing the grain size of the reconfigurable region. Results show that the hardware implementation is 68 times faster than the software implementation. The architecture with DPR also achieves a low power consumption of 4 mW, a 57% reduction compared with the design without DPR and the lowest power consumption among current machine learning hardware implementations. Besides, the circuitry area is 41 times smaller. SVR with a Gaussian kernel shows a success rate of 99.18% and a minimum square error of 0.0146 when tested with the planned trajectory. This system is useful for adaptive applications where the user/designer can modify/reconfigure the hardware layout during its application, thus contributing to lower power consumption, smaller hardware area, and shorter execution time.
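As a software reference point for the regression model described above, the scikit-learn sketch below fits an SVR with a Gaussian (RBF) kernel to map placeholder image-matching features to a pose coordinate. The synthetic data and hyperparameters are illustrative and unrelated to the hardware/DPR design.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Hypothetical data: image-matching features -> one pose coordinate along a trajectory.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(500, 4))                 # placeholder feature vectors
y = np.sin(2.0 * X[:, 0]) + 0.3 * X[:, 1] + 0.05 * rng.standard_normal(500)

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Gaussian kernel == RBF kernel in scikit-learn's SVR.
model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.01)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("support vectors:", model.support_vectors_.shape[0])
```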
Angle detection is a crucial aspect of industrial automation, ensuring precise alignment and orientation of components in manufacturing processes. Despite the widespread application of computer vision in industrial settings, angle detection remains an underexplored domain, with limited integration into production lines. This paper addresses the need for automated angle detection in industrial environments by presenting a methodology that eliminates the training time and high Graphics Processing Unit (GPU) computation cost of machine learning approaches in computer vision (e.g., convolutional neural networks (CNNs)). Our approach leverages advanced image processing techniques and a strategic combination of algorithms, including contour selection, circle regression, polar warp transformation, and outlier detection, to provide an adaptive solution for angle detection. By configuring the algorithm with a diverse dataset and evaluating its performance across various objects, we demonstrate its efficacy in achieving reliable results, with an average error of only 0.5 degrees. Notably, this error margin is 3.274 times lower than the acceptable threshold. Our study highlights the importance of accurate angle detection in industrial settings and showcases the reliability of our algorithm in accurately determining angles, thus contributing to improved manufacturing processes.
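The circle-regression step can be illustrated with an algebraic least-squares (Kåsa) fit, a standard formulation; the synthetic contour points below are placeholders, and this sketch does not reproduce the paper's full pipeline.

```python
import numpy as np

def fit_circle(points: np.ndarray):
    """Algebraic least-squares circle fit (Kasa method).

    points: (N, 2) contour coordinates. Returns (cx, cy, radius).
    Solves x^2 + y^2 = 2*a*x + 2*b*y + c in the least-squares sense.
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2.0 * x, 2.0 * y, np.ones_like(x)])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    radius = np.sqrt(c + cx**2 + cy**2)
    return cx, cy, radius

# Synthetic check: noisy points on a circle of radius 40 centred at (100, 120).
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
pts = np.column_stack([100 + 40 * np.cos(theta), 120 + 40 * np.sin(theta)])
pts += rng.normal(0.0, 0.5, pts.shape)
print(fit_circle(pts))   # approximately (100, 120, 40)
```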
The detection and characterization of human veins using infrared (IR) image processing has gained significant attention due to its potential applications in biometric identification, medical diagnostics, and vein-based authentication systems. This paper presents a low-cost approach for the automatic detection and characterization of human veins from IR images. The proposed method uses image processing techniques including segmentation, feature extraction, and pattern recognition algorithms. Initially, the IR images are preprocessed to enhance vein structures and reduce noise. Subsequently, a CLAHE algorithm is employed to extract vein regions based on their unique IR absorption properties. Features such as vein thickness, orientation, and branching patterns are extracted using mathematical morphology and directional filters. Finally, a classification framework is implemented to categorize veins and distinguish them from surrounding tissues or artifacts. A setup based on a Raspberry Pi was used. Experimental results on IR images demonstrate the effectiveness and robustness of the proposed approach in accurately detecting and characterizing human veins. The developed system shows promise for integration into applications requiring reliable and secure identification based on vein patterns. Our work provides an effective and low-cost solution for nursing staff in low- and middle-income countries to perform safe and accurate venipuncture.
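A hedged sketch of the CLAHE-plus-morphology enhancement stage is given below, assuming OpenCV; the filter sizes, the black-hat step, and the file name are illustrative choices, not the authors' exact parameters.

```python
import cv2
import numpy as np

def enhance_veins(ir_image: np.ndarray) -> np.ndarray:
    """Illustrative vein enhancement: denoise, CLAHE contrast boost, then morphology.

    ir_image: 8-bit single-channel infrared frame (veins appear dark).
    Returns a binary mask with candidate vein pixels set to 255.
    """
    # 1) Suppress sensor noise while preserving edges.
    smoothed = cv2.medianBlur(ir_image, 5)
    # 2) Contrast-limited adaptive histogram equalization to make veins stand out.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(smoothed)
    # 3) Black-hat morphology highlights thin dark structures (veins) on a bright background.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    blackhat = cv2.morphologyEx(equalized, cv2.MORPH_BLACKHAT, kernel)
    # 4) Otsu threshold to obtain the vein mask, then remove small specks.
    _, mask = cv2.threshold(blackhat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
    return mask

# Usage sketch (hypothetical file name):
# ir = cv2.imread("hand_ir.png", cv2.IMREAD_GRAYSCALE)
# vein_mask = enhance_veins(ir)
```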
A blind digital image forensic method for detecting copy-paste forgery between JPEG images was proposed. Two copy-paste tampering scenarios were introduced first: the tampered image is saved in an uncompressed format or in a JPEG-compressed format. The proposed detection method was then analyzed and simulated for all the cases of the two tampering scenarios. The tampered region is detected by computing the averaged sum of absolute difference (ASAD) images between the examined image and a re-saved JPEG-compressed image at different quality factors. The experimental results show the advantages of the proposed method: the capability of detecting small and/or multiple tampered regions, simple computation, and hence fast processing speed.
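A minimal sketch of the ASAD computation described above, using OpenCV's in-memory JPEG re-encoding; the block size and quality-factor range are illustrative, not the paper's exact settings.

```python
import cv2
import numpy as np

def asad_map(image: np.ndarray, quality: int, block: int = 8) -> np.ndarray:
    """Blockwise averaged sum of absolute differences between an image and its
    JPEG re-saved version at the given quality factor (illustrative settings)."""
    ok, buf = cv2.imencode(".jpg", image, [cv2.IMWRITE_JPEG_QUALITY, quality])
    assert ok, "JPEG encoding failed"
    resaved = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    diff = np.abs(image.astype(np.float32) - resaved.astype(np.float32)).mean(axis=2)
    h, w = (diff.shape[0] // block) * block, (diff.shape[1] // block) * block
    blocks = diff[:h, :w].reshape(h // block, block, w // block, block)
    return blocks.mean(axis=(1, 3))          # one ASAD value per block

# Usage sketch: compare ASAD maps across several quality factors; blocks that behave
# differently from the rest of the image are candidate tampered regions.
# img = cv2.imread("suspect.jpg")            # hypothetical file
# maps = [asad_map(img, q) for q in range(50, 100, 5)]
# variability = np.std(np.stack(maps), axis=0)
```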
Change detection (CD) is becoming indispensable for unmanned aerial vehicles (UAVs), especially in the domains of water landing, rescue, and search. However, even the most advanced models require large amounts of data for model training and testing. Therefore, sufficient labeled images with different imaging conditions are needed. Inspired by computer graphics, we present a cloning method to simulate inland-water scenes and collect an auto-labeled simulated dataset. The simulated dataset consists of six challenges to test the effects of dynamic background, weather, and noise on change detection models. Then, we propose an image translation framework that translates simulated images into synthetic images. This framework uses shared parameters (encoder and generator) and 22×22 receptive fields (discriminator) to generate realistic synthetic images as model training sets. The experimental results indicate that: 1) different imaging challenges affect the performance of change detection models; and 2) compared with simulated images, synthetic images can effectively improve the accuracy of supervised models.
The clustering technique is used to examine each pixel in the image, which is assigned to one of the clusters depending on the minimum distance, to obtain a primary classified image divided into different intensity regions. A watershed transformation technique is then employed. This includes computing the gradient of the classified image, dividing the image into markers, and checking the marker image for zero points (watershed lines). The watershed lines are then deleted from the marker image created by the watershed algorithm. A Region Adjacency Graph (RAG) and Region Adjacency Boundary (RAB) are created between two regions from the marker image. Finally, region merging is done according to region average intensity and two edge strengths (T1, T2). The authors' approach is tested on remote sensing and brain MR medical images. The final segmentation result is one closed boundary per actual region in the image.
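The cluster-then-watershed portion of this pipeline can be sketched with OpenCV's k-means and marker-based watershed, as below; the RAG/RAB merging stage is only indicated in a comment, and the parameter choices are assumptions rather than the authors' settings.

```python
import cv2
import numpy as np

def cluster_then_watershed(gray: np.ndarray, k: int = 4) -> np.ndarray:
    """Illustrative pipeline: k-means intensity clustering, then marker-based watershed.

    gray: 8-bit single-channel image. Returns an int32 label image (watershed regions).
    """
    # 1) Cluster pixel intensities into k classes (the "primary classified image").
    samples = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(samples, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    classified = centers[labels.flatten()].reshape(gray.shape).astype(np.uint8)

    # 2) Build markers from confidently classified (eroded) regions.
    markers = np.zeros(gray.shape, dtype=np.int32)
    for i in range(k):
        region = (labels.reshape(gray.shape) == i).astype(np.uint8)
        region = cv2.erode(region, np.ones((5, 5), np.uint8))
        markers[region > 0] = i + 1

    # 3) Watershed on the classified image (OpenCV expects a 3-channel input).
    cv2.watershed(cv2.cvtColor(classified, cv2.COLOR_GRAY2BGR), markers)
    markers[markers == -1] = 0          # drop the watershed-line label
    return markers

# A further region-merging pass by average intensity and boundary strength
# (the RAG/RAB step described above) would follow; it is omitted here for brevity.
```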
Camouflaged people are extremely expert in actively concealing themselves by effectively utilizing cover and the surrounding environment. Despite advancements in optical detection capabilities through imaging systems, including spectral, polarization, and infrared technologies, there is still a lack of effective real-time methods for accurately detecting small, highly camouflaged people in complex real-world scenes. Here, this study proposes a snapshot multispectral image-based camouflage detection model, multispectral YOLO (MS-YOLO), which utilizes the SPD-Conv and SimAM modules to effectively represent targets and suppress background interference by exploiting spatial-spectral target information. Besides, the study constructs the first real-shot multispectral camouflaged people dataset (MSCPD), which encompasses diverse scenes, target scales, and attitudes. To minimize information redundancy, MS-YOLO selects an optimal subset of 12 bands with strong feature representation and minimal inter-band correlation as input. Through experiments on the MSCPD, MS-YOLO achieves a mean Average Precision of 94.31% and real-time detection at 65 frames per second, which confirms the effectiveness and efficiency of our method in detecting camouflaged people in various typical desert and forest scenes. Our approach offers valuable support for improving the perception capabilities of unmanned aerial vehicles in detecting enemy forces and rescuing personnel on the battlefield.
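The band-selection step can be illustrated with a greedy criterion that trades per-band feature strength (variance is used here as a stand-in) against correlation with already selected bands. This is an assumed heuristic sketch, not the MS-YOLO selection procedure, and the 25-band cube is a placeholder.

```python
import numpy as np

def select_bands(cube: np.ndarray, n_select: int = 12, alpha: float = 0.5) -> list:
    """Greedy band selection for a multispectral cube of shape (H, W, B).

    Score each candidate band by its variance (stand-in for feature strength) minus
    alpha times its maximum absolute correlation with the bands already selected.
    """
    H, W, B = cube.shape
    flat = cube.reshape(-1, B).astype(np.float64)
    variance = flat.var(axis=0)
    corr = np.abs(np.corrcoef(flat, rowvar=False))         # (B, B) inter-band correlation

    selected = [int(np.argmax(variance))]                   # start from the strongest band
    while len(selected) < n_select:
        best_band, best_score = None, -np.inf
        for b in range(B):
            if b in selected:
                continue
            redundancy = corr[b, selected].max()
            score = variance[b] / variance.max() - alpha * redundancy
            if score > best_score:
                best_band, best_score = b, score
        selected.append(best_band)
    return sorted(selected)

# Hypothetical cube with 25 bands; the real MSCPD band count is not assumed here.
cube = np.random.default_rng(0).random((64, 64, 25))
print(select_bands(cube, n_select=12))
```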
Image forging is the alteration of a digital image to conceal some of its necessary or helpful information. In some circumstances, it can be difficult to distinguish the modified region from the original image. The demand for authenticity and the integrity of the image drives the detection of fabricated images. There have been cases of ownership infringement or fraudulent actions through the counterfeiting of multimedia files, including re-sampling or copy-moving. This work presents a high-level view of digital image forensics and its possible detection approaches. It provides a thorough analysis of digital image forgery detection techniques, their steps, and their effectiveness. These methods identify a forgery and its type and are compared with the state of the art. This work will help identify the best forgery detection technique for different environments. It also shows the current issues in existing methods, which can help researchers find the scope for further research in this field.
The network infrastructure has evolved rapidly due to the ever-increasing volume of users and data. The massive number of online devices and users has forced the network to transform and facilitate the operational necessities of consumers. Among these necessities, network security is of prime significance. Network intrusion detection systems (NIDS) are among the most suitable approaches to detect anomalies and assaults on a network. However, keeping up with network security requirements is quite challenging due to the constant mutation in attack patterns by intruders. This paper presents an effective framework for NIDS by merging image processing with convolutional neural networks (CNN). The proposed framework first converts non-image data from network traffic into images and then further enhances those images using the Gabor filter. The images are then classified using a CNN classifier. To assess the efficacy of the recommended method, four benchmark datasets, i.e., CSE-CIC-IDS2018, CIC-IDS-2017, ISCX-IDS 2012, and NSL-KDD, were used. The proposed approach showed higher precision than recent work on the mentioned datasets. Further, the proposed method is compared with recent well-known image processing methods for NIDS.
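A hedged sketch of the traffic-to-image conversion and Gabor enhancement stages follows, assuming OpenCV; the image side length, Gabor parameters, and the 41-feature record (as in NSL-KDD) are illustrative choices rather than the paper's configuration.

```python
import cv2
import numpy as np

def features_to_image(row: np.ndarray, side: int = 16) -> np.ndarray:
    """Min-max normalize one traffic record and reshape it into a square grayscale image.

    row: 1-D numeric feature vector (e.g., one NSL-KDD record after encoding).
    side: output image width/height; the vector is zero-padded to side*side values.
    """
    x = row.astype(np.float32)
    x = (x - x.min()) / (x.max() - x.min() + 1e-9)          # scale to [0, 1]
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: min(x.size, side * side)] = x[: side * side]
    return (padded.reshape(side, side) * 255).astype(np.uint8)

def gabor_enhance(img: np.ndarray) -> np.ndarray:
    """Enhance the traffic image with a bank of Gabor filters at four orientations."""
    responses = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        kernel = cv2.getGaborKernel((9, 9), sigma=2.0, theta=theta,
                                    lambd=8.0, gamma=0.5, psi=0.0)
        responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))
    return np.max(np.stack(responses), axis=0)              # strongest response per pixel

# Usage sketch (hypothetical record of 41 features, as in NSL-KDD):
record = np.random.default_rng(0).random(41)
enhanced = gabor_enhance(features_to_image(record))          # fed to the CNN classifier
```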
基金This research was funded by the Natural Science Foundation of Hebei Province(F2021506004).
文摘Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle(UAV)imagery.Addressing these limitations,we propose a hybrid transformer-based detector,H-DETR,and enhance it for dense small objects,leading to an accurate and efficient model.Firstly,we introduce a hybrid transformer encoder,which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently.Furthermore,we propose two novel strategies to enhance detection performance without incurring additional inference computation.Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression.Adversarial denoising learning is a novel enhancement method inspired by adversarial learning,which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise.Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach,achieving a significant improvement in accuracy with a reduction in computational complexity.Our method achieves 31.9%and 21.1%AP on the VisDrone and UAVDT datasets,respectively,and has a faster inference speed,making it a competitive model in UAV image object detection.
基金supported by Key Projects of Innovation and Entrepreneurship Training Program for College Students in Jiangsu Province of China(202210300028Z).
文摘Increasingly advanced image processing technology has made digital image editing easier and easier.With image processing software at one’s fingertips,one can easily alter the content of an image,and the altered image is so realistic that it is illegible to the naked eye.These tampered images have posed a serious threat to personal privacy,social order,and national security.Therefore,detecting and locating tampered areas in images has important practical significance,and has become an important research topic in the field of multimedia information security.In recent years,deep learning technology has been widely used in image tampering localization,and the achieved performance has significantly surpassed traditional tampering forensics methods.This paper mainly sorts out the relevant knowledge and latest methods in the field of image tampering detection based on deep learning.According to the two types of tampering detection based on deep learning,the detection tasks of the method are detailed separately,and the problems and future prospects in this field are discussed.It is quite different from the existing work:(1)This paper mainly focuses on the problem of image tampering detection,so it does not elaborate on various forensic methods.(2)This paper focuses on the detectionmethod of image tampering based on deep learning.(3)This paper is driven by the needs of tampering targets,so it pays more attention to sorting out methods for different tampering detection tasks.
文摘Diagnosing various diseases such as glaucoma,age-related macular degeneration,cardiovascular conditions,and diabetic retinopathy involves segmenting retinal blood vessels.The task is particularly challenging when dealing with color fundus images due to issues like non-uniformillumination,low contrast,and variations in vessel appearance,especially in the presence of different pathologies.Furthermore,the speed of the retinal vessel segmentation system is of utmost importance.With the surge of now available big data,the speed of the algorithm becomes increasingly important,carrying almost equivalent weightage to the accuracy of the algorithm.To address these challenges,we present a novel approach for retinal vessel segmentation,leveraging efficient and robust techniques based on multiscale line detection and mathematical morphology.Our algorithm’s performance is evaluated on two publicly available datasets,namely the Digital Retinal Images for Vessel Extraction dataset(DRIVE)and the Structure Analysis of Retina(STARE)dataset.The experimental results demonstrate the effectiveness of our method,withmean accuracy values of 0.9467 forDRIVE and 0.9535 for STARE datasets,aswell as sensitivity values of 0.6952 forDRIVE and 0.6809 for STARE datasets.Notably,our algorithmexhibits competitive performance with state-of-the-art methods.Importantly,it operates at an average speed of 3.73 s per image for DRIVE and 3.75 s for STARE datasets.It is worth noting that these results were achieved using Matlab scripts containing multiple loops.This suggests that the processing time can be further reduced by replacing loops with vectorization.Thus the proposed algorithm can be deployed in real time applications.In summary,our proposed system strikes a fine balance between swift computation and accuracy that is on par with the best available methods in the field.
基金This work was partially supported by the National Natural Science Foundation of China(Grant Nos.61906168,U20A20171)Zhejiang Provincial Natural Science Foundation of China(Grant Nos.LY23F020023,LY21F020027)Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects(Grant Nos.2022SDSJ01).
文摘In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,diagnosis and evaluation of kidney and urinary tract disease,providing insight into the specific type and severity.However,manual urine sediment examination is labor-intensive,time-consuming,and subjective.Traditional machine learning based object detection methods require hand-crafted features for localization and classification,which have poor generalization capabilities and are difficult to quickly and accurately detect the number of urine sediments.Deep learning based object detection methods have the potential to address the challenges mentioned above,but these methods require access to large urine sediment image datasets.Unfortunately,only a limited number of publicly available urine sediment datasets are currently available.To alleviate the lack of urine sediment datasets in medical image analysis,we propose a new dataset named UriSed2K,which contains 2465 high-quality images annotated with expert guidance.Two main challenges are associated with our dataset:a large number of small objects and the occlusion between these small objects.Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset.Specifically,our goal is to improve the accuracy and efficiency of the detection algorithm and,in doing so,provide medical professionals with an automatic detector that saves time and effort.We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO.The proposed algorithm comprises a local context attention module and a global background suppression module,which aid the detector in distinguishing urine sediment features in the image.The local context attention module captures context information beyond the object region,while the global background suppression module emphasizes objects in uninformative backgrounds.We comprehensively evaluate our method on the UriSed2K dataset,which includes seven categories of urine sediments,such as erythrocytes(red blood cells),leukocytes(white blood cells),epithelial cells,crystals,mycetes,broken erythrocytes,and broken leukocytes,achieving the best average precision(AP)of 95.3%while taking only 10 ms per image.The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.
基金supported by the Key Area R&D Program of Guangdong Province (Grant No.2022B0701180001)the National Natural Science Foundation of China (Grant No.61801127)+1 种基金the Science Technology Planning Project of Guangdong Province,China (Grant Nos.2019B010140002 and 2020B111110002)the Guangdong-Hong Kong-Macao Joint Innovation Field Project (Grant No.2021A0505080006)。
文摘A novel image encryption scheme based on parallel compressive sensing and edge detection embedding technology is proposed to improve visual security. Firstly, the plain image is sparsely represented using the discrete wavelet transform.Then, the coefficient matrix is scrambled and compressed to obtain a size-reduced image using the Fisher–Yates shuffle and parallel compressive sensing. Subsequently, to increase the security of the proposed algorithm, the compressed image is re-encrypted through permutation and diffusion to obtain a noise-like secret image. Finally, an adaptive embedding method based on edge detection for different carrier images is proposed to generate a visually meaningful cipher image. To improve the plaintext sensitivity of the algorithm, the counter mode is combined with the hash function to generate keys for chaotic systems. Additionally, an effective permutation method is designed to scramble the pixels of the compressed image in the re-encryption stage. The simulation results and analyses demonstrate that the proposed algorithm performs well in terms of visual security and decryption quality.
基金supported by the National Natural Science Foundation of China under Grant 61671219.
文摘Object detection in unmanned aerial vehicle(UAV)aerial images has become increasingly important in military and civil applications.General object detection models are not robust enough against interclass similarity and intraclass variability of small objects,and UAV-specific nuisances such as uncontrolledweather conditions.Unlike previous approaches focusing on high-level semantic information,we report the importance of underlying features to improve detection accuracy and robustness fromthe information-theoretic perspective.Specifically,we propose a robust and discriminative feature learning approach through mutual information maximization(RD-MIM),which can be integrated into numerous object detection methods for aerial images.Firstly,we present the rank sample mining method to reduce underlying feature differences between the natural image domain and the aerial image domain.Then,we design a momentum contrast learning strategy to make object features similar to the same category and dissimilar to different categories.Finally,we construct a transformer-based global attention mechanism to boost object location semantics by leveraging the high interrelation of different receptive fields.We conduct extensive experiments on the VisDrone and Unmanned Aerial Vehicle Benchmark Object Detection and Tracking(UAVDT)datasets to prove the effectiveness of the proposed method.The experimental results show that our approach brings considerable robustness gains to basic detectors and advanced detection methods,achieving relative growth rates of 51.0%and 39.4%in corruption robustness,respectively.Our code is available at https://github.com/cq100/RD-MIM(accessed on 2 August 2024).
基金funded by the National Key R&D Program of China(2020YFB1710100)the National Natural Science Foundation of China(Nos.52275337,52090042,51905188).
文摘The intelligent detection technology driven by X-ray images and deep learning represents the forefront of advanced techniques and development trends in flaw detection and automated evaluation of light alloy castings.However,the efficacy of deep learning models hinges upon a substantial abundance of flaw samples.The existing research on X-ray image augmentation for flaw detection suffers from shortcomings such as poor diversity of flaw samples and low reliability of quality evaluation.To this end,a novel approach was put forward,which involves the creation of the Interpolation-Deep Convolutional Generative Adversarial Network(I-DCGAN)for flaw detection image generation and a comprehensive evaluation algorithm named TOPSIS-IFP.I-DCGAN enables the generation of high-resolution,diverse simulated images with multiple appearances,achieving an improvement in sample diversity and quality while maintaining a relatively lower computational complexity.TOPSIS-IFP facilitates multi-dimensional quality evaluation,including aspects such as diversity,authenticity,image distribution difference,and image distortion degree.The results indicate that the X-ray radiographic images of magnesium and aluminum alloy castings achieve optimal performance when trained up to the 800th and 600th epochs,respectively.The TOPSIS-IFP value reaches 78.7%and 73.8%similarity to the ideal solution,respectively.Compared to single index evaluation,the TOPSIS-IFP algorithm achieves higher-quality simulated images at the optimal training epoch.This approach successfully mitigates the issue of unreliable quality associated with single index evaluation.The image generation and comprehensive quality evaluation method developed in this paper provides a novel approach for image augmentation in flaw recognition,holding significant importance for enhancing the robustness of subsequent flaw recognition networks.
文摘Multi-label image classification is recognized as an important task within the field of computer vision,a discipline that has experienced a significant escalation in research endeavors in recent years.The widespread adoption of convolutional neural networks(CNNs)has catalyzed the remarkable success of architectures such as ResNet-101 within the domain of image classification.However,inmulti-label image classification tasks,it is crucial to consider the correlation between labels.In order to improve the accuracy and performance of multi-label classification and fully combine visual and semantic features,many existing studies use graph convolutional networks(GCN)for modeling.Object detection and multi-label image classification exhibit a degree of conceptual overlap;however,the integration of these two tasks within a unified framework has been relatively underexplored in the existing literature.In this paper,we come up with Object-GCN framework,a model combining object detection network YOLOv5 and graph convolutional network,and we carry out a thorough experimental analysis using a range of well-established public datasets.The designed framework Object-GCN achieves significantly better performance than existing studies in public datasets COCO2014,VOC2007,VOC2012.The final results achieved are 86.9%,96.7%,and 96.3%mean Average Precision(mAP)across the three datasets.
基金This work was supported by the National Natural Science Foundation of China(Grant No.U20A20197).
文摘We propose a novel image segmentation algorithm to tackle the challenge of limited recognition and segmentation performance in identifying welding seam images during robotic intelligent operations.Initially,to enhance the capability of deep neural networks in extracting geometric attributes from depth images,we developed a novel deep geometric convolution operator(DGConv).DGConv is utilized to construct a deep local geometric feature extraction module,facilitating a more comprehensive exploration of the intrinsic geometric information within depth images.Secondly,we integrate the newly proposed deep geometric feature module with the Fully Convolutional Network(FCN8)to establish a high-performance deep neural network algorithm tailored for depth image segmentation.Concurrently,we enhance the FCN8 detection head by separating the segmentation and classification processes.This enhancement significantly boosts the network’s overall detection capability.Thirdly,for a comprehensive assessment of our proposed algorithm and its applicability in real-world industrial settings,we curated a line-scan image dataset featuring weld seams.This dataset,named the Standardized Linear Depth Profile(SLDP)dataset,was collected from actual industrial sites where autonomous robots are in operation.Ultimately,we conducted experiments utilizing the SLDP dataset,achieving an average accuracy of 92.7%.Our proposed approach exhibited a remarkable performance improvement over the prior method on the identical dataset.Moreover,we have successfully deployed the proposed algorithm in genuine industrial environments,fulfilling the prerequisites of unmanned robot operations.
基金Deanship of Research and Graduate Studies at King Khalid University for funding this work through Small Group Research Project under Grant Number RGP1/261/45.
文摘Breast cancer is a significant threat to the global population,affecting not only women but also a threat to the entire population.With recent advancements in digital pathology,Eosin and hematoxylin images provide enhanced clarity in examiningmicroscopic features of breast tissues based on their staining properties.Early cancer detection facilitates the quickening of the therapeutic process,thereby increasing survival rates.The analysis made by medical professionals,especially pathologists,is time-consuming and challenging,and there arises a need for automated breast cancer detection systems.The upcoming artificial intelligence platforms,especially deep learning models,play an important role in image diagnosis and prediction.Initially,the histopathology biopsy images are taken from standard data sources.Further,the gathered images are given as input to the Multi-Scale Dilated Vision Transformer,where the essential features are acquired.Subsequently,the features are subjected to the Bidirectional Long Short-Term Memory(Bi-LSTM)for classifying the breast cancer disorder.The efficacy of the model is evaluated using divergent metrics.When compared with other methods,the proposed work reveals that it offers impressive results for detection.
文摘Background Document images such as statistical reports and scientific journals are widely used in information technology.Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction.However,because of the diversity in the shapes and sizes of tables,existing table detection methods adapted from general object detection algorithms,have not yet achieved satisfactory results.Incorrect detection results might lead to the loss of critical information.Methods Therefore,we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections.To better deal with table areas of different shapes and sizes,we added a dualbranch context content attention module(DCCAM)to high-dimensional features to extract context content information,thereby enhancing the network's ability to learn shape features.For feature fusion at different scales,we replaced the original 3×3 convolution with a multilayer residual module,which contains enhanced gradient flow information to improve the feature representation and extraction capability.Results We evaluated our method on public document datasets and compared it with previous methods,which achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score.https://github.com/Yong Z-Lee/TD-DCCAM.
基金financially supported by the National Council for Scientific and Technological Development(CNPq,Brazil),Swedish-Brazilian Research and Innovation Centre(CISB),and Saab AB under Grant No.CNPq:200053/2022-1the National Council for Scientific and Technological Development(CNPq,Brazil)under Grants No.CNPq:312924/2017-8 and No.CNPq:314660/2020-8.
文摘Unmanned aerial vehicles(UAVs)have been widely used in military,medical,wireless communications,aerial surveillance,etc.One key topic involving UAVs is pose estimation in autonomous navigation.A standard procedure for this process is to combine inertial navigation system sensor information with the global navigation satellite system(GNSS)signal.However,some factors can interfere with the GNSS signal,such as ionospheric scintillation,jamming,or spoofing.One alternative method to avoid using the GNSS signal is to apply an image processing approach by matching UAV images with georeferenced images.But a high effort is required for image edge extraction.Here a support vector regression(SVR)model is proposed to reduce this computational load and processing time.The dynamic partial reconfiguration(DPR)of part of the SVR datapath is implemented to accelerate the process,reduce the area,and analyze its granularity by increasing the grain size of the reconfigurable region.Results show that the implementation in hardware is 68 times faster than that in software.This architecture with DPR also facilitates the low power consumption of 4 mW,leading to a reduction of 57%than that without DPR.This is also the lowest power consumption in current machine learning hardware implementations.Besides,the circuitry area is 41 times smaller.SVR with Gaussian kernel shows a success rate of 99.18%and minimum square error of 0.0146 for testing with the planning trajectory.This system is useful for adaptive applications where the user/designer can modify/reconfigure the hardware layout during its application,thus contributing to lower power consumption,smaller hardware area,and shorter execution time.
文摘Angle detection is a crucial aspect of industrial automation,ensuring precise alignment and orientation ofcomponents in manufacturing processes.Despite the widespread application of computer vision in industrialsettings,angle detection remains an underexplored domain,with limited integration into production lines.Thispaper addresses the need for automated angle detection in industrial environments by presenting a methodologythat eliminates training time and higher computation cost on Graphics Processing Unit(GPU)from machinelearning in computer vision(e.g.,Convolutional Neural Networks(CNN)).Our approach leverages advanced imageprocessing techniques and a strategic combination of algorithms,including contour selection,circle regression,polar warp transformation,and outlier detection,to provide an adaptive solution for angle detection.By configuringthe algorithm with a diverse dataset and evaluating its performance across various objects,we demonstrate itsefficacy in achieving reliable results,with an average error of only 0.5 degrees.Notably,this error margin is 3.274times lower than the acceptable threshold.Our study highlights the importance of accurate angle detection inindustrial settings and showcases the reliability of our algorithm in accurately determining angles,thus contributingto improved manufacturing processes.
文摘The detection and characterization of human veins using infrared (IR) image processing have gained significant attention due to its potential applications in biometric identification, medical diagnostics, and vein-based authentication systems. This paper presents a low-cost approach for automatic detection and characterization of human veins from IR images. The proposed method uses image processing techniques including segmentation, feature extraction, and, pattern recognition algorithms. Initially, the IR images are preprocessed to enhance vein structures and reduce noise. Subsequently, a CLAHE algorithm is employed to extract vein regions based on their unique IR absorption properties. Features such as vein thickness, orientation, and branching patterns are extracted using mathematical morphology and directional filters. Finally, a classification framework is implemented to categorize veins and distinguish them from surrounding tissues or artifacts. A setup based on Raspberry Pi was used. Experimental results of IR images demonstrate the effectiveness and robustness of the proposed approach in accurately detecting and characterizing human. The developed system shows promising for integration into applications requiring reliable and secure identification based on vein patterns. Our work provides an effective and low-cost solution for nursing staff in low and middle-income countries to perform a safe and accurate venipuncture.
Funding: Project (61172184) supported by the National Natural Science Foundation of China; Project (200902482) supported by the China Postdoctoral Science Foundation Specially Funded Project; Project (12JJ6062) supported by the Natural Science Foundation of Hunan Province, China.
Abstract: A blind digital image forensic method for detecting copy-paste forgery between JPEG images was proposed. Two copy-paste tampering scenarios were first introduced: the tampered image is saved in an uncompressed format or in a JPEG compressed format. The proposed detection method was then analyzed and simulated for all cases of the two tampering scenarios. The tampered region is detected by computing the averaged sum of absolute difference (ASAD) images between the examined image and a re-saved JPEG compressed version at different quality factors. The experimental results show the advantages of the proposed method: the capability of detecting small and/or multiple tampered regions, simple computation, and hence fast processing speed.
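The core ASAD computation can be sketched as below, assuming OpenCV: re-save the examined image as JPEG at several quality factors and average the per-pixel absolute differences. The quality range is an illustrative assumption; the thresholding of the resulting map into tampered regions is omitted.

```python
# Sketch of the averaged sum of absolute difference (ASAD) map between an
# examined image and its JPEG re-saved copies at different quality factors.
import cv2
import numpy as np

def asad_map(image_bgr: np.ndarray, qualities=(60, 70, 80, 90)) -> np.ndarray:
    """High values in the returned map hint at regions with inconsistent JPEG history."""
    acc = np.zeros(image_bgr.shape[:2], dtype=np.float64)
    for q in qualities:
        _, buf = cv2.imencode(".jpg", image_bgr, [cv2.IMWRITE_JPEG_QUALITY, q])
        resaved = cv2.imdecode(buf, cv2.IMREAD_COLOR)       # re-saved JPEG copy
        diff = np.abs(image_bgr.astype(np.float64) - resaved.astype(np.float64))
        acc += diff.sum(axis=2)                             # sum over color channels
    return acc / len(qualities)                             # average over quality factors
```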
Funding: Supported in part by the Science and Technology Innovation 2030 Key Project of "New Generation Artificial Intelligence" (2018AAA0102303); the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology (YESS20210289); the China Postdoctoral Science Foundation (2020TQ1057, 2020M682823); and the National Natural Science Foundation of China (U20B2071, U1913602, 91948204).
Abstract: Change detection (CD) is becoming indispensable for unmanned aerial vehicles (UAVs), especially in the domains of water landing, rescue, and search. However, even the most advanced models require large amounts of data for training and testing, so sufficient labeled images covering different imaging conditions are needed. Inspired by computer graphics, we present a cloning method to simulate inland-water scenes and collect an auto-labeled simulated dataset. The simulated dataset consists of six challenges designed to test the effects of dynamic background, weather, and noise on change detection models. We then propose an image translation framework that translates simulated images into synthetic images. This framework uses shared parameters (encoder and generator) and 22×22 receptive fields (discriminator) to generate realistic synthetic images as model training sets. The experimental results indicate that: 1) different imaging challenges affect the performance of change detection models; and 2) compared with simulated images, synthetic images can effectively improve the accuracy of supervised models.
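For intuition about a discriminator with a 22×22 receptive field, the sketch below shows a patch discriminator in PyTorch whose per-output receptive field works out to 22×22 pixels (two stride-2 4×4 convolutions followed by a stride-1 4×4 convolution). The channel widths and activations are assumptions; this is not the paper's exact architecture.

```python
# Hypothetical patch discriminator: each output score depends on a 22x22
# input patch (RF grows 4 -> 10 -> 22 across the three convolutions).
import torch
import torch.nn as nn

class PatchDiscriminator22(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),                # receptive field: 4x4
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),                # receptive field: 10x10
            nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),  # receptive field: 22x22
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)                                  # one real/fake score per patch

scores = PatchDiscriminator22()(torch.randn(1, 3, 256, 256))
print(scores.shape)                                         # torch.Size([1, 1, 63, 63])
```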
Abstract: A clustering technique is used to examine each pixel in the image, assigning it to one of several clusters according to the minimum distance, which yields a primary classified image of different intensity regions. A watershed transformation technique is then employed. This includes computing the gradient of the classified image, dividing the image into markers, and checking the marker image for zero points (watershed lines). The watershed lines are then deleted from the marker image created by the watershed algorithm. A Region Adjacency Graph (RAG) and a Region Adjacency Boundary (RAB) are created between pairs of regions in the marker image. Finally, region merging is performed according to the average region intensity and two edge strengths (T1, T2). The authors' approach is tested on remote sensing and brain MR medical images. The final segmentation result is one closed boundary per actual region in the image.
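The following is a minimal sketch of a clustering-plus-watershed pipeline with graph-based region merging, assuming scikit-image and scikit-learn; the cluster count and the single merge threshold stand in for the paper's (T1, T2) edge-strength criteria and RAB bookkeeping.

```python
# Sketch: intensity clustering, watershed on the gradient, then RAG-based
# region merging. Parameters are illustrative, not the paper's values.
import numpy as np
from skimage import data, filters, graph, segmentation
from sklearn.cluster import KMeans

image = data.camera().astype(float)

# 1) Primary classification: cluster pixel intensities
labels0 = KMeans(n_clusters=4, n_init=10).fit_predict(
    image.reshape(-1, 1)).reshape(image.shape)

# 2) Watershed transform on the gradient of the classified image
gradient = filters.sobel(labels0.astype(float))
ws = segmentation.watershed(gradient)

# 3) Region adjacency graph and merging by mean-intensity similarity
rgb_like = np.dstack([image] * 3)               # rag_mean_color expects a color image
rag = graph.rag_mean_color(rgb_like, ws)
merged = graph.cut_threshold(ws, rag, thresh=20.0)
print(len(np.unique(ws)), "regions before merging,", len(np.unique(merged)), "after")
```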
基金support by the National Natural Science Foundation of China (Grant No. 62005049)Natural Science Foundation of Fujian Province (Grant Nos. 2020J01451, 2022J05113)Education and Scientific Research Program for Young and Middleaged Teachers in Fujian Province (Grant No. JAT210035)。
Abstract: Camouflaged people are extremely adept at actively concealing themselves by effectively using cover and the surrounding environment. Despite advancements in optical detection capabilities through imaging systems, including spectral, polarization, and infrared technologies, there is still a lack of effective real-time methods for accurately detecting small, well-camouflaged people in complex real-world scenes. Here, this study proposes a snapshot multispectral image-based camouflage detection model, multispectral YOLO (MS-YOLO), which utilizes the SPD-Conv and SimAM modules to effectively represent targets and suppress background interference by exploiting spatial-spectral target information. In addition, the study constructs the first real-shot multispectral camouflaged people dataset (MSCPD), which encompasses diverse scenes, target scales, and attitudes. To minimize information redundancy, MS-YOLO selects an optimal subset of 12 bands with strong feature representation and minimal inter-band correlation as input. In experiments on MSCPD, MS-YOLO achieves a mean average precision of 94.31% and real-time detection at 65 frames per second, confirming the effectiveness and efficiency of the method in detecting camouflaged people in typical desert and forest scenes. The approach offers valuable support for improving the perception capabilities of unmanned aerial vehicles in detecting enemy forces and rescuing personnel on the battlefield.
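To illustrate the band-selection idea (keeping bands that are informative yet weakly correlated with each other), the sketch below greedily picks a subset of spectral bands with low pairwise correlation. The greedy criterion is a hypothetical stand-in, not MS-YOLO's actual selection procedure.

```python
# Hypothetical greedy band selection: start from the highest-variance band,
# then repeatedly add the band least correlated with those already chosen.
import numpy as np

def select_bands(cube: np.ndarray, k: int = 12) -> list[int]:
    """cube: (H, W, B) multispectral image; returns indices of k selected bands."""
    bands = cube.reshape(-1, cube.shape[-1]).astype(np.float64)
    corr = np.abs(np.corrcoef(bands, rowvar=False))         # (B, B) band correlations
    selected = [int(np.argmax(bands.var(axis=0)))]          # most informative band first
    while len(selected) < k:
        mean_corr = corr[:, selected].mean(axis=1)          # average correlation to chosen set
        mean_corr[selected] = np.inf                        # never re-select a band
        selected.append(int(np.argmin(mean_corr)))
    return selected
```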
Abstract: Image forging is the alteration of a digital image to conceal some necessary or helpful information. In some circumstances it can be difficult to distinguish the modified region from the original image. The demand for image authenticity and integrity drives the detection of fabricated images. There have been cases of ownership infringement and fraudulent actions based on counterfeited multimedia files, including re-sampling and copy-move forgeries. This work presents a high-level view of digital image forensics and possible detection approaches. It provides a thorough analysis of digital image forgery detection techniques, covering their steps and effectiveness. The surveyed methods identify the forgery and its type and are compared with the state of the art. This work helps identify the best forgery detection technique for different environments. It also discusses the open issues in existing methods, which can help researchers find directions for further research in this field.
Funding: This work was supported by the National Research Foundation of Korea (NRF) under grant NRF-2022R1A2C1011774.
Abstract: Network infrastructure has evolved rapidly due to the ever-increasing volume of users and data. The massive number of online devices and users has forced the network to transform and accommodate the operational necessities of consumers. Among these necessities, network security is of prime significance. Network intrusion detection systems (NIDS) are among the most suitable approaches for detecting anomalies and attacks on a network. However, keeping up with network security requirements is challenging due to the constant mutation of attack patterns by intruders. This paper presents an effective framework for NIDS that merges image processing with convolutional neural networks (CNNs). The proposed framework first converts non-image network traffic data into images and then enhances those images using a Gabor filter. The images are then classified by a CNN classifier. To assess the efficacy of the recommended method, four benchmark datasets, i.e., CSE-CIC-IDS2018, CIC-IDS-2017, ISCX-IDS 2012, and NSL-KDD, were used. The proposed approach showed higher precision than recent work on these datasets. Further, the proposed method is compared with recent well-known image processing methods for NIDS.
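The conversion and enhancement steps can be sketched as follows, assuming OpenCV: a traffic record's feature vector is scaled and reshaped into a small grayscale image, then filtered with a bank of Gabor kernels. The image size, filter parameters, and orientation count are illustrative assumptions; the CNN classifier itself is omitted.

```python
# Sketch: turn a 1-D traffic feature vector into an image, then enhance it
# with Gabor responses at several orientations before CNN classification.
import cv2
import numpy as np

def traffic_record_to_image(features: np.ndarray, size: int = 16) -> np.ndarray:
    """Scale a feature vector to [0, 255] and reshape it into a size x size image."""
    flat = features.astype(np.float64).ravel()[: size * size]
    padded = np.zeros(size * size)
    padded[: flat.size] = flat
    lo, hi = padded.min(), padded.max()
    img = (255 * (padded - lo) / (hi - lo + 1e-9)).astype(np.uint8)
    return img.reshape(size, size)

def gabor_enhance(img: np.ndarray) -> np.ndarray:
    """Take the maximum Gabor response over four orientations."""
    acc = np.zeros_like(img, dtype=np.float64)
    for theta in np.arange(0, np.pi, np.pi / 4):
        kernel = cv2.getGaborKernel((9, 9), sigma=2.0, theta=theta,
                                    lambd=5.0, gamma=0.5)
        acc = np.maximum(acc, cv2.filter2D(img, cv2.CV_64F, kernel))
    return cv2.normalize(acc, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```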