Transformer-based models have facilitated significant advances in object detection. However, their heavy computational cost and suboptimal detection of dense small objects curtail their applicability to unmanned aerial vehicle (UAV) imagery. To address these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, leading to an accurate and efficient model. First, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies to enhance detection performance without incurring additional inference computation. The query filter is designed to cope with the dense clustering inherent in drone-captured images by suppressing similar queries with a training-aware non-maximum suppression. Adversarial denoising learning is a novel enhancement method inspired by adversarial learning, which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, achieving a significant improvement in accuracy with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on the VisDrone and UAVDT datasets, respectively, and has a faster inference speed, making it a competitive model for UAV image object detection.
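The query filter can be pictured as non-maximum suppression applied to decoder queries. The abstract does not specify what makes the NMS "training-aware", so the following is only a minimal sketch assuming plain greedy IoU-based NMS over query boxes; the function names `iou` and `filter_queries` and the threshold value are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_queries(boxes: List[Box], scores: List[float], iou_thr: float = 0.7) -> List[int]:
    """Greedy NMS over decoder queries (hypothetical helper): keep the
    highest-scoring query and drop any lower-scoring query whose box
    overlaps an already kept one by more than iou_thr.
    Returns the indices of kept queries, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(filter_queries(boxes, scores, iou_thr=0.5))  # prints [0, 2]
```

With `iou_thr=0.5`, the second box (IoU of about 0.68 with the first) is treated as a duplicate query on the same dense object and dropped; at 0.7 all three queries survive.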
In clinical practice, microscopic examination of urine sediment is considered an important in vitro examination with broad applications. Measuring the amount of each type of urine sediment allows for screening, diagnosis, and evaluation of kidney and urinary tract disease, providing insight into its specific type and severity. However, manual urine sediment examination is labor-intensive, time-consuming, and subjective. Traditional machine learning-based object detection methods require hand-crafted features for localization and classification; they generalize poorly and struggle to count urine sediments quickly and accurately. Deep learning-based object detection methods have the potential to address these challenges, but they require access to large urine sediment image datasets. Unfortunately, only a limited number of urine sediment datasets are currently publicly available. To alleviate this shortage in medical image analysis, we propose a new dataset named UriSed2K, which contains 2465 high-quality images annotated with expert guidance. Our dataset poses two main challenges: a large number of small objects and occlusion between these small objects. Our manuscript focuses on applying deep learning object detection methods to this dataset and addressing these challenges. Specifically, our goal is to improve the accuracy and efficiency of the detection algorithm and, in doing so, provide medical professionals with an automatic detector that saves time and effort. We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO. The proposed algorithm comprises a local context attention module and a global background suppression module, which help the detector distinguish urine sediment features in the image. The local context attention module captures context information beyond the object region, while the global background suppression module emphasizes objects against uninformative backgrounds. We comprehensively evaluate our method on the UriSed2K dataset, which includes seven categories of urine sediment, namely erythrocytes (red blood cells), leukocytes (white blood cells), epithelial cells, crystals, mycetes, broken erythrocytes, and broken leukocytes, achieving the best average precision (AP) of 95.3% while taking only 10 ms per image. The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.
Object detection in unmanned aerial vehicle (UAV) aerial images has become increasingly important in military and civil applications. General object detection models are not robust enough to the interclass similarity and intraclass variability of small objects, or to UAV-specific nuisances such as uncontrolled weather conditions. Unlike previous approaches that focus on high-level semantic information, we show the importance of underlying features for improving detection accuracy and robustness from an information-theoretic perspective. Specifically, we propose a robust and discriminative feature learning approach through mutual information maximization (RD-MIM), which can be integrated into numerous object detection methods for aerial images. First, we present a rank sample mining method to reduce underlying feature differences between the natural image domain and the aerial image domain. Then, we design a momentum contrast learning strategy that makes object features similar to those of the same category and dissimilar to those of different categories. Finally, we construct a transformer-based global attention mechanism that boosts object location semantics by leveraging the high interrelation of different receptive fields. We conduct extensive experiments on the VisDrone and Unmanned Aerial Vehicle Benchmark Object Detection and Tracking (UAVDT) datasets to demonstrate the effectiveness of the proposed method. The experimental results show that our approach brings considerable robustness gains to both basic detectors and advanced detection methods, achieving relative growth rates of 51.0% and 39.4% in corruption robustness, respectively. Our code is available at https://github.com/cq100/RD-MIM (accessed on 2 August 2024).
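Momentum contrast learning generally combines two pieces: a slowly updated key encoder and a contrastive (InfoNCE-style) loss that pulls a query toward a positive key and pushes it from negatives. The sketch below shows those two pieces in plain Python, assuming that a same-category object feature serves as the positive and other-category features as negatives; the encoder is reduced to a flat parameter list, and the names `momentum_update` and `info_nce`, the momentum 0.999, and the temperature 0.07 are illustrative assumptions, not RD-MIM's actual settings.

```python
import math
from typing import List

Vector = List[float]


def momentum_update(key_params: Vector, query_params: Vector, m: float = 0.999) -> Vector:
    """Exponential moving average: the key encoder slowly tracks the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(query: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.07) -> float:
    """Contrastive loss: low when the query is close to the same-category key
    and far from keys of other categories. Uses log-sum-exp for stability."""
    logits = [dot(query, positive) / tau] + [dot(query, n) / tau for n in negatives]
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[0]


aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], tau=1.0)
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]], tau=1.0)
print(aligned < misaligned)  # prints True
```

Keeping the key encoder as a moving average rather than copying the query encoder keeps the keys consistent across training steps, which is the usual motivation for the "momentum" part of momentum contrast.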
Multi-label image classification is recognized as an important task in computer vision, a field that has seen a significant increase in research effort in recent years. The widespread adoption of convolutional neural networks (CNNs) has driven the remarkable success of architectures such as ResNet-101 in image classification. However, in multi-label image classification tasks, it is crucial to consider the correlations between labels. To improve the accuracy and performance of multi-label classification and to fully combine visual and semantic features, many existing studies use graph convolutional networks (GCNs) for modeling. Object detection and multi-label image classification overlap conceptually; however, the integration of these two tasks within a unified framework has been relatively underexplored in the existing literature. In this paper, we propose the Object-GCN framework, a model combining the object detection network YOLOv5 with a graph convolutional network, and we carry out a thorough experimental analysis using a range of well-established public datasets. Object-GCN achieves significantly better performance than existing studies on the public datasets COCO2014, VOC2007, and VOC2012, reaching a mean average precision (mAP) of 86.9%, 96.7%, and 96.3%, respectively.
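The graph convolution at the heart of GCN-based multi-label models propagates each node's features to its neighbors through a normalized adjacency matrix. The sketch below shows the widely used propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) in plain Python, assuming the nodes are labels and A is a label-correlation matrix; how Object-GCN builds A or fuses it with YOLOv5 features is not described in the abstract, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One propagation step H' = ReLU(A_norm H W)."""
    z = matmul(matmul(normalize_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Two correlated labels with 1-d embeddings: each node ends up averaging both.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # prints [[2.0], [2.0]]
```

The self-loops keep each label's own embedding in the mix, and the symmetric normalization stops frequently co-occurring labels from dominating the aggregated features.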
In many image analysis and processing problems, discriminating the size and shape of each individual object in an aggregate pile projected in an image is an important task. These features are relatively easy to distinguish when the objects are already separated from one another; the problem becomes considerably more complex and challenging when the objects touch and/or overlap. This letter presents an algorithm that separates touching and overlapping objects within a 2-D image. The approach first converts the gray-scale image into its corresponding binary image and then into a 3-D topographic image using erosion operations. A template (or mask) is designed to search the topographic surface for saddle points, from which the segmenting orientation is determined, followed by the separating operation. The algorithm is tested on a real image, and the results are satisfactory and encouraging.
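The binary-to-topographic step can be read as assigning each foreground pixel a height equal to the number of successive erosions it survives, so object centers become peaks. The sketch below assumes a 4-connected cross as the structuring element and treats pixels outside the image as background; the letter's actual structuring element and its saddle-search template are not given, so only the surface construction is shown, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def erode(img: Grid) -> Grid:
    """Binary erosion with a 4-connected cross; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] and all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy, dx in NEIGHBOURS
            ):
                out[y][x] = 1
    return out


def topographic_surface(binary: Grid) -> Grid:
    """Height of each foreground pixel = number of successive erosions it survives."""
    h, w = len(binary), len(binary[0])
    height = [[0] * w for _ in range(h)]
    current = [row[:] for row in binary]
    level = 0
    while any(any(row) for row in current):
        level += 1
        for y in range(h):
            for x in range(w):
                if current[y][x]:
                    height[y][x] = level
        current = erode(current)
    return height


surface = topographic_surface([[1] * 5 for _ in range(5)])
print(surface[2][2], surface[1][1], surface[0][0])  # prints 3 2 1
```

For two touching particles, each particle would produce its own peak while the narrow neck between them stays low, giving a saddle point; that is the kind of location the letter's template search is meant to find.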
An improved method for estimating the motion vectors of feature points is proposed for tracking moving objects in dynamic image sequences. Feature points are first extracted by an improved minimum intensity change (MIC) algorithm. Their matching points are then determined by adaptive rood pattern search. Finally, based on the random sample consensus (RANSAC) method, the background motion is compensated using the parameters of an affine transform fitted to the background motion. With appropriate morphological filtering, the moving objects are completely extracted from the background and then tracked accurately. Experimental results show that the improved method successfully compensates for background motion and offers great promise for tracking moving objects in dynamic image sequences.
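The RANSAC step can be sketched as follows, assuming the feature-point correspondences (from MIC extraction and rood-pattern matching, which are not reproduced) are already available as point pairs. Each iteration fits an affine map x' = a x + b y + c, y' = d x + e y + f to three random pairs and counts how many pairs agree within a pixel threshold; the model with the most inliers is taken as the background motion, and pairs that disagree are candidate moving-object points. The helper names, the iteration count, and the threshold are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: List[List[float]], rhs: List[float]) -> Optional[List[float]]:
    """Solve a 3x3 linear system with Cramer's rule; None if it is singular."""
    d = _det3(m)
    if abs(d) < 1e-12:
        return None
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(_det3(mc) / d)
    return sol


def affine_from_three(src: Sequence[Point], dst: Sequence[Point]) -> Optional[Affine]:
    m = [[x, y, 1.0] for x, y in src]
    abc = _solve3(m, [p[0] for p in dst])
    def_ = _solve3(m, [p[1] for p in dst])
    if abc is None or def_ is None:
        return None  # collinear sample: no unique affine map
    return (abc[0], abc[1], abc[2], def_[0], def_[1], def_[2])


def apply_affine(t: Affine, p: Point) -> Point:
    a, b, c, d, e, f = t
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def ransac_affine(src: Sequence[Point], dst: Sequence[Point],
                  iters: int = 200, thr: float = 1.0, seed: int = 0):
    """Return the affine model with the most inliers and the inlier indices."""
    rng = random.Random(seed)
    idx = range(len(src))
    best, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(idx, 3)
        t = affine_from_three([src[i] for i in sample], [dst[i] for i in sample])
        if t is None:
            continue
        inliers = [i for i in idx if math.dist(apply_affine(t, src[i]), dst[i]) <= thr]
        if len(inliers) > len(best_inliers):
            best, best_inliers = t, inliers
    return best, best_inliers


# Background shifts by (2, -1); three points on a moving object shift by (10, 10).
src = [(float(x), float(y)) for x in range(4) for y in range(4)]
dst = [(x + 2.0, y - 1.0) for x, y in src]
src += [(0.5, 0.5), (1.5, 2.5), (2.5, 1.5)]
dst += [(10.5, 10.5), (11.5, 12.5), (12.5, 11.5)]
model, inliers = ransac_affine(src, dst)
print(len(inliers))  # prints 16
```

The three object points are left out of the inlier set, so subtracting the fitted background motion leaves exactly those points with residual motion.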
Image is an important and creative means of expressing poets' feelings in both Chinese and English poetry. An image contains both concrete representations and abstract concepts, which are two key notions in poetics and aesthetics. This paper examines the different versions of "tress" in poems and explores the exact nature of the concept of sensitive affection in English and Chinese, so as to appreciate the artistic beauty of images.
Road traffic monitoring is an important topic widely discussed among researchers. Traffic monitoring systems frequently rely on cameras mounted on bridges or roadsides. However, aerial images offer the flexibility of mobile platforms for detecting the location and motion of vehicles over a larger area. To this end, various models have shown the ability to recognize and track vehicles; however, these methods are not yet mature enough to produce accurate results in complex road scenes. Therefore, this paper presents an algorithm that combines state-of-the-art techniques for identifying and tracking vehicles using image bursts. The extracted frames were converted to grayscale, followed by a georeferencing algorithm that embeds coordinate information into the images. Masking eliminated irrelevant data and reduced the computational cost of the overall monitoring system. Next, Sobel edge detection combined with Canny edge detection and the Hough line transform was applied for noise reduction. After preprocessing, a blob detection algorithm detected the vehicles, with a dynamic thresholding scheme used to detect vehicles of varying sizes. Detection was performed on the first image of every burst. Then, to track vehicles, each vehicle's model was matched in the succeeding images using a template matching algorithm. To further improve tracking accuracy by incorporating motion information, Scale Invariant Feature Transform (SIFT) features were used to select the best match among multiple candidate matches. Accuracies of 87% for detection and 80% for tracking were achieved on the A1 Motorway Netherlands dataset, and accuracies of 86% for detection and 78% for tracking on the Vehicle Aerial Imaging from Drone (VAID) dataset.
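The tracking step matches each detected vehicle's template against the next image of the burst. A minimal sketch, assuming grayscale images stored as nested lists and a sum-of-squared-differences (SSD) score (the paper's matching score is not stated; `match_template` is a hypothetical name), is an exhaustive sliding-window search:

```python
from typing import List, Tuple

Image = List[List[float]]


def match_template(image: Image, template: Image) -> Tuple[Tuple[int, int], float]:
    """Exhaustive sum-of-squared-differences search.
    Returns the (row, col) of the best top-left position and its SSD score."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_ssd = (0, 0), float("inf")
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + j][x + i] - template[j][i]) ** 2
                for j in range(th) for i in range(tw)
            )
            if ssd < best_ssd:
                best_pos, best_ssd = (y, x), ssd
    return best_pos, best_ssd


frame = [[0.0] * 5 for _ in range(5)]
for y, x in ((2, 3), (2, 4), (3, 3), (3, 4)):
    frame[y][x] = 9.0  # a bright 2x2 "vehicle"
print(match_template(frame, [[9.0, 9.0], [9.0, 9.0]]))  # prints ((2, 3), 0.0)
```

When several positions score similarly (for example, identical-looking cars), this search alone cannot decide between them, which is where the abstract's SIFT-based motion check comes in.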
An effective model (image to wrinkle, ITW) for garment fit evaluation is presented. The proposed model aims to improve the accuracy of garment fit evaluation based on dressing images. The ITW model is an objective fit evaluation model based on the wrinkle index of the dressing image. It consists of two main steps: the gray curve-fitting (GCF) threshold segmentation algorithm and the Canny edge detection algorithm. In the ITW model, three types of wrinkle trends are defined, and online dressing images are evaluated and simulated using three quantitative indices: wrinkle number, wrinkle regularity, and wrinkle unevenness. Finally, the fit of three kinds of dress effect (tight, fit, and loose) is quantified by the objective fit evaluation model.
Recently, there has been a notable surge of interest in scientific research on spectral images. Their potential to revolutionize the digital photography industry, for example aerial photography with Unmanned Aerial Vehicles (UAVs), has captured considerable attention. One encouraging aspect is their combination with machine learning and deep learning algorithms, which have demonstrated remarkable outcomes in image classification. As a result of this combination, the adoption of spectral images has grown exponentially across various domains, with agriculture among the prominent beneficiaries. This paper presents an extensive survey of multispectral and hyperspectral images, focusing on their applications to classification challenges in diverse agricultural areas, including plants, grains, fruits, and vegetables. By examining primary studies, we identify the specific agricultural domains where multispectral and hyperspectral images have found practical use. We also focus on machine learning techniques for classifying hyperspectral images in the agricultural context. Our findings reveal that deep learning and support vector machines have emerged as widely employed methods for hyperspectral image classification in agriculture. We also discuss the issues and limitations of working with spectral images. This analysis aims to provide insights into the current state of spectral imaging in agriculture and its potential for future advancements.
A virtual reality (VR) environment can provide an immersive experience to viewers, and providing a good quality of experience in such an environment is extremely important. Therefore, in this paper, we present an image quality assessment (IQA) study on omnidirectional images. We first build an omnidirectional IQA (OIQA) database, including 16 source images and their corresponding 320 distorted images. We add four commonly encountered distortions: JPEG compression, JPEG2000 compression, Gaussian blur, and Gaussian noise. We then conduct a subjective quality evaluation study in the VR environment based on the OIQA database. Because visual attention is more important in a VR environment, head and eye movement data are also tracked and collected during the quality rating experiments. The 16 raw images and their corresponding distorted images, the subjective quality assessment scores, and the head-orientation and eye-gaze data together constitute the OIQA database. Based on the OIQA database, we test some state-of-the-art full-reference IQA (FR-IQA) measures on omnidirectional images in equirectangular or cubic format. The results show that applying FR-IQA metrics to cubic-format omnidirectional images could improve their performance. The performance of some FR-IQA metrics combined with three different types of saliency weights is also tested on our database. Some new phenomena that differ from traditional IQA are observed.
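Combining an FR-IQA metric with a saliency weight can be illustrated with a saliency-weighted PSNR, where each pixel's squared error is scaled by a weight map (for example, one derived from the collected head- and eye-movement data). The choice of PSNR, the 8-bit peak value, and the function name `weighted_psnr` are assumptions for illustration; the abstract does not say exactly which metrics or weighting schemes were combined.

```python
import math
from typing import List

Image = List[List[float]]


def weighted_psnr(ref: Image, dist: Image, weights: Image, peak: float = 255.0) -> float:
    """PSNR computed from a saliency-weighted mean squared error."""
    num = 0.0
    den = 0.0
    for r_row, d_row, w_row in zip(ref, dist, weights):
        for r, d, w in zip(r_row, d_row, w_row):
            num += w * (r - d) ** 2
            den += w
    wmse = num / den
    return math.inf if wmse == 0 else 10.0 * math.log10(peak ** 2 / wmse)


ref = [[0.0, 0.0], [0.0, 0.0]]
dist = [[10.0, 0.0], [0.0, 0.0]]  # one distorted pixel
uniform = weighted_psnr(ref, dist, [[1, 1], [1, 1]])
salient = weighted_psnr(ref, dist, [[3, 1], [1, 1]])  # distortion where viewers look
print(salient < uniform)  # prints True
```

The same distortion lowers the score more when it falls in a region viewers actually look at, which is the intuition behind saliency-weighted metrics for VR content.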
Constructing an appropriate spatial consistency measurement is key to improving image retrieval performance. To address this problem, this paper introduces a novel image retrieval mechanism based on family filtration in the object region. First, the user selects a rectangular object region in a query image, and the system returns a ranked list of images containing the same object, retrieved from a corpus of 100 images; this is the first ranking. To further improve retrieval performance, we add an efficient spatial consistency stage, named family-based spatial consistency filtration, to re-rank the results of the first ranking. We evaluate the retrieval system through experiments on a dataset selected from the key frames of "TREC Video Retrieval Evaluation 2005 (TRECVID2005)". The experimental results show that the proposed retrieval mechanism substantially improves retrieval quality. The paper also verifies the stability of the retrieval mechanism by increasing the number of images from 100 to 2000, and achieves generalized retrieval with objects from outside the dataset.
Cone photoreceptor cell identification is important for the early diagnosis of retinopathy. In this study, an object detection algorithm is used to identify cone cells in confocal adaptive optics scanning laser ophthalmoscope (AOSLO) images. An evaluation of the proposed method, taking manual identification as the ground truth, yields a precision, recall, and F1-score of 95.8%, 96.5%, and 96.1%, respectively. Detection and identification results on images with different cone photoreceptor cell distributions further demonstrate the performance of the proposed method. Overall, the proposed method can accurately identify cone photoreceptor cells in confocal AOSLO images, with performance comparable to manual identification.
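The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check on the reported figures (using the rounded precision and recall, so agreement is only to the first decimal), the following snippet reproduces the stated 96.1%:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(100 * f1_score(0.958, 0.965), 1))  # prints 96.1
```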
In image processing, image segmentation is one of the most important steps. Objects in remote sensing images often have to be detected before the subsequent processing steps can be performed. Remote sensing images usually have large sizes and various spatial resolutions, which makes detecting objects in them very complicated. In this paper, we develop a model to detect objects in remote sensing images based on the combination of picture fuzzy clustering and the MapReduce method (denoted MPFC). First, picture fuzzy clustering is applied to segment the input images. Then, MapReduce is used to reduce the runtime while guaranteeing quality. To convert data for MapReduce processing, two new procedures are introduced: Map_PFC and Reduce_PFC. The formal representation and details of these two procedures are presented in this paper. Experiments on satellite image and remote sensing image datasets are conducted to evaluate the proposed model. Validity indices and running time are used to compare the proposed model with the picture fuzzy clustering model. The validity index values show that picture fuzzy clustering integrated with MapReduce achieves better segmentation quality than picture fuzzy clustering alone. Moreover, on the two selected image datasets, the runtime of the MPFC model is much lower than that of picture fuzzy clustering.
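The way a clustering iteration splits into map and reduce steps can be sketched with standard fuzzy c-means as a simplified stand-in: picture fuzzy clustering additionally tracks neutral and refusal degrees, and the paper's Map_PFC and Reduce_PFC procedures are not reproduced here. In this sketch each worker maps its chunk of pixel intensities to per-cluster partial sums, and the reducer adds them and divides to obtain new centers; the function names, the fuzzifier m = 2, and the use of plain Python instead of a Hadoop-style runtime are all assumptions.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Partial = Tuple[List[float], List[float]]  # (sum of u^m * x, sum of u^m) per cluster


def memberships(x: float, centers: Sequence[float], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership of one pixel value in each cluster."""
    d = [abs(x - c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # pixel coincides with one or more centers
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]


def map_chunk(chunk: Sequence[float], centers: Sequence[float], m: float = 2.0) -> Partial:
    """Map step: one worker turns its chunk of pixels into per-cluster partial sums."""
    num = [0.0] * len(centers)
    den = [0.0] * len(centers)
    for x in chunk:
        for k, u in enumerate(memberships(x, centers, m)):
            w = u ** m
            num[k] += w * x
            den[k] += w
    return num, den


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce step: partial sums from different workers simply add up."""
    return [x + y for x, y in zip(a[0], b[0])], [x + y for x, y in zip(a[1], b[1])]


def fcm_iteration(chunks: Sequence[Sequence[float]], centers: Sequence[float],
                  m: float = 2.0) -> List[float]:
    num, den = reduce(reduce_partials, (map_chunk(c, centers, m) for c in chunks))
    return [n / d for n, d in zip(num, den)]


chunks = [[0.0, 1.0, 2.0], [100.0, 101.0, 102.0]]  # two "image tiles"
centers = [10.0, 90.0]
for _ in range(10):
    centers = fcm_iteration(chunks, centers)
print([round(c) for c in centers])  # prints [1, 101]
```

Because the center update only needs sums over pixels, splitting the image into chunks gives the same result as processing it whole, which is what makes the iteration MapReduce-friendly.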
This letter presents an efficient and simple image segmentation method for the spatial segmentation of semantic objects. First, the image is filtered using contour-preserving filters. Then it is labeled into quasi-flat zones. Small regions near contours are classified as uncertain regions and are eliminated by region growing and merging. Further region merging is used to reduce the number of regions. Simulation results show the method's efficiency and simplicity. It preserves the shape of semantic objects while emphasizing perceptually complex parts of the object, so it conforms well to human visual perception.
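Quasi-flat labeling groups pixels into connected regions in which neighboring pixels differ by at most a tolerance λ. The sketch below is a minimal version assuming 4-connectivity and a breadth-first flood fill; the letter's exact tolerance and connectivity are not given, and `quasi_flat_labels` is a hypothetical name.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[float]]


def quasi_flat_labels(img: Image, lam: float) -> Tuple[List[List[int]], int]:
    """Label lambda-flat zones: 4-connected regions in which adjacent pixels
    differ by at most lam. Returns the label map and the number of zones."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # flood fill from the seed pixel
                cy, cx = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(img[ny][nx] - img[cy][cx]) <= lam):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


img = [[10, 11, 50], [12, 13, 52]]
print(quasi_flat_labels(img, lam=2))  # prints ([[0, 0, 1], [0, 0, 1]], 2)
```

Small zones produced this way, especially along contours, are the "uncertain regions" that the letter then removes by region growing and merging.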
This paper proposes an object-tracking algorithm with multiple randomly generated features. We mainly address the inconsistent tracking performance of compressive tracking. In compressive tracking, image features are generated by random projection. The resulting features depend on the random numbers, so the results differ from one execution to the next. If the distinctive features of the target are not captured, the tracker is likely to fail; therefore, the tracking results are inconsistent across executions. The proposed algorithm tracks with several different sets of image features and chooses the best tracking result by measuring similarity with the target model. This reduces the chance of determining the target location from poor image features. In this paper, we use the Bhattacharyya coefficient to choose the best tracking result. Experimental results show that the proposed algorithm greatly reduces tracking errors. The best performance improvements in center location error, bounding box overlap ratio, and success rate are from 63.62 pixels to 15.45 pixels, from 31.75% to 64.48%, and from 38.51% to 82.58%, respectively.
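Selecting the best result with the Bhattacharyya coefficient can be sketched as follows, assuming the target model and each tracker's candidate region are summarized as histograms (the abstract does not state the representation). The coefficient is BC(p, q) = Σ sqrt(p_i q_i) over normalized histograms, equal to 1 for identical distributions; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya coefficient of two histograms (normalized here to sum to 1).
    1.0 means identical distributions, 0.0 means no overlap."""
    sp, sq = sum(p), sum(q)
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))


def pick_best(target_hist: Sequence[float], candidate_hists: List[Sequence[float]]) -> int:
    """Index of the candidate tracking result most similar to the target model."""
    return max(range(len(candidate_hists)),
               key=lambda i: bhattacharyya(target_hist, candidate_hists[i]))


# One candidate per random feature set; the second overlaps the target best.
print(pick_best([4, 0, 0], [[0, 4, 0], [3, 1, 0], [1, 1, 1]]))  # prints 1
```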
Salient object detection aims to identify visually interesting object regions that are consistent with human perception. Multispectral remote sensing images provide rich radiometric information that reveals the physical properties of the observed objects, which offers great potential for salient object detection in remote sensing images. Conventional salient object detection methods often employ handcrafted features to predict saliency by evaluating pixel-wise or superpixel-wise contrast. With the recent use of deep learning frameworks, in particular fully convolutional neural networks, there has been profound progress in visual saliency detection. However, this success has not been extended to multispectral remote sensing images, and existing multispectral salient object detection methods are still mainly based on handcrafted features, essentially because of difficulties in image acquisition and labeling. In this paper, we propose a novel deep residual network based on a top-down model, trained end-to-end, to tackle these issues in multispectral salient object detection. Our model effectively exploits saliency cues at different levels of the deep residual network. To overcome the limited availability of remote sensing images for training our deep residual network, we also introduce a new spectral image reconstruction model that generates multispectral images from RGB images. Extensive experimental results on both multispectral and RGB salient object detection datasets demonstrate a significant performance improvement of more than 10% over state-of-the-art methods.
In an effort to reduce vehicle collisions with snowplows in poor weather conditions, this paper details the development of a real-time, thermal-image-based machine learning approach to an early collision avoidance system for snowplows, which detects trailing vehicles and estimates their distance. Because snowplows operate in conditions that include heavy blowing snow, traditional optical sensors such as LiDAR and visible-spectrum cameras are less effective at detecting objects. Thus, we propose using a thermal infrared camera as the primary sensor together with machine learning algorithms. First, we curate a large dataset of thermal images of vehicles in heavy snow conditions. Using this dataset, two machine learning models based on modified ResNet architectures were trained to detect trailing vehicles and estimate their distance from real-time thermal images. The trained detection network detected trailing vehicles 99.0% of the time at a distance of 1500.0 ft from the snowplow. The trained trailing-distance network estimated distance with an average error of 10.70 ft. The inference performance of the trained models is discussed and interpreted.
A femtosecond optical Kerr gate time-gated ballistic imaging method is demonstrated for imaging a transparent object in a turbid medium. The shape features of the object are obtained by time-resolved selection of ballistic photons with different optical path lengths, the thickness distribution of the object is mapped, and the maximum is less than 3.6%. This time-resolved ballistic imaging has potential applications in studying the properties of the liquid core in the near field of fuel sprays.
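As a back-of-the-envelope illustration of how an arrival delay relates to thickness (an assumption for clarity; the abstract does not state the authors' reconstruction formula): if a ballistic photon crosses a thickness d of an object with refractive index n_o instead of the surrounding medium with index n_m, its extra optical path and arrival delay are

```latex
\Delta L = (n_o - n_m)\, d, \qquad
\Delta t = \frac{\Delta L}{c} = \frac{(n_o - n_m)\, d}{c}
\quad\Longrightarrow\quad
d = \frac{c\, \Delta t}{n_o - n_m}.
```

For example, with a hypothetical index contrast n_o - n_m = 0.1, a 1 mm thickness corresponds to a delay of about 0.33 ps, which is why a femtosecond gate is needed to resolve such differences.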
Fractional Brownian motion, which is continuous everywhere and differentiable nowhere, offers a convenient model for irregular nonstationary stochastic processes with long-term dependencies and power-law spectra over wide frequency ranges. It shows high correlation at coarse scales and varies only slightly at fine scales, which makes it suitable and successful for describing and modeling natural scenes. Man-made objects, on the other hand, can be well described constructively using a set of simple regular shape primitives such as lines and cylinders, and they lack fractal structure. Based on this difference, a method to discriminate man-made objects from natural scenes is provided. Experiments demonstrate the effectiveness of the developed technique.
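The fBm model can be tied to a measurable quantity through its variogram: E[(X(t+k) - X(t))^2] is proportional to k^(2H), where H is the Hurst exponent. A minimal sketch, assuming 1-D intensity profiles and a least-squares fit on a log-log plot, estimates H; which statistic the paper actually thresholds to separate man-made from natural regions is not stated, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def hurst_variogram(signal: Sequence[float], max_lag: int = 8) -> float:
    """Estimate the Hurst exponent H from E[(X(t+k) - X(t))^2] ~ k^(2H):
    fit a line to log(variogram) versus log(lag) and halve the slope."""
    xs, ys = [], []
    for lag in range(1, max_lag + 1):
        sq = [(signal[i + lag] - signal[i]) ** 2 for i in range(len(signal) - lag)]
        v = sum(sq) / len(sq)
        if v > 0:
            xs.append(math.log(lag))
            ys.append(math.log(v))
    if len(xs) < 2:
        raise ValueError("need at least two non-degenerate lags")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope / 2.0


def random_walk(n: int, seed: int = 0) -> List[float]:
    """Ordinary Brownian-like profile (H = 0.5) built from +/-1 steps."""
    rng = random.Random(seed)
    walk, pos = [], 0.0
    for _ in range(n):
        pos += rng.choice((-1.0, 1.0))
        walk.append(pos)
    return walk


print(round(hurst_variogram([0.5 * i for i in range(64)]), 3))  # prints 1.0
```

A smooth ramp, like the regular edge of a man-made structure, gives H of about 1, while a rough Brownian-like profile gives H near 0.5, so the estimate separates "regular" from "fractal-looking" signals.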
Funding (H-DETR): This research was funded by the Natural Science Foundation of Hebei Province (F2021506004).
Funding (Discriminatory-YOLO): This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61906168, U20A20171), the Zhejiang Provincial Natural Science Foundation of China (Grant Nos. LY23F020023, LY21F020027), and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (Grant No. 2022SDSJ01).
Funding: Supported by the National Natural Science Foundation of China under Grant 61671219.
Abstract: Object detection in unmanned aerial vehicle (UAV) aerial images has become increasingly important in military and civil applications. General object detection models are not robust enough against the interclass similarity and intraclass variability of small objects, or against UAV-specific nuisances such as uncontrolled weather conditions. Unlike previous approaches that focus on high-level semantic information, we report the importance of underlying features for improving detection accuracy and robustness from an information-theoretic perspective. Specifically, we propose a robust and discriminative feature learning approach through mutual information maximization (RD-MIM), which can be integrated into numerous object detection methods for aerial images. First, we present a rank sample mining method to reduce underlying feature differences between the natural image domain and the aerial image domain. Then, we design a momentum contrast learning strategy that makes object features similar to those of the same category and dissimilar to those of different categories. Finally, we construct a transformer-based global attention mechanism to boost object location semantics by leveraging the high interrelation of different receptive fields. We conduct extensive experiments on the VisDrone and Unmanned Aerial Vehicle Benchmark Object Detection and Tracking (UAVDT) datasets to demonstrate the effectiveness of the proposed method. The experimental results show that our approach brings considerable robustness gains to both basic detectors and advanced detection methods, achieving relative growth rates of 51.0% and 39.4% in corruption robustness, respectively. Our code is available at https://github.com/cq100/RD-MIM (accessed on 2 August 2024).
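The abstract does not detail its momentum contrast strategy. As a minimal sketch, assuming the standard MoCo-style recipe, the key encoder is an exponential moving average of the query encoder, and an InfoNCE loss pulls a query feature toward a positive (same-category) key and away from negative keys. The function names, the momentum m=0.999, and the temperature 0.07 are illustrative assumptions, not RD-MIM's actual settings.

```python
import math

def momentum_update(key_params, query_params, m=0.999):
    """EMA update of key-encoder parameters: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss: -log(exp(q.k+ / t) / sum over all keys of exp(q.k / t))."""
    q = l2_normalize(query)
    keys = [l2_normalize(positive)] + [l2_normalize(n) for n in negatives]
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # log-sum-exp with max subtraction for numerical stability
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return lse - logits[0]  # the positive key sits at index 0
```

A query aligned with its positive key yields a loss near zero, while one aligned with a negative key is penalized heavily, which is the "similar within a category, dissimilar across categories" behavior the abstract describes.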
Abstract: Multi-label image classification is recognized as an important task within computer vision, a field in which research activity has grown significantly in recent years. The widespread adoption of convolutional neural networks (CNNs) has catalyzed the remarkable success of architectures such as ResNet-101 in image classification. However, in multi-label image classification tasks, it is crucial to consider the correlation between labels. To improve the accuracy and performance of multi-label classification and fully combine visual and semantic features, many existing studies use graph convolutional networks (GCNs) for modeling. Object detection and multi-label image classification exhibit a degree of conceptual overlap; however, the integration of these two tasks within a unified framework has been relatively underexplored in the existing literature. In this paper, we propose Object-GCN, a framework combining the object detection network YOLOv5 with a graph convolutional network, and we carry out a thorough experimental analysis on a range of well-established public datasets. Object-GCN achieves significantly better performance than existing studies on the public datasets COCO2014, VOC2007, and VOC2012, reaching mean Average Precision (mAP) of 86.9%, 96.7%, and 96.3%, respectively.
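GCN-based multi-label classifiers typically propagate label representations over a label-correlation graph. As a minimal sketch, assuming the common symmetric normalization of Kipf and Welling, one GCN layer computes H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). How Object-GCN builds its graph and fuses it with YOLOv5 features is not described in the abstract, so the toy graph in the test is hypothetical.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}: add self-loops, then symmetrically normalize by degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, h, w):
    """One graph convolution: neighbor aggregation, linear projection, then ReLU."""
    propagated = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in propagated]
```

For two connected labels with features 1.0 and 3.0 and an identity weight, each node receives the average 2.0, showing how correlated labels share information.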
Funding: Supported by the Scientific Research Start-up Foundation of Ningbo University (No. 2004037) and the Zhejiang Provincial Foundation for Returned Overseas Students and Scholars (No. 2004884).
Abstract: In many image analysis and processing problems, discriminating the size and shape of each individual object in an aggregate pile projected in an image is an important task. These features are relatively easy to distinguish when the objects are already separated from each other; the problem becomes considerably more complex and challenging when the objects touch and/or overlap. This letter presents an algorithm for separating touching and overlapping objects within a 2-D image. The approach first converts the gray-scale image into a binary image and then into a 3-D topographic image using erosion operations. A template (or mask) is designed to search the topographic surface for saddle points, from which the segmentation orientation is determined, followed by the separation operation itself. The algorithm was tested on a real image, and the results are satisfactory and encouraging.
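The conversion from a binary image to a topographic surface can be illustrated as follows. Each foreground pixel's height is the number of successive erosions it survives, so object centers become peaks and the narrow neck where two objects touch becomes a saddle between them. This is a minimal sketch assuming a 3x3 cross structuring element; the letter's own mask for locating saddle points is not reproduced, and the function names are hypothetical.

```python
def erode(img):
    """Binary erosion with a 3x3 cross structuring element.

    Pixels outside the image are treated as background.
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[1 if img[r][c] and px(r - 1, c) and px(r + 1, c)
             and px(r, c - 1) and px(r, c + 1) else 0
             for c in range(w)] for r in range(h)]

def topographic_map(binary):
    """Height of each foreground pixel = number of erosions it survives + 1 (background stays 0)."""
    height = [[0] * len(binary[0]) for _ in binary]
    current = [row[:] for row in binary]
    level = 0
    while any(any(row) for row in current):
        level += 1
        for r, row in enumerate(current):
            for c, v in enumerate(row):
                if v:
                    height[r][c] = level
        current = erode(current)
    return height
```

On a filled 5x5 square the border gets height 1, the next ring height 2, and the center height 3; two touching blobs would produce two such peaks joined by a lower ridge, which is where a saddle-point search would look.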
Abstract: An improved method for estimating the motion vectors of feature points is proposed for tracking moving objects in dynamic image sequences. Feature points are first extracted by an improved minimum intensity change (MIC) algorithm. The matching points of these feature points are then determined by adaptive rood pattern searching. Finally, based on the random sample consensus (RANSAC) method, the background motion is compensated using the parameters of an affine transform of the background motion. With appropriate morphological filtering, the moving objects are completely extracted from the background and then accurately tracked. Experimental results show that the improved method successfully compensates for background motion and offers great promise for tracking moving objects in dynamic image sequences.
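The background-compensation step can be sketched as a RANSAC fit of a six-parameter affine transform to matched feature points: repeatedly fit the transform exactly to three random correspondences and keep the hypothesis with the most inliers, so points on moving objects end up as outliers. This is a minimal, dependency-free sketch; the iteration count, inlier tolerance, and function names are assumptions, and the MIC feature extraction and rood-pattern matching that produce the correspondences are not shown.

```python
import random

def _det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def _solve3(m, v):
    """Solve the 3x3 system m @ x = v by Cramer's rule; None if (near-)singular."""
    d = _det3(m)
    if abs(d) < 1e-12:
        return None  # e.g. three collinear sample points
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        sol.append(_det3(mc) / d)
    return sol

def affine_from_3(src, dst):
    """Exact affine (a, b, c, d, e, f) with x' = a x + b y + c, y' = d x + e y + f."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = _solve3(m, [p[0] for p in dst])
    row_y = _solve3(m, [p[1] for p in dst])
    if row_x is None or row_y is None:
        return None
    return row_x + row_y

def apply_affine(t, p):
    a, b, c, d, e, f = t
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)

def ransac_affine(src, dst, iters=200, tol=1.0, seed=0):
    """Return (best_transform, inlier_indices) over `iters` random 3-point samples."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    idx = list(range(len(src)))
    for _ in range(iters):
        sample = rng.sample(idx, 3)
        t = affine_from_3([src[i] for i in sample], [dst[i] for i in sample])
        if t is None:
            continue
        inliers = [i for i in idx
                   if sum((u - v) ** 2 for u, v in zip(apply_affine(t, src[i]), dst[i])) <= tol ** 2]
        if len(inliers) > len(best_inliers):
            best, best_inliers = t, inliers
    return best, best_inliers
```

Points whose motion disagrees with the fitted background transform (the outliers) are the candidates for moving objects; applying the inverse transform to the frame would compensate the camera motion.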
Abstract: Image is an important and creative way to express poets' feelings in both Chinese and English poetry. An image comprises concrete representations and abstract concepts, two key notions in poetics and aesthetics. This paper examines different versions of "tress" in poems and explores the exact nature of the concepts of sensitive affection in English and Chinese, so as to appreciate the artistic beauty of images.
Funding: This work was supported by a grant from the Basic Science Research Program through the National Research Foundation (NRF) (2021R1F1A1063634) funded by the Ministry of Science and ICT (MSIT), Republic of Korea. The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding Program Grant Code (NU/RG/SERC/13/40). Also, the authors are thankful to Prince Satam bin Abdulaziz University for supporting this study via funding from Prince Satam bin Abdulaziz University project number (PSAU/2024/R/1445). This work was also supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2023R54), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Road traffic monitoring is an important topic widely discussed among researchers. Systems used to monitor traffic frequently rely on cameras mounted on bridges or roadsides. However, aerial images provide the flexibility to use mobile platforms to detect the location and motion of vehicles over a larger area. To this end, different models have shown the ability to recognize and track vehicles. However, these methods are not mature enough to produce accurate results in complex road scenes. Therefore, this paper presents an algorithm that combines state-of-the-art techniques for identifying and tracking vehicles in image bursts. The extracted frames were converted to grayscale, followed by the application of a georeferencing algorithm to embed coordinate information into the images. A masking technique eliminated irrelevant data and reduced the computational cost of the overall monitoring system. Next, Sobel edge detection combined with Canny edge detection and the Hough line transform was applied for noise reduction. After preprocessing, a blob detection algorithm was used to detect the vehicles. Vehicles of varying sizes were detected by implementing a dynamic thresholding scheme. Detection was performed on the first image of every burst. Then, to track vehicles, the model of each vehicle was matched against the succeeding images using a template matching algorithm. To further improve tracking accuracy by incorporating motion information, Scale Invariant Feature Transform (SIFT) features were used to select the best match among multiple candidate matches. Accuracy rates of 87% for detection and 80% for tracking were achieved on the A1 Motorway Netherland dataset. For the Vehicle Aerial Imaging from Drone (VAID) dataset, accuracy rates of 86% for detection and 78% for tracking were achieved.
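The abstract does not state which similarity score its template matching uses. As a minimal sketch, assuming a sum-of-squared-differences (SSD) score (the criterion behind OpenCV's TM_SQDIFF mode), a vehicle template cut from the first image of a burst can be located in a later grayscale frame by exhaustive search. The function name is hypothetical.

```python
def match_template_ssd(image, template):
    """Return (row, col, score) of the top-left offset minimizing the sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best
```

When several positions score similarly (for example, identical-looking vehicles), a second cue is needed to disambiguate, which is the role the abstract assigns to SIFT features.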
Abstract: An effective model (image to wrinkle, ITW) for garment fitting evaluation is presented. The proposed model aims to improve the accuracy of garment fitting evaluation based on dressing images. The ITW model is an objective fitting evaluation model based on the wrinkle index of dressing images. It consists of two main steps: a gray curve-fitting (GCF) threshold segmentation algorithm and the Canny edge detection algorithm. In the ITW model, three types of wrinkle trends are defined, and online dressing images are evaluated and simulated using three quantitative indexes: wrinkle number, wrinkle regularity, and wrinkle unevenness. Finally, the fit of three kinds of dress effects (tight, fit, and loose) is quantified by the objective fitting evaluation model.
Abstract: Recently, there has been a notable surge of interest in scientific research on spectral images. Their potential to revolutionize the digital photography industry, including aerial photography through Unmanned Aerial Vehicles (UAVs), has captured considerable attention. One encouraging aspect is their combination with machine learning and deep learning algorithms, which have demonstrated remarkable outcomes in image classification. As a result of this powerful combination, the adoption of spectral images has grown rapidly across various domains, with agriculture being one of the prominent beneficiaries. This paper presents an extensive survey of multispectral and hyperspectral images, focusing on their applications to classification challenges in diverse agricultural areas, including plants, grains, fruits, and vegetables. By examining primary studies, we identify the specific agricultural domains where multispectral and hyperspectral images have found practical use. Additionally, we focus on machine learning techniques for effectively classifying hyperspectral images in the agricultural context. Our findings reveal that deep learning and support vector machines have emerged as widely employed methods for hyperspectral image classification in agriculture. Nevertheless, we also shed light on the various issues and limitations of working with spectral images. This comprehensive analysis aims to provide valuable insights into the current state of spectral imaging in agriculture and its potential for future advancements.
Abstract: Virtual reality (VR) environments can provide an immersive experience to viewers, so providing a good quality of experience in VR is extremely important. Therefore, in this paper, we present an image quality assessment (IQA) study on omnidirectional images. We first build an omnidirectional IQA (OIQA) database, including 16 source images and 320 corresponding distorted images. We apply four commonly encountered distortions: JPEG compression, JPEG2000 compression, Gaussian blur, and Gaussian noise. We then conduct a subjective quality evaluation study in the VR environment based on the OIQA database. Because visual attention is more important in VR environments, head and eye movement data are also tracked and collected during the quality rating experiments. The 16 raw images and their corresponding distorted images, the subjective quality assessment scores, and the head-orientation and eye-gaze data together constitute the OIQA database. Based on the OIQA database, we test several state-of-the-art full-reference IQA (FR-IQA) measures on omnidirectional images in equirectangular or cubic format. The results show that applying FR-IQA metrics to cubic-format omnidirectional images can improve their performance. The performance of some FR-IQA metrics combined with three different types of saliency weights is also tested on our database. Some new phenomena that differ from traditional IQA are observed.
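Combining an FR-IQA metric with a saliency weight can be sketched by weighting each pixel's squared error with a saliency value before computing PSNR. This is a minimal sketch under stated assumptions: images are flattened lists of 8-bit intensities, the saliency map is supplied directly as non-negative per-pixel weights, and the function name is hypothetical. The three saliency types tested in the study are not specified here, and this is not necessarily the study's exact weighting.

```python
import math

def weighted_psnr(ref, dist, weights=None, peak=255.0):
    """PSNR computed from a weighted mean squared error over flattened pixel lists.

    With weights=None every pixel counts equally (plain PSNR); otherwise each
    squared error is multiplied by a non-negative saliency weight.
    """
    if weights is None:
        weights = [1.0] * len(ref)
    total_w = sum(weights)
    wmse = sum(w * (a - b) ** 2 for w, a, b in zip(weights, ref, dist)) / total_w
    if wmse == 0:
        return math.inf  # no error on any pixel that carries weight
    return 10.0 * math.log10(peak ** 2 / wmse)
```

Passing uniform weights reduces this to ordinary PSNR, which makes the effect of the saliency weighting easy to isolate: errors in regions that viewers never look at (weight 0) no longer lower the score.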
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z416), the National Natural Science Foundation of China (No. 60773056), the Beijing New Star Project on Science and Technology (No. 2007B071), and the Natural Science Foundation of Liaoning Province of China (No. 20052184).
Abstract: Constructing an appropriate spatial consistency measurement is the key to improving image retrieval performance. To address this problem, this paper introduces a novel image retrieval mechanism based on family filtration in the object region. First, the user supplies an object region by selecting a rectangle in a query image, and the system returns, as the first-round ranking, a ranked list of images containing the same object, retrieved from a corpus of 100 images. To further improve retrieval performance, we add an efficient spatial consistency stage, named family-based spatial consistency filtration, to re-rank the results returned by the first ranking. We evaluate the performance of the retrieval system through experiments on a dataset selected from the key frames of "TREC Video Retrieval Evaluation 2005 (TRECVID2005)". The experimental results show that the proposed retrieval mechanism substantially improves retrieval quality. The paper also verifies the stability of the retrieval mechanism by increasing the number of images from 100 to 2000, and realizes generalized retrieval with objects outside the dataset.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK20200214), the National Key R&D Program of China (2017YFB0403701), the Jiangsu Province Key R&D Program (BE2019682 and BE2018667), the National Natural Science Foundation of China (61605210, 61675226, and 62075235), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2019320), the Frontier Science Research Project of the Chinese Academy of Sciences (QYZDB-SSW-JSC03), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB02060000), and Entrepreneurship and Innovation Talents in Jiangsu Province (Innovation of Scientific Research Institutes).
Abstract: Cone photoreceptor cell identification is important for the early diagnosis of retinopathy. In this study, an object detection algorithm is used to identify cone cells in confocal adaptive optics scanning laser ophthalmoscope (AOSLO) images. Taking manual identification as the ground truth, an evaluation of the proposed method yields a precision, recall, and F1-score of 95.8%, 96.5%, and 96.1%, respectively. Detection and identification results on images with different cone photoreceptor cell distributions further demonstrate the performance of the proposed method. Overall, the proposed method accurately identifies cone photoreceptor cells in confocal AOSLO images, with performance comparable to manual identification.
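The three reported scores are mutually consistent: the F1-score is the harmonic mean of precision and recall, so 95.8% precision and 96.5% recall give about 96.1%. A quick check (the helper names are ours, not the study's):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

def f1_score(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.958, 0.965), 3))  # prints 0.961
```

In a cone-counting setting, a true positive would be a detected cell matched to a manually marked cell, a false positive an unmatched detection, and a false negative an unmatched manual mark.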
Funding: This research was funded by the Thuyloi University Foundation for Science and Technology under Grant Number TLU.STF.19-02.
Abstract: In image processing, one of the most important steps is image segmentation. Objects in remote sensing images often have to be detected before the next steps of image processing can be performed. Remote sensing images are usually large and have various spatial resolutions, so detecting objects in them is very complicated. In this paper, we develop a model to detect objects in remote sensing images based on the combination of picture fuzzy clustering and the MapReduce method (denoted MPFC). First, picture fuzzy clustering is applied to segment the input images. Then, MapReduce is used to reduce the runtime while guaranteeing quality. To convert data for MapReduce processing, two new procedures are introduced: Map_PFC and Reduce_PFC. The formal representation and details of these two procedures are presented in this paper. Experiments on satellite image and remote sensing image datasets are given to evaluate the proposed model. Validity indices and running time are used to compare the proposed model with the picture fuzzy clustering model. The values of the validity indices show that picture fuzzy clustering integrated with MapReduce achieves better segmentation quality than picture fuzzy clustering alone. Moreover, on two selected image datasets, the runtime of the MPFC model is much lower than that of picture fuzzy clustering.
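The MapReduce decomposition can be illustrated on a simpler relative of picture fuzzy clustering. The sketch below swaps in standard fuzzy c-means (it omits the neutral and refusal degrees of picture fuzzy sets) on 1-D pixel intensities: the map step computes per-chunk partial sums for the cluster-center update, and the reduce step adds them, so the distributed result equals the single-machine one. The names map_chunk and reduce_partials are hypothetical and are not the paper's Map_PFC and Reduce_PFC procedures.

```python
from functools import reduce

def memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of value x in each cluster (fuzzifier m > 1)."""
    dists = [abs(x - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:  # x coincides with a center: share membership among coinciding centers
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]

def map_chunk(chunk, centers, m=2.0):
    """Map step: partial numerators and denominators of the center update for one chunk."""
    k = len(centers)
    num, den = [0.0] * k, [0.0] * k
    for x in chunk:
        for j, u in enumerate(memberships(x, centers, m)):
            w = u ** m
            num[j] += w * x
            den[j] += w
    return num, den

def reduce_partials(a, b):
    """Reduce step: element-wise sum of two partial (num, den) results."""
    return ([x + y for x, y in zip(a[0], b[0])], [x + y for x, y in zip(a[1], b[1])])

def fcm_iteration(chunks, centers, m=2.0):
    """One center update computed by mapping over chunks and reducing the partial sums."""
    num, den = reduce(reduce_partials, (map_chunk(ch, centers, m) for ch in chunks))
    return [n / d for n, d in zip(num, den)]
```

Because the center update is a ratio of sums, any partition of the pixels into chunks gives the same result, which is what makes this kind of clustering a natural fit for MapReduce.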
Funding: Supported by the Guangdong Natural Science Foundation (No. 011628).
Abstract: This letter presents an efficient and simple image segmentation method for the spatial segmentation of semantic objects. First, the image is filtered using contour-preserving filters. It is then quasi-flat labeled. Small regions near contours are classified as uncertain regions and are eliminated by region growing and merging. Further region merging is used to reduce the number of regions. Simulation results demonstrate the method's efficiency and simplicity. It preserves the shape of semantic objects while emphasizing perceptually complex parts of the object, so it conforms well to human visual perception.
Abstract: This paper proposes an object-tracking algorithm with multiple randomly generated features. We mainly address the inconsistent performance of compressive tracking, which is sometimes good and sometimes poor. In compressive tracking, image features are generated by random projection. The resulting features depend on the random numbers drawn, so the results of each execution differ. If the distinctive features of the target are not captured, the tracker is likely to fail; consequently, tracking results are inconsistent across executions. The proposed algorithm tracks with several different sets of image features and chooses the best tracking result by measuring its similarity to the target model, which reduces the chance that the target location is determined by poor image features. In this paper, we use the Bhattacharyya coefficient to choose the best tracking result. The experimental results show that the proposed tracking algorithm greatly reduces tracking errors. The best improvements in center location error, bounding box overlap ratio, and success rate are from 63.62 pixels to 15.45 pixels, from 31.75% to 64.48%, and from 38.51% to 82.58%, respectively.
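The selection step can be sketched as follows: compare each candidate result's histogram with the target model's histogram using the Bhattacharyya coefficient BC(p, q) = sum_i sqrt(p_i q_i), which is 1 for identical normalized histograms and 0 for disjoint ones, and keep the most similar candidate. The histogram representation and function names are illustrative assumptions; the abstract does not say exactly which distributions are compared.

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two non-negative histograms (normalized to sum to 1 first)."""
    sp, sq = sum(p), sum(q)
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))

def pick_best(candidates, target_hist):
    """Return the (label, histogram) candidate most similar to the target model."""
    return max(candidates, key=lambda c: bhattacharyya(c[1], target_hist))
```

In the scheme the abstract describes, each candidate would be the location reported by a tracker using one random feature set, so a single unlucky projection no longer determines the final output.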
Funding: National 1000 Young Talents Plan of China; National Natural Science Foundation of China (61420106007, 61671387, 61871325); DECRA of the Australian Research Council (DE140100180).
Abstract: Salient object detection aims at identifying the visually interesting object regions that are consistent with human perception. Multispectral remote sensing images provide rich radiometric information that reveals the physical properties of the observed objects, which gives them great potential for salient object detection. Conventional salient object detection methods often employ handcrafted features to predict saliency by evaluating pixel-wise or superpixel-wise contrast. With the recent use of deep learning frameworks, in particular fully convolutional neural networks, there has been profound progress in visual saliency detection. However, this success has not been extended to multispectral remote sensing images, and existing multispectral salient object detection methods are still mainly based on handcrafted features, essentially due to the difficulties in image acquisition and labeling. In this paper, we propose a novel deep residual network based on a top-down model, trained in an end-to-end manner, to tackle the above issues in multispectral salient object detection. Our model effectively exploits the saliency cues at different levels of the deep residual network. To overcome the limited availability of remote sensing images for training our deep residual network, we also introduce a new spectral image reconstruction model that can generate multispectral images from RGB images. Extensive experimental results on both multispectral and RGB salient object detection datasets demonstrate a significant performance improvement of more than 10% compared with state-of-the-art methods.
Abstract: In an effort to reduce vehicle collisions with snowplows in poor weather conditions, this paper details the development of a real-time, thermal-image-based machine learning approach to an early collision avoidance system for snowplows, which is intended to detect trailing vehicles and estimate their distance. Due to the operational conditions of snowplows, which include heavy blowing snow, traditional optical sensors such as LiDAR and visible-spectrum cameras are less effective at detecting objects in such environments. Thus, we propose using a thermal infrared camera as the primary sensor, along with machine learning algorithms. First, we curate a large dataset of thermal images of vehicles in heavy snow conditions. Using the curated dataset, two machine learning models based on modified ResNet architectures were trained to detect trailing vehicles and estimate their distance from real-time thermal images. The trained detection network detected trailing vehicles 99.0% of the time at a distance of 1500.0 ft from the snowplow. The trained trailing distance network estimated distance with an average error of 10.70 ft. The inference performance of the trained models is discussed, along with an interpretation of the performance.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61427816 and 61690221, and the Collaborative Innovation Center of Suzhou Nano Science and Technology.
Abstract: A femtosecond optical Kerr gate time-gated ballistic imaging method is demonstrated for imaging a transparent object in a turbid medium. The shape features of the object are obtained by time-resolved selection of ballistic photons with different optical path lengths, and the thickness distribution of the object is mapped; the maximum is less than 3.6%. This time-resolved ballistic imaging method has potential applications in studying the properties of the liquid core in the near field of fuel sprays.
Abstract: Fractional Brownian motion, which is continuous everywhere and differentiable nowhere, offers a convenient model for irregular nonstationary stochastic processes with long-term dependence and power-law spectral behavior over wide frequency ranges. It shows high correlation at coarse scales and varies only slightly at fine scales, which makes it suitable for, and successful in, describing and modeling natural scenes. Man-made objects, on the other hand, can be well described constructively by a set of regular, simple shape primitives such as lines and cylinders, and are free of fractal structure. Based on this difference, a method to discriminate man-made objects from natural scenes is presented. Experiments demonstrate the efficiency of the developed technique.
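One common way to exploit this difference is to estimate the Hurst exponent H of an intensity profile from the variogram, since for fractional Brownian motion E[(X(t+k) - X(t))^2] is proportional to k^(2H); the fractal dimension of a 1-D profile is then D = 2 - H. Smooth, man-made structure tends to push the estimate toward H = 1, while rougher fBm-like texture gives smaller H. This is a minimal sketch: the abstract does not say which estimator or decision threshold the method uses, and the lag range and function name are assumptions.

```python
import math

def hurst_variogram(signal, max_lag=8):
    """Estimate H from E[(X(t+k) - X(t))^2] ~ k^(2H) by least squares on log-log points."""
    xs, ys = [], []
    for k in range(1, max_lag + 1):
        diffs = [(signal[t + k] - signal[t]) ** 2 for t in range(len(signal) - k)]
        msd = sum(diffs) / len(diffs)
        if msd > 0:  # skip lags with no variation (log undefined)
            xs.append(math.log(k))
            ys.append(math.log(msd))
    if len(xs) < 2:
        raise ValueError("signal too short or constant to fit a slope")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - my * 0 - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope / 2.0  # log-log slope equals 2H
```

A linear ramp (perfectly smooth) gives H = 1, while an ordinary random walk, the H = 0.5 special case of fBm, gives an estimate close to 0.5.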