Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects an...Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects and the high cost associated with data collection.Consequently,devising algorithms capable of accurately localizing specific objects within a scene in scenarios where annotated data is limited remains a formidable challenge.To solve this problem,this paper proposes an object discovery by request problem setting and a corresponding algorithmic framework.The proposed problem setting aims to identify specified objects in scenes,and the associated algorithmic framework comprises pseudo data generation and object discovery by request network.Pseudo-data generation generates images resembling natural scenes through various data augmentation rules,using a small number of object samples and scene images.The network structure of object discovery by request utilizes the pre-trained Vision Transformer(ViT)model as the backbone,employs object-centric methods to learn the latent representations of foreground objects,and applies patch-level reconstruction constraints to the model.During the validation phase,we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets.Experiments have proved that our method achieves state-of-the-art performance on Unmanned Aerial Vehicles-Bottle Detection(UAV-BD)dataset and self-constructed dataset Bottle,especially in multi-object scenarios.展开更多
Object detection in unmanned aerial vehicle(UAV)aerial images has become increasingly important in military and civil applications.General object detection models are not robust enough against interclass similarity an...Object detection in unmanned aerial vehicle(UAV)aerial images has become increasingly important in military and civil applications.General object detection models are not robust enough against interclass similarity and intraclass variability of small objects,and UAV-specific nuisances such as uncontrolledweather conditions.Unlike previous approaches focusing on high-level semantic information,we report the importance of underlying features to improve detection accuracy and robustness fromthe information-theoretic perspective.Specifically,we propose a robust and discriminative feature learning approach through mutual information maximization(RD-MIM),which can be integrated into numerous object detection methods for aerial images.Firstly,we present the rank sample mining method to reduce underlying feature differences between the natural image domain and the aerial image domain.Then,we design a momentum contrast learning strategy to make object features similar to the same category and dissimilar to different categories.Finally,we construct a transformer-based global attention mechanism to boost object location semantics by leveraging the high interrelation of different receptive fields.We conduct extensive experiments on the VisDrone and Unmanned Aerial Vehicle Benchmark Object Detection and Tracking(UAVDT)datasets to prove the effectiveness of the proposed method.The experimental results show that our approach brings considerable robustness gains to basic detectors and advanced detection methods,achieving relative growth rates of 51.0%and 39.4%in corruption robustness,respectively.Our code is available at https://github.com/cq100/RD-MIM(accessed on 2 August 2024).展开更多
For some important object recognition applications such as intelligent robots and unmanned driving, images are collected on a consecutive basis and associated among themselves, besides, the scenes have steady prior fe...For some important object recognition applications such as intelligent robots and unmanned driving, images are collected on a consecutive basis and associated among themselves, besides, the scenes have steady prior features. Yet existing technologies do not take full advantage of this information. In order to take object recognition further than existing algorithms in the above application, an object recognition method that fuses temporal sequence with scene priori information is proposed. This method first employs YOLOv3 as the basic algorithm to recognize objects in single-frame images, then the DeepSort algorithm to establish association among potential objects recognized in images of different moments, and finally the confidence fusion method and temporal boundary processing method designed herein to fuse, at the decision level, temporal sequence information with scene priori information. Experiments using public datasets and self-built industrial scene datasets show that due to the expansion of information sources, the quality of single-frame images has less impact on the recognition results, whereby the object recognition is greatly improved. It is presented herein as a widely applicable framework for the fusion of information under multiple classes. All the object recognition algorithms that output object class, location information and recognition confidence at the same time can be integrated into this information fusion framework to improve performance.展开更多
BACKGROUND Deep learning provides an efficient automatic image recognition method for small bowel(SB)capsule endoscopy(CE)that can assist physicians in diagnosis.However,the existing deep learning models present some ...BACKGROUND Deep learning provides an efficient automatic image recognition method for small bowel(SB)capsule endoscopy(CE)that can assist physicians in diagnosis.However,the existing deep learning models present some unresolved challenges.AIM To propose a novel and effective classification and detection model to automatically identify various SB lesions and their bleeding risks,and label the lesions accurately so as to enhance the diagnostic efficiency of physicians and the ability to identify high-risk bleeding groups.METHODS The proposed model represents a two-stage method that combined image classification with object detection.First,we utilized the improved ResNet-50 classification model to classify endoscopic images into SB lesion images,normal SB mucosa images,and invalid images.Then,the improved YOLO-V5 detection model was utilized to detect the type of lesion and its risk of bleeding,and the location of the lesion was marked.We constructed training and testing sets and compared model-assisted reading with physician reading.RESULTS The accuracy of the model constructed in this study reached 98.96%,which was higher than the accuracy of other systems using only a single module.The sensitivity,specificity,and accuracy of the model-assisted reading detection of all images were 99.17%,99.92%,and 99.86%,which were significantly higher than those of the endoscopists’diagnoses.The image processing time of the model was 48 ms/image,and the image processing time of the physicians was 0.40±0.24 s/image(P<0.001).CONCLUSION The deep learning model of image classification combined with object detection exhibits a satisfactory diagnostic effect on a variety of SB lesions and their bleeding risks in CE images,which enhances the diagnostic efficiency of physicians and improves the ability of physicians to identify high-risk bleeding groups.展开更多
Optical image-based ship detection can ensure the safety of ships and promote the orderly management of ships in offshore waters.Current deep learning researches on optical image-based ship detection mainly focus on i...Optical image-based ship detection can ensure the safety of ships and promote the orderly management of ships in offshore waters.Current deep learning researches on optical image-based ship detection mainly focus on improving one-stage detectors for real-time ship detection but sacrifices the accuracy of detection.To solve this problem,we present a hybrid ship detection framework which is named EfficientShip in this paper.The core parts of the EfficientShip are DLA-backboned object location(DBOL)and CascadeRCNN-guided object classification(CROC).The DBOL is responsible for finding potential ship objects,and the CROC is used to categorize the potential ship objects.We also design a pixel-spatial-level data augmentation(PSDA)to reduce the risk of detection model overfitting.We compare the proposed EfficientShip with state-of-the-art(SOTA)literature on a ship detection dataset called Seaships.Experiments show our ship detection framework achieves a result of 99.63%(mAP)at 45 fps,which is much better than 8 SOTA approaches on detection accuracy and can also meet the requirements of real-time application scenarios.展开更多
Computer vision(CV)was developed for computers and other systems to act or make recommendations based on visual inputs,such as digital photos,movies,and other media.Deep learning(DL)methods are more successful than ot...Computer vision(CV)was developed for computers and other systems to act or make recommendations based on visual inputs,such as digital photos,movies,and other media.Deep learning(DL)methods are more successful than other traditional machine learning(ML)methods inCV.DL techniques can produce state-of-the-art results for difficult CV problems like picture categorization,object detection,and face recognition.In this review,a structured discussion on the history,methods,and applications of DL methods to CV problems is presented.The sector-wise presentation of applications in this papermay be particularly useful for researchers in niche fields who have limited or introductory knowledge of DL methods and CV.This review will provide readers with context and examples of how these techniques can be applied to specific areas.A curated list of popular datasets and a brief description of them are also included for the benefit of readers.展开更多
Recently,there has been a notable surge of interest in scientific research regarding spectral images.The potential of these images to revolutionize the digital photography industry,like aerial photography through Unma...Recently,there has been a notable surge of interest in scientific research regarding spectral images.The potential of these images to revolutionize the digital photography industry,like aerial photography through Unmanned Aerial Vehicles(UAVs),has captured considerable attention.One encouraging aspect is their combination with machine learning and deep learning algorithms,which have demonstrated remarkable outcomes in image classification.As a result of this powerful amalgamation,the adoption of spectral images has experienced exponential growth across various domains,with agriculture being one of the prominent beneficiaries.This paper presents an extensive survey encompassing multispectral and hyperspectral images,focusing on their applications for classification challenges in diverse agricultural areas,including plants,grains,fruits,and vegetables.By meticulously examining primary studies,we delve into the specific agricultural domains where multispectral and hyperspectral images have found practical use.Additionally,our attention is directed towards utilizing machine learning techniques for effectively classifying hyperspectral images within the agricultural context.The findings of our investigation reveal that deep learning and support vector machines have emerged as widely employed methods for hyperspectral image classification in agriculture.Nevertheless,we also shed light on the various issues and limitations of working with spectral images.This comprehensive analysis aims to provide valuable insights into the current state of spectral imaging in agriculture and its potential for future advancements.展开更多
Significant advancements have beenwitnessed in visual tracking applications leveragingViT in recent years,mainly due to the formidablemodeling capabilities of Vision Transformer(ViT).However,the strong performance of ...Significant advancements have beenwitnessed in visual tracking applications leveragingViT in recent years,mainly due to the formidablemodeling capabilities of Vision Transformer(ViT).However,the strong performance of such trackers heavily relies on ViT models pretrained for long periods,limitingmore flexible model designs for tracking tasks.To address this issue,we propose an efficient unsupervised ViT pretraining method for the tracking task based on masked autoencoders,called TrackMAE.During pretraining,we employ two shared-parameter ViTs,serving as the appearance encoder and motion encoder,respectively.The appearance encoder encodes randomly masked image data,while the motion encoder encodes randomly masked pairs of video frames.Subsequently,an appearance decoder and a motion decoder separately reconstruct the original image data and video frame data at the pixel level.In this way,ViT learns to understand both the appearance of images and the motion between video frames simultaneously.Experimental results demonstrate that ViT-Base and ViT-Large models,pretrained with TrackMAE and combined with a simple tracking head,achieve state-of-the-art(SOTA)performance without additional design.Moreover,compared to the currently popular MAE pretraining methods,TrackMAE consumes only 1/5 of the training time,which will facilitate the customization of diverse models for tracking.For instance,we additionally customize a lightweight ViT-XS,which achieves SOTA efficient tracking performance.展开更多
In rice production,the prevention and management of pests and diseases have always received special attention.Traditional methods require human experts,which is costly and time-consuming.Due to the complexity of the s...In rice production,the prevention and management of pests and diseases have always received special attention.Traditional methods require human experts,which is costly and time-consuming.Due to the complexity of the structure of rice diseases and pests,quickly and reliably recognizing and locating them is difficult.Recently,deep learning technology has been employed to detect and identify rice diseases and pests.This paper introduces common publicly available datasets;summarizes the applications on rice diseases and pests from the aspects of image recognition,object detection,image segmentation,attention mechanism,and few-shot learning methods according to the network structure differences;and compares the performances of existing studies.Finally,the current issues and challenges are explored fromthe perspective of data acquisition,data processing,and application,providing possible solutions and suggestions.This study aims to review various DL models and provide improved insight into DL techniques and their cutting-edge progress in the prevention and management of rice diseases and pests.展开更多
Arbitrary‐oriented object detection is widely used in aerial image applications because of its efficient object representation.However,the use of oriented bounding box aggravates the imbalance between positive and ne...Arbitrary‐oriented object detection is widely used in aerial image applications because of its efficient object representation.However,the use of oriented bounding box aggravates the imbalance between positive and negative samples when using one‐stage object detectors,which seriously decreases the detection accuracy.We believe that it is the anchor learning strategy(ALS)used by such detectors that needs to take the responsibility.In this study,three perspectives on ALS design were summarised and ALS—Performance Releaser with Smart Anchor Learning(PRSAL)was proposed.Performance Releaser with Smart Anchor Learning is a dynamic ALS that utilises anchor classification ability as an equivalent indicator to anchor box regression ability,this allows anchors with high detection potential to be filtered out in a more reasonable way.At the same time,PRSAL focuses more on anchor potential and it is able to automatically select a number of positive samples that far exceed that of other methods by activating anchors that previously had a low spatial overlap,thereby releasing the detection performance.We validate the PRSAL using three remote sensing datasets—HRSC2016,DOTA and UCAS‐AOD as well as one scene text dataset—ICDAR 2013.The experimental results show that the proposed method gives substantially better results than existing models.展开更多
Object Detection is the task of localization and classification of objects in a video or image.In recent times,because of its widespread applications,it has obtained more importance.In the modern world,waste pollution...Object Detection is the task of localization and classification of objects in a video or image.In recent times,because of its widespread applications,it has obtained more importance.In the modern world,waste pollution is one significant environmental problem.The prominence of recycling is known very well for both ecological and economic reasons,and the industry needs higher efficiency.Waste object detection utilizing deep learning(DL)involves training a machine-learning method to classify and detect various types of waste in videos or images.This technology is utilized for several purposes recycling and sorting waste,enhancing waste management and reducing environmental pollution.Recent studies of automatic waste detection are difficult to compare because of the need for benchmarks and broadly accepted standards concerning the employed data andmetrics.Therefore,this study designs an Entropy-based Feature Fusion using Deep Learning forWasteObject Detection and Classification(EFFDL-WODC)algorithm.The presented EFFDL-WODC system inherits the concepts of feature fusion and DL techniques for the effectual recognition and classification of various kinds of waste objects.In the presented EFFDL-WODC system,two major procedures can be contained,such as waste object detection and waste object classification.For object detection,the EFFDL-WODC technique uses a YOLOv7 object detector with a fusionbased backbone network.In addition,entropy feature fusion-based models such as VGG-16,SqueezeNet,and NASNetmodels are used.Finally,the EFFDL-WODC technique uses a graph convolutional network(GCN)model performed for the classification of detected waste objects.The performance validation of the EFFDL-WODC approach was validated on the benchmark database.The comprehensive comparative results demonstrated the improved performance of the EFFDL-WODC technique over recent approaches.展开更多
At present days,object detection and tracking concepts have gained more importance among researchers and business people.Presently,deep learning(DL)approaches have been used for object tracking as it increases the per...At present days,object detection and tracking concepts have gained more importance among researchers and business people.Presently,deep learning(DL)approaches have been used for object tracking as it increases the perfor-mance and speed of the tracking process.This paper presents a novel robust DL based object detection and tracking algorithm using Automated Image Anno-tation with ResNet based Faster regional convolutional neural network(R-CNN)named(AIA-FRCNN)model.The AIA-RFRCNN method performs image anno-tation using a Discriminative Correlation Filter(DCF)with Channel and Spatial Reliability tracker(CSR)called DCF-CSRT model.The AIA-RFRCNN model makes use of Faster RCNN as an object detector and tracker,which involves region proposal network(RPN)and Fast R-CNN.The RPN is a full convolution network that concurrently predicts the bounding box and score of different objects.The RPN is a trained model used for the generation of the high-quality region proposals,which are utilized by Fast R-CNN for detection process.Besides,Residual Network(ResNet 101)model is used as a shared convolutional neural network(CNN)for the generation of feature maps.The performance of the ResNet 101 model is further improved by the use of Adam optimizer,which tunes the hyperparameters namely learning rate,batch size,momentum,and weight decay.Finally,softmax layer is applied to classify the images.The performance of the AIA-RFRCNN method has been assessed using a benchmark dataset and a detailed comparative analysis of the results takes place.The outcome of the experiments indicated the superior characteristics of the AIA-RFRCNN model under diverse aspects.展开更多
This paper discusses about the new approach of multiple object track-ing relative to background information.The concept of multiple object tracking through background learning is based upon the theory of relativity,th...This paper discusses about the new approach of multiple object track-ing relative to background information.The concept of multiple object tracking through background learning is based upon the theory of relativity,that involves a frame of reference in spatial domain to localize and/or track any object.Thefield of multiple object tracking has seen a lot of research,but researchers have considered the background as redundant.However,in object tracking,the back-ground plays a vital role and leads to definite improvement in the overall process of tracking.In the present work an algorithm is proposed for the multiple object tracking through background learning.The learning framework is based on graph embedding approach for localizing multiple objects.The graph utilizes the inher-ent capabilities of depth modelling that assist in prior to track occlusion avoidance among multiple objects.The proposed algorithm has been compared with the recent work available in literature on numerous performance evaluation measures.It is observed that our proposed algorithm gives better performance.展开更多
Nowadays,to improve English learning capability,an increasing number of Chinese students choose to study abroad.As a member of International students,I have a puzzle about why I am less motivated to learn English in C...Nowadays,to improve English learning capability,an increasing number of Chinese students choose to study abroad.As a member of International students,I have a puzzle about why I am less motivated to learn English in China than in England,which is the same as the puzzle of a large proportion of international students.Furthermore,self-determination theory(SDT)focuses on students’learning motivation,therefore SDT helps investigate the international students’puzzle which related to language learning motivation.展开更多
Robert Mills Gagne's five categories of learning have a profound influence on the many aspects of educational field.This essay attempts to differentiate and analyze the five categories of learning:motor skills,ver...Robert Mills Gagne's five categories of learning have a profound influence on the many aspects of educational field.This essay attempts to differentiate and analyze the five categories of learning:motor skills,verbal information,intellectual skills,cognitive strategies,and attitudes.And then applies Gagne's five categories of learning to design English teaching objectives.展开更多
Anomalous situations in surveillance videos or images that may result in security issues,such as disasters,accidents,crime,violence,or terrorism,can be identified through video anomaly detection.However,differentiat-i...Anomalous situations in surveillance videos or images that may result in security issues,such as disasters,accidents,crime,violence,or terrorism,can be identified through video anomaly detection.However,differentiat-ing anomalous situations from normal can be challenging due to variations in human activity in complex environments such as train stations,busy sporting fields,airports,shopping areas,military bases,care centers,etc.Deep learning models’learning capability is leveraged to identify abnormal situations with improved accuracy.This work proposes a deep learning architecture called Anomalous Situation Recognition Network(ASRNet)for deep feature extraction to improve the detection accuracy of various anomalous image situations.The proposed framework has five steps.In the first step,pretraining of the proposed architecture is performed on the CIFAR-100 dataset.In the second step,the proposed pre-trained model and Inception V3 architecture are used for feature extraction by utilizing the suspicious activity recognition dataset.In the third step,serial feature fusion is performed,and then the Dragonfly algorithm is utilized for feature optimization in the fourth step.Finally,using optimized features,various Support Vector Machine(SVM)and K-Nearest Neighbor(KNN)based classification models are utilized to detect anomalous situations.The proposed framework is validated on the suspicious activity dataset by varying the number of optimized features from 100 to 1000.The results show that the proposed method is effective in detecting anomalous situations and achieves the highest accuracy of 99.24%using cubic SVM.展开更多
Collaborative Robotics is one of the high-interest research topics in the area of academia and industry.It has been progressively utilized in numerous applications,particularly in intelligent surveillance systems.It a...Collaborative Robotics is one of the high-interest research topics in the area of academia and industry.It has been progressively utilized in numerous applications,particularly in intelligent surveillance systems.It allows the deployment of smart cameras or optical sensors with computer vision techniques,which may serve in several object detection and tracking tasks.These tasks have been considered challenging and high-level perceptual problems,frequently dominated by relative information about the environment,where main concerns such as occlusion,illumination,background,object deformation,and object class variations are commonplace.In order to show the importance of top view surveillance,a collaborative robotics framework has been presented.It can assist in the detection and tracking of multiple objects in top view surveillance.The framework consists of a smart robotic camera embedded with the visual processing unit.The existing pre-trained deep learning models named SSD and YOLO has been adopted for object detection and localization.The detection models are further combined with different tracking algorithms,including GOTURN,MEDIANFLOW,TLD,KCF,MIL,and BOOSTING.These algorithms,along with detection models,help to track and predict the trajectories of detected objects.The pre-trained models are employed;therefore,the generalization performance is also investigated through testing the models on various sequences of top view data set.The detection models achieved maximum True Detection Rate 93%to 90%with a maximum 0.6%False Detection Rate.The tracking results of different algorithms are nearly identical,with tracking accuracy ranging from 90%to 94%.Furthermore,a discussion has been carried out on output results along with future guidelines.展开更多
There is a drastic increase experienced in the production of vehicles in recent years across the globe.In this scenario,vehicle classification system plays a vital part in designing Intelligent Transportation Systems(...There is a drastic increase experienced in the production of vehicles in recent years across the globe.In this scenario,vehicle classification system plays a vital part in designing Intelligent Transportation Systems(ITS)for automatic highway toll collection,autonomous driving,and traffic management.Recently,computer vision and pattern recognition models are useful in designing effective vehicle classification systems.But these models are trained using a small number of hand-engineered features derived fromsmall datasets.So,such models cannot be applied for real-time road traffic conditions.Recent developments in Deep Learning(DL)-enabled vehicle classification models are highly helpful in resolving the issues that exist in traditional models.In this background,the current study develops a Lightning Search Algorithm with Deep Transfer Learning-based Vehicle Classification Model for ITS,named LSADTL-VCITS model.The key objective of the presented LSADTL-VCITS model is to automatically detect and classify the types of vehicles.To accomplish this,the presented LSADTL-VCITS model initially employs You Only Look Once(YOLO)-v5 object detector with Capsule Network(CapsNet)as baseline model.In addition,the proposed LSADTL-VCITS model applies LSA with Multilayer Perceptron(MLP)for detection and classification of the vehicles.The performance of the proposed LSADTL-VCITS model was experimentally validated using benchmark dataset and the outcomes were examined under several measures.The experimental outcomes established the superiority of the proposed LSADTL-VCITS model compared to existing approaches.展开更多
Human hand detection in uncontrolled environments is a challenging visual recognition task due to numerous variations of hand poses and background image clutter.To achieve highly accurate results as well as provide re...Human hand detection in uncontrolled environments is a challenging visual recognition task due to numerous variations of hand poses and background image clutter.To achieve highly accurate results as well as provide real-time execution,we proposed a deep transfer learning approach over the state-of-the-art deep learning object detector.Our method,denoted as YOLOHANDS,is built on top of the You Only Look Once(YOLO)deep learning architecture,which is modified to adapt to the single class hand detection task.The model transfer is performed by modifying the higher convolutional layers including the last fully connected layer,while initializing lower non-modified layers with the generic pre-trained weights.To address robustness issues,we introduced a comprehensive augmentation procedure over the training image dataset,specifically adapted for the hand detection problem.Experimental evaluation of the proposed method,which is performed on a challenging public dataset,has demonstrated highly accurate results,comparable to the state-of-the-art methods.展开更多
The performance of deep learning(DL)networks has been increased by elaborating the network structures. However, the DL netowrks have many parameters, which have a lot of influence on the performance of the network. We...The performance of deep learning(DL)networks has been increased by elaborating the network structures. However, the DL netowrks have many parameters, which have a lot of influence on the performance of the network. We propose a genetic algorithm(GA) based deep belief neural network(DBNN) method for robot object recognition and grasping purpose. This method optimizes the parameters of the DBNN method, such as the number of hidden units, the number of epochs, and the learning rates, which would reduce the error rate and the network training time of object recognition. After recognizing objects, the robot performs the pick-andplace operations. We build a database of six objects for experimental purpose. Experimental results demonstrate that our method outperforms on the optimized robot object recognition and grasping tasks.展开更多
文摘Discovering floating wastes,especially bottles on water,is a crucial research problem in environmental hygiene.Nevertheless,real-world applications often face challenges such as interference from irrelevant objects and the high cost associated with data collection.Consequently,devising algorithms capable of accurately localizing specific objects within a scene in scenarios where annotated data is limited remains a formidable challenge.To solve this problem,this paper proposes an object discovery by request problem setting and a corresponding algorithmic framework.The proposed problem setting aims to identify specified objects in scenes,and the associated algorithmic framework comprises pseudo data generation and object discovery by request network.Pseudo-data generation generates images resembling natural scenes through various data augmentation rules,using a small number of object samples and scene images.The network structure of object discovery by request utilizes the pre-trained Vision Transformer(ViT)model as the backbone,employs object-centric methods to learn the latent representations of foreground objects,and applies patch-level reconstruction constraints to the model.During the validation phase,we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets.Experiments have proved that our method achieves state-of-the-art performance on Unmanned Aerial Vehicles-Bottle Detection(UAV-BD)dataset and self-constructed dataset Bottle,especially in multi-object scenarios.
基金supported by the National Natural Science Foundation of China under Grant 61671219.
文摘Object detection in unmanned aerial vehicle(UAV)aerial images has become increasingly important in military and civil applications.General object detection models are not robust enough against interclass similarity and intraclass variability of small objects,and UAV-specific nuisances such as uncontrolledweather conditions.Unlike previous approaches focusing on high-level semantic information,we report the importance of underlying features to improve detection accuracy and robustness fromthe information-theoretic perspective.Specifically,we propose a robust and discriminative feature learning approach through mutual information maximization(RD-MIM),which can be integrated into numerous object detection methods for aerial images.Firstly,we present the rank sample mining method to reduce underlying feature differences between the natural image domain and the aerial image domain.Then,we design a momentum contrast learning strategy to make object features similar to the same category and dissimilar to different categories.Finally,we construct a transformer-based global attention mechanism to boost object location semantics by leveraging the high interrelation of different receptive fields.We conduct extensive experiments on the VisDrone and Unmanned Aerial Vehicle Benchmark Object Detection and Tracking(UAVDT)datasets to prove the effectiveness of the proposed method.The experimental results show that our approach brings considerable robustness gains to basic detectors and advanced detection methods,achieving relative growth rates of 51.0%and 39.4%in corruption robustness,respectively.Our code is available at https://github.com/cq100/RD-MIM(accessed on 2 August 2024).
文摘For some important object recognition applications such as intelligent robots and unmanned driving, images are collected on a consecutive basis and associated among themselves, besides, the scenes have steady prior features. Yet existing technologies do not take full advantage of this information. In order to take object recognition further than existing algorithms in the above application, an object recognition method that fuses temporal sequence with scene priori information is proposed. This method first employs YOLOv3 as the basic algorithm to recognize objects in single-frame images, then the DeepSort algorithm to establish association among potential objects recognized in images of different moments, and finally the confidence fusion method and temporal boundary processing method designed herein to fuse, at the decision level, temporal sequence information with scene priori information. Experiments using public datasets and self-built industrial scene datasets show that due to the expansion of information sources, the quality of single-frame images has less impact on the recognition results, whereby the object recognition is greatly improved. It is presented herein as a widely applicable framework for the fusion of information under multiple classes. All the object recognition algorithms that output object class, location information and recognition confidence at the same time can be integrated into this information fusion framework to improve performance.
基金The Shanxi Provincial Administration of Traditional Chinese Medicine,No.2023ZYYDA2005.
文摘BACKGROUND Deep learning provides an efficient automatic image recognition method for small bowel(SB)capsule endoscopy(CE)that can assist physicians in diagnosis.However,the existing deep learning models present some unresolved challenges.AIM To propose a novel and effective classification and detection model to automatically identify various SB lesions and their bleeding risks,and label the lesions accurately so as to enhance the diagnostic efficiency of physicians and the ability to identify high-risk bleeding groups.METHODS The proposed model represents a two-stage method that combined image classification with object detection.First,we utilized the improved ResNet-50 classification model to classify endoscopic images into SB lesion images,normal SB mucosa images,and invalid images.Then,the improved YOLO-V5 detection model was utilized to detect the type of lesion and its risk of bleeding,and the location of the lesion was marked.We constructed training and testing sets and compared model-assisted reading with physician reading.RESULTS The accuracy of the model constructed in this study reached 98.96%,which was higher than the accuracy of other systems using only a single module.The sensitivity,specificity,and accuracy of the model-assisted reading detection of all images were 99.17%,99.92%,and 99.86%,which were significantly higher than those of the endoscopists’diagnoses.The image processing time of the model was 48 ms/image,and the image processing time of the physicians was 0.40±0.24 s/image(P<0.001).CONCLUSION The deep learning model of image classification combined with object detection exhibits a satisfactory diagnostic effect on a variety of SB lesions and their bleeding risks in CE images,which enhances the diagnostic efficiency of physicians and improves the ability of physicians to identify high-risk bleeding groups.
基金This work was supported by the Outstanding Youth Science and Technology Innovation Team Project of Colleges and Universities in Hubei Province(Grant No.T201923)Key Science and Technology Project of Jingmen(Grant Nos.2021ZDYF024,2022ZDYF019)+2 种基金LIAS Pioneering Partnerships Award,UK(Grant No.P202ED10)Data Science Enhancement Fund,UK(Grant No.P202RE237)Cultivation Project of Jingchu University of Technology(Grant No.PY201904).
文摘Optical image-based ship detection can ensure the safety of ships and promote the orderly management of ships in offshore waters.Current deep learning researches on optical image-based ship detection mainly focus on improving one-stage detectors for real-time ship detection but sacrifices the accuracy of detection.To solve this problem,we present a hybrid ship detection framework which is named EfficientShip in this paper.The core parts of the EfficientShip are DLA-backboned object location(DBOL)and CascadeRCNN-guided object classification(CROC).The DBOL is responsible for finding potential ship objects,and the CROC is used to categorize the potential ship objects.We also design a pixel-spatial-level data augmentation(PSDA)to reduce the risk of detection model overfitting.We compare the proposed EfficientShip with state-of-the-art(SOTA)literature on a ship detection dataset called Seaships.Experiments show our ship detection framework achieves a result of 99.63%(mAP)at 45 fps,which is much better than 8 SOTA approaches on detection accuracy and can also meet the requirements of real-time application scenarios.
基金supported by the Project SP2023/074 Application of Machine and Process Control Advanced Methods supported by the Ministry of Education,Youth and Sports,Czech Republic.
文摘Computer vision(CV)was developed for computers and other systems to act or make recommendations based on visual inputs,such as digital photos,movies,and other media.Deep learning(DL)methods are more successful than other traditional machine learning(ML)methods inCV.DL techniques can produce state-of-the-art results for difficult CV problems like picture categorization,object detection,and face recognition.In this review,a structured discussion on the history,methods,and applications of DL methods to CV problems is presented.The sector-wise presentation of applications in this papermay be particularly useful for researchers in niche fields who have limited or introductory knowledge of DL methods and CV.This review will provide readers with context and examples of how these techniques can be applied to specific areas.A curated list of popular datasets and a brief description of them are also included for the benefit of readers.
文摘Recently,there has been a notable surge of interest in scientific research regarding spectral images.The potential of these images to revolutionize the digital photography industry,like aerial photography through Unmanned Aerial Vehicles(UAVs),has captured considerable attention.One encouraging aspect is their combination with machine learning and deep learning algorithms,which have demonstrated remarkable outcomes in image classification.As a result of this powerful amalgamation,the adoption of spectral images has experienced exponential growth across various domains,with agriculture being one of the prominent beneficiaries.This paper presents an extensive survey encompassing multispectral and hyperspectral images,focusing on their applications for classification challenges in diverse agricultural areas,including plants,grains,fruits,and vegetables.By meticulously examining primary studies,we delve into the specific agricultural domains where multispectral and hyperspectral images have found practical use.Additionally,our attention is directed towards utilizing machine learning techniques for effectively classifying hyperspectral images within the agricultural context.The findings of our investigation reveal that deep learning and support vector machines have emerged as widely employed methods for hyperspectral image classification in agriculture.Nevertheless,we also shed light on the various issues and limitations of working with spectral images.This comprehensive analysis aims to provide valuable insights into the current state of spectral imaging in agriculture and its potential for future advancements.
基金supported in part by National Natural Science Foundation of China(No.62176041)in part by Excellent Science and Technique Talent Foundation of Dalian(No.2022RY21).
文摘Significant advancements have beenwitnessed in visual tracking applications leveragingViT in recent years,mainly due to the formidablemodeling capabilities of Vision Transformer(ViT).However,the strong performance of such trackers heavily relies on ViT models pretrained for long periods,limitingmore flexible model designs for tracking tasks.To address this issue,we propose an efficient unsupervised ViT pretraining method for the tracking task based on masked autoencoders,called TrackMAE.During pretraining,we employ two shared-parameter ViTs,serving as the appearance encoder and motion encoder,respectively.The appearance encoder encodes randomly masked image data,while the motion encoder encodes randomly masked pairs of video frames.Subsequently,an appearance decoder and a motion decoder separately reconstruct the original image data and video frame data at the pixel level.In this way,ViT learns to understand both the appearance of images and the motion between video frames simultaneously.Experimental results demonstrate that ViT-Base and ViT-Large models,pretrained with TrackMAE and combined with a simple tracking head,achieve state-of-the-art(SOTA)performance without additional design.Moreover,compared to the currently popular MAE pretraining methods,TrackMAE consumes only 1/5 of the training time,which will facilitate the customization of diverse models for tracking.For instance,we additionally customize a lightweight ViT-XS,which achieves SOTA efficient tracking performance.
基金funded by Hunan Provincial Natural Science Foundation of China with Grant Numbers(2022JJ50016,2023JJ50096)Innovation Platform Open Fund of Hengyang Normal University Grant 2021HSKFJJ039Hengyang Science and Technology Plan Guiding Project with Number 202222025902.
文摘In rice production,the prevention and management of pests and diseases have always received special attention.Traditional methods require human experts,which is costly and time-consuming.Due to the complexity of the structure of rice diseases and pests,quickly and reliably recognizing and locating them is difficult.Recently,deep learning technology has been employed to detect and identify rice diseases and pests.This paper introduces common publicly available datasets;summarizes the applications on rice diseases and pests from the aspects of image recognition,object detection,image segmentation,attention mechanism,and few-shot learning methods according to the network structure differences;and compares the performances of existing studies.Finally,the current issues and challenges are explored fromthe perspective of data acquisition,data processing,and application,providing possible solutions and suggestions.This study aims to review various DL models and provide improved insight into DL techniques and their cutting-edge progress in the prevention and management of rice diseases and pests.
基金supported by the National Key R&D Program of China(Grant No.2021YFB3900502)the Scientific Research and Development Program of China Railway(K2019G008)the Tianjin Intelligent Manufacturing Special Fund Project(No.20201198).
文摘Arbitrary‐oriented object detection is widely used in aerial image applications because of its efficient object representation.However,the use of oriented bounding box aggravates the imbalance between positive and negative samples when using one‐stage object detectors,which seriously decreases the detection accuracy.We believe that it is the anchor learning strategy(ALS)used by such detectors that needs to take the responsibility.In this study,three perspectives on ALS design were summarised and ALS—Performance Releaser with Smart Anchor Learning(PRSAL)was proposed.Performance Releaser with Smart Anchor Learning is a dynamic ALS that utilises anchor classification ability as an equivalent indicator to anchor box regression ability,this allows anchors with high detection potential to be filtered out in a more reasonable way.At the same time,PRSAL focuses more on anchor potential and it is able to automatically select a number of positive samples that far exceed that of other methods by activating anchors that previously had a low spatial overlap,thereby releasing the detection performance.We validate the PRSAL using three remote sensing datasets—HRSC2016,DOTA and UCAS‐AOD as well as one scene text dataset—ICDAR 2013.The experimental results show that the proposed method gives substantially better results than existing models.
基金funded by Institutional Fund Projects under Grant No. (IFPIP:557-135-1443).
文摘Object Detection is the task of localization and classification of objects in a video or image.In recent times,because of its widespread applications,it has obtained more importance.In the modern world,waste pollution is one significant environmental problem.The prominence of recycling is known very well for both ecological and economic reasons,and the industry needs higher efficiency.Waste object detection utilizing deep learning(DL)involves training a machine-learning method to classify and detect various types of waste in videos or images.This technology is utilized for several purposes recycling and sorting waste,enhancing waste management and reducing environmental pollution.Recent studies of automatic waste detection are difficult to compare because of the need for benchmarks and broadly accepted standards concerning the employed data andmetrics.Therefore,this study designs an Entropy-based Feature Fusion using Deep Learning forWasteObject Detection and Classification(EFFDL-WODC)algorithm.The presented EFFDL-WODC system inherits the concepts of feature fusion and DL techniques for the effectual recognition and classification of various kinds of waste objects.In the presented EFFDL-WODC system,two major procedures can be contained,such as waste object detection and waste object classification.For object detection,the EFFDL-WODC technique uses a YOLOv7 object detector with a fusionbased backbone network.In addition,entropy feature fusion-based models such as VGG-16,SqueezeNet,and NASNetmodels are used.Finally,the EFFDL-WODC technique uses a graph convolutional network(GCN)model performed for the classification of detected waste objects.The performance validation of the EFFDL-WODC approach was validated on the benchmark database.The comprehensive comparative results demonstrated the improved performance of the EFFDL-WODC technique over recent approaches.
文摘At present days,object detection and tracking concepts have gained more importance among researchers and business people.Presently,deep learning(DL)approaches have been used for object tracking as it increases the perfor-mance and speed of the tracking process.This paper presents a novel robust DL based object detection and tracking algorithm using Automated Image Anno-tation with ResNet based Faster regional convolutional neural network(R-CNN)named(AIA-FRCNN)model.The AIA-RFRCNN method performs image anno-tation using a Discriminative Correlation Filter(DCF)with Channel and Spatial Reliability tracker(CSR)called DCF-CSRT model.The AIA-RFRCNN model makes use of Faster RCNN as an object detector and tracker,which involves region proposal network(RPN)and Fast R-CNN.The RPN is a full convolution network that concurrently predicts the bounding box and score of different objects.The RPN is a trained model used for the generation of the high-quality region proposals,which are utilized by Fast R-CNN for detection process.Besides,Residual Network(ResNet 101)model is used as a shared convolutional neural network(CNN)for the generation of feature maps.The performance of the ResNet 101 model is further improved by the use of Adam optimizer,which tunes the hyperparameters namely learning rate,batch size,momentum,and weight decay.Finally,softmax layer is applied to classify the images.The performance of the AIA-RFRCNN method has been assessed using a benchmark dataset and a detailed comparative analysis of the results takes place.The outcome of the experiments indicated the superior characteristics of the AIA-RFRCNN model under diverse aspects.
文摘This paper discusses about the new approach of multiple object track-ing relative to background information.The concept of multiple object tracking through background learning is based upon the theory of relativity,that involves a frame of reference in spatial domain to localize and/or track any object.Thefield of multiple object tracking has seen a lot of research,but researchers have considered the background as redundant.However,in object tracking,the back-ground plays a vital role and leads to definite improvement in the overall process of tracking.In the present work an algorithm is proposed for the multiple object tracking through background learning.The learning framework is based on graph embedding approach for localizing multiple objects.The graph utilizes the inher-ent capabilities of depth modelling that assist in prior to track occlusion avoidance among multiple objects.The proposed algorithm has been compared with the recent work available in literature on numerous performance evaluation measures.It is observed that our proposed algorithm gives better performance.
文摘Nowadays,to improve English learning capability,an increasing number of Chinese students choose to study abroad.As a member of International students,I have a puzzle about why I am less motivated to learn English in China than in England,which is the same as the puzzle of a large proportion of international students.Furthermore,self-determination theory(SDT)focuses on students’learning motivation,therefore SDT helps investigate the international students’puzzle which related to language learning motivation.
文摘Robert Mills Gagne's five categories of learning have a profound influence on the many aspects of educational field.This essay attempts to differentiate and analyze the five categories of learning:motor skills,verbal information,intellectual skills,cognitive strategies,and attitudes.And then applies Gagne's five categories of learning to design English teaching objectives.
基金supported by the“Human Resources Program in Energy Technology”of the Korea Institute of Energy Technology Evaluation and Planning(KETEP)granted financial resources from the Ministry of Trade,Industry Energy,Republic ofKorea.(No.20204010600090).
文摘Anomalous situations in surveillance videos or images that may result in security issues,such as disasters,accidents,crime,violence,or terrorism,can be identified through video anomaly detection.However,differentiat-ing anomalous situations from normal can be challenging due to variations in human activity in complex environments such as train stations,busy sporting fields,airports,shopping areas,military bases,care centers,etc.Deep learning models’learning capability is leveraged to identify abnormal situations with improved accuracy.This work proposes a deep learning architecture called Anomalous Situation Recognition Network(ASRNet)for deep feature extraction to improve the detection accuracy of various anomalous image situations.The proposed framework has five steps.In the first step,pretraining of the proposed architecture is performed on the CIFAR-100 dataset.In the second step,the proposed pre-trained model and Inception V3 architecture are used for feature extraction by utilizing the suspicious activity recognition dataset.In the third step,serial feature fusion is performed,and then the Dragonfly algorithm is utilized for feature optimization in the fourth step.Finally,using optimized features,various Support Vector Machine(SVM)and K-Nearest Neighbor(KNN)based classification models are utilized to detect anomalous situations.The proposed framework is validated on the suspicious activity dataset by varying the number of optimized features from 100 to 1000.The results show that the proposed method is effective in detecting anomalous situations and achieves the highest accuracy of 99.24%using cubic SVM.
基金the Framework of International Cooperation Program managed by the National Research Foundation of Korea(2019K1A3A1A8011295711).
文摘Collaborative Robotics is one of the high-interest research topics in the area of academia and industry.It has been progressively utilized in numerous applications,particularly in intelligent surveillance systems.It allows the deployment of smart cameras or optical sensors with computer vision techniques,which may serve in several object detection and tracking tasks.These tasks have been considered challenging and high-level perceptual problems,frequently dominated by relative information about the environment,where main concerns such as occlusion,illumination,background,object deformation,and object class variations are commonplace.In order to show the importance of top view surveillance,a collaborative robotics framework has been presented.It can assist in the detection and tracking of multiple objects in top view surveillance.The framework consists of a smart robotic camera embedded with the visual processing unit.The existing pre-trained deep learning models named SSD and YOLO has been adopted for object detection and localization.The detection models are further combined with different tracking algorithms,including GOTURN,MEDIANFLOW,TLD,KCF,MIL,and BOOSTING.These algorithms,along with detection models,help to track and predict the trajectories of detected objects.The pre-trained models are employed;therefore,the generalization performance is also investigated through testing the models on various sequences of top view data set.The detection models achieved maximum True Detection Rate 93%to 90%with a maximum 0.6%False Detection Rate.The tracking results of different algorithms are nearly identical,with tracking accuracy ranging from 90%to 94%.Furthermore,a discussion has been carried out on output results along with future guidelines.
文摘There is a drastic increase experienced in the production of vehicles in recent years across the globe.In this scenario,vehicle classification system plays a vital part in designing Intelligent Transportation Systems(ITS)for automatic highway toll collection,autonomous driving,and traffic management.Recently,computer vision and pattern recognition models are useful in designing effective vehicle classification systems.But these models are trained using a small number of hand-engineered features derived fromsmall datasets.So,such models cannot be applied for real-time road traffic conditions.Recent developments in Deep Learning(DL)-enabled vehicle classification models are highly helpful in resolving the issues that exist in traditional models.In this background,the current study develops a Lightning Search Algorithm with Deep Transfer Learning-based Vehicle Classification Model for ITS,named LSADTL-VCITS model.The key objective of the presented LSADTL-VCITS model is to automatically detect and classify the types of vehicles.To accomplish this,the presented LSADTL-VCITS model initially employs You Only Look Once(YOLO)-v5 object detector with Capsule Network(CapsNet)as baseline model.In addition,the proposed LSADTL-VCITS model applies LSA with Multilayer Perceptron(MLP)for detection and classification of the vehicles.The performance of the proposed LSADTL-VCITS model was experimentally validated using benchmark dataset and the outcomes were examined under several measures.The experimental outcomes established the superiority of the proposed LSADTL-VCITS model compared to existing approaches.
基金financed by the Ministry of Education,Science and Technological Development of the Republic of Serbia.
文摘Human hand detection in uncontrolled environments is a challenging visual recognition task due to numerous variations of hand poses and background image clutter.To achieve highly accurate results as well as provide real-time execution,we proposed a deep transfer learning approach over the state-of-the-art deep learning object detector.Our method,denoted as YOLOHANDS,is built on top of the You Only Look Once(YOLO)deep learning architecture,which is modified to adapt to the single class hand detection task.The model transfer is performed by modifying the higher convolutional layers including the last fully connected layer,while initializing lower non-modified layers with the generic pre-trained weights.To address robustness issues,we introduced a comprehensive augmentation procedure over the training image dataset,specifically adapted for the hand detection problem.Experimental evaluation of the proposed method,which is performed on a challenging public dataset,has demonstrated highly accurate results,comparable to the state-of-the-art methods.
文摘The performance of deep learning(DL)networks has been increased by elaborating the network structures. However, the DL netowrks have many parameters, which have a lot of influence on the performance of the network. We propose a genetic algorithm(GA) based deep belief neural network(DBNN) method for robot object recognition and grasping purpose. This method optimizes the parameters of the DBNN method, such as the number of hidden units, the number of epochs, and the learning rates, which would reduce the error rate and the network training time of object recognition. After recognizing objects, the robot performs the pick-andplace operations. We build a database of six objects for experimental purpose. Experimental results demonstrate that our method outperforms on the optimized robot object recognition and grasping tasks.