Detection of floating garbage in inland rivers is crucial for water environmental protection,as it effectively reduces ecological damage and ensures the safety of water resources.To address the inefficiency of traditi...Detection of floating garbage in inland rivers is crucial for water environmental protection,as it effectively reduces ecological damage and ensures the safety of water resources.To address the inefficiency of traditional cleanup methods and the challenges in detecting small targets,an improved YOLOv5 object detection model was proposed in this study.In order to enhance the model’s sensitivity to small targets and mitigate the impact of redundant information on detection performance,a bi-level routing attention mechanism was introduced and embedded into the backbone network.Additionally,a multi-scale detection head was incorporated into the model,allowing for more comprehensive coverage of floating garbage of various sizes through multi-scale feature extraction and detection.The Focal-EIoU loss function was also employed to optimize the model parameters,improving localization accuracy.Experimental results on the publicly available FloW_Img dataset demonstrated that the improved YOLOv5 model outperforms the original YOLOv5 model in terms of precision and recall,achieving a mAP(mean average precision)of 86.12%,with significant improvements and faster convergence.展开更多
In response to the challenge of low detection accuracy and susceptibility to missed and false detections of small targets in unmanned aerial vehicles(UAVs)aerial images,an improved UAV image target detection algorithm...In response to the challenge of low detection accuracy and susceptibility to missed and false detections of small targets in unmanned aerial vehicles(UAVs)aerial images,an improved UAV image target detection algorithm based on YOLOv8 was proposed in this study.To begin with,the CoordAtt attention mechanism was employed to enhance the feature extraction capability of the backbone network,thereby reducing interference from backgrounds.Additionally,the BiFPN feature fusion network with an added small object detection layer was used to enhance the model's ability to perceive for small objects.Furthermore,a multi-level fusion module was designed and proposed to effectively integrate shallow and deep information.The use of an enhanced MPDIoU loss function further improved detection performance.The experimental results based on the publicly available VisDrone2019 dataset showed that the improved model outperformed the YOLOv8 baseline model,mAP@0.5 improved by 20%,and the improved method improved the detection accuracy of the model for small targets.展开更多
With the warming up and continuous development of machine learning,especially deep learning,the research on visual question answering field has made significant progress,with important theoretical research significanc...With the warming up and continuous development of machine learning,especially deep learning,the research on visual question answering field has made significant progress,with important theoretical research significance and practical application value.Therefore,it is necessary to summarize the current research and provide some reference for researchers in this field.This article conducted a detailed and in-depth analysis and summarized of relevant research and typical methods of visual question answering field.First,relevant background knowledge about VQA(Visual Question Answering)was introduced.Secondly,the issues and challenges of visual question answering were discussed,and at the same time,some promising discussion on the particular methodologies was given.Thirdly,the key sub-problems affecting visual question answering were summarized and analyzed.Then,the current commonly used data sets and evaluation indicators were summarized.Next,in view of the popular algorithms and models in VQA research,comparison of the algorithms and models was summarized and listed.Finally,the future development trend and conclusion of visual question answering were prospected.展开更多
Semantic segmentation is for pixel-level classification tasks,and contextual information has an important impact on the performance of segmentation.In order to capture richer contextual information,we adopt ResNet as ...Semantic segmentation is for pixel-level classification tasks,and contextual information has an important impact on the performance of segmentation.In order to capture richer contextual information,we adopt ResNet as the backbone network and designs an encoder-decoder architecture based on multidimensional attention(MDA)module and multiscale upsampling(MSU)module.The MDA module calculates the attention matrices of the three dimensions to capture the dependency of each position,and adaptively captures the image features.The MSU module adopts parallel branches to capture the multiscale features of the images,and multiscale feature aggregation can enhance contextual information.A series of experiments demonstrate the validity of the model on Cityscapes and Camvid datasets.展开更多
文摘Detection of floating garbage in inland rivers is crucial for water environmental protection,as it effectively reduces ecological damage and ensures the safety of water resources.To address the inefficiency of traditional cleanup methods and the challenges in detecting small targets,an improved YOLOv5 object detection model was proposed in this study.In order to enhance the model’s sensitivity to small targets and mitigate the impact of redundant information on detection performance,a bi-level routing attention mechanism was introduced and embedded into the backbone network.Additionally,a multi-scale detection head was incorporated into the model,allowing for more comprehensive coverage of floating garbage of various sizes through multi-scale feature extraction and detection.The Focal-EIoU loss function was also employed to optimize the model parameters,improving localization accuracy.Experimental results on the publicly available FloW_Img dataset demonstrated that the improved YOLOv5 model outperforms the original YOLOv5 model in terms of precision and recall,achieving a mAP(mean average precision)of 86.12%,with significant improvements and faster convergence.
文摘In response to the challenge of low detection accuracy and susceptibility to missed and false detections of small targets in unmanned aerial vehicles(UAVs)aerial images,an improved UAV image target detection algorithm based on YOLOv8 was proposed in this study.To begin with,the CoordAtt attention mechanism was employed to enhance the feature extraction capability of the backbone network,thereby reducing interference from backgrounds.Additionally,the BiFPN feature fusion network with an added small object detection layer was used to enhance the model's ability to perceive for small objects.Furthermore,a multi-level fusion module was designed and proposed to effectively integrate shallow and deep information.The use of an enhanced MPDIoU loss function further improved detection performance.The experimental results based on the publicly available VisDrone2019 dataset showed that the improved model outperformed the YOLOv8 baseline model,mAP@0.5 improved by 20%,and the improved method improved the detection accuracy of the model for small targets.
基金Project(61702063)supported by the National Natural Science Foundation of China。
文摘With the warming up and continuous development of machine learning,especially deep learning,the research on visual question answering field has made significant progress,with important theoretical research significance and practical application value.Therefore,it is necessary to summarize the current research and provide some reference for researchers in this field.This article conducted a detailed and in-depth analysis and summarized of relevant research and typical methods of visual question answering field.First,relevant background knowledge about VQA(Visual Question Answering)was introduced.Secondly,the issues and challenges of visual question answering were discussed,and at the same time,some promising discussion on the particular methodologies was given.Thirdly,the key sub-problems affecting visual question answering were summarized and analyzed.Then,the current commonly used data sets and evaluation indicators were summarized.Next,in view of the popular algorithms and models in VQA research,comparison of the algorithms and models was summarized and listed.Finally,the future development trend and conclusion of visual question answering were prospected.
基金Fundamental Research Fund in Heilongjiang Provincial Universities(Nos.135409602,135409102)。
文摘Semantic segmentation is for pixel-level classification tasks,and contextual information has an important impact on the performance of segmentation.In order to capture richer contextual information,we adopt ResNet as the backbone network and designs an encoder-decoder architecture based on multidimensional attention(MDA)module and multiscale upsampling(MSU)module.The MDA module calculates the attention matrices of the three dimensions to capture the dependency of each position,and adaptively captures the image features.The MSU module adopts parallel branches to capture the multiscale features of the images,and multiscale feature aggregation can enhance contextual information.A series of experiments demonstrate the validity of the model on Cityscapes and Camvid datasets.