Funding: Supported in part by the National Natural Science Foundation of China (42001408).
Abstract: Anchor-free object-detection methods have brought significant advances to computer vision, particularly for real-time inference. However, in remote sensing object detection, anchor-free methods often lack the capability to separate foreground from background. This paper proposes an anchor-free method named the probability-enhanced anchor-free detector (ProEnDet) for remote sensing object detection. First, a weighted bidirectional feature pyramid is used for feature extraction. Second, we introduce probability enhancement to strengthen the classification of the object's foreground and background. The detector uses the log-likelihood as the final score to improve the classification of the foreground and background of the object. ProEnDet is verified using the DIOR and NWPU-VHR-10 datasets. The experiments achieved mean average precisions of 61.4 and 69.0 on the DIOR and NWPU-VHR-10 datasets, respectively. ProEnDet achieves a speed of 32.4 FPS on the DIOR dataset, which satisfies the real-time requirements for remote-sensing object detection.
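As an illustration of what a weighted bidirectional feature pyramid does at each fusion node, the sketch below implements the "fast normalized fusion" rule commonly associated with BiFPN-style pyramids. Whether ProEnDet uses exactly this rule is an assumption; the abstract only states that the pyramid is weighted. The function name, the nested-list feature layout, and the `eps` default are hypothetical choices made for this example.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped 2D feature maps with non-negative, normalized weights.

    features: list of 2D maps (lists of rows), all with the same shape.
    weights:  one learnable scalar per map (plain floats in this sketch).
    eps:      small constant that keeps the division numerically stable.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")

    # ReLU keeps every weight non-negative, so a map can be ignored
    # but never subtracted.
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    rows, cols = len(features[0]), len(features[0][0])
    # Weighted sum of the input maps, element by element.
    return [
        [sum(n * f[r][c] for n, f in zip(normalized, features)) for c in range(cols)]
        for r in range(rows)
    ]
```

With equal weights the result is a plain average of the input maps; a weight driven to zero or below removes that input from the fusion altogether.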
Funding: Thanks are due to the Natural Science Foundation of Shanghai under Grant 21ZR1426500 and the Top-Notch Innovative Talent Training Program for Graduate Students of Shanghai Maritime University under Grant 2021YBR008 for their generous support and funding through the project funding program. This funding played a pivotal role in the successful completion of our research, and we are deeply appreciative of this invaluable contribution.
Abstract: Person Search is a task involving pedestrian detection and person re-identification, aiming to retrieve person images matching a given objective attribute from a large-scale image library. Person Search models need to understand and capture the detailed features and context information of smaller objects in the image more accurately and comprehensively. The currently popular Person Search models, whether end-to-end or two-step, are based on anchor boxes. However, due to the limitations of anchors themselves, these models inevitably have disadvantages, such as an imbalance between positive and negative samples and redundant computation, which affect model performance. To address the problem of fine-grained understanding of target pedestrians in complex scenes and at small sizes, this paper proposes a Deformable-Attention-based Anchor-free Person Search model (DAAPS). Fully Convolutional One-Stage (FCOS), a classic anchor-free detector, is chosen as the model's infrastructure. The DAAPS model is the first to combine an anchor-free Person Search model with a Deformable Attention Mechanism, which guides the model to adaptively adjust its perception. The Deformable Attention Mechanism helps the model focus on critical information and effectively mitigates the poor accuracy caused by the absence of anchor boxes. The experiments demonstrate the adaptability of the attention mechanism to the anchor-free model. In addition, with an improved ResNeXt+ network framework, the DAAPS model adopts the Triplet-based Online Instance Matching (TOIM) loss function to achieve a more precise end-to-end Person Search task. Simulation experiments demonstrate that the proposed model has higher accuracy and better robustness than most Person Search models, reaching 95.0% mean Average Precision (mAP) and 95.6% Top-1 on the CUHK-SYSU dataset, and 48.6% mAP and 84.7% Top-1 on the Person Re-identification in the Wild (PRW) dataset.
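To make concrete how an FCOS-style anchor-free detector replaces anchor boxes, the sketch below computes the per-location regression targets (distances to the four box sides) and the centerness score used in FCOS, then decodes a box back from those distances. It illustrates the FCOS formulation only and is not the DAAPS implementation; the function names and the `(x0, y0, x1, y1)` box layout are assumptions made for this example.

```python
import math


def fcos_targets(px, py, box):
    """Return ((l, t, r, b), centerness) for a location inside `box`, else None.

    box is (x0, y0, x1, y1) in the same coordinate frame as (px, py).
    """
    x0, y0, x1, y1 = box
    # Distances from the location to the left, top, right and bottom sides.
    l, t, r, b = px - x0, py - y0, x1 - px, y1 - py
    if min(l, t, r, b) <= 0:
        return None  # the location is outside the box: a negative sample

    # Centerness is 1.0 at the box center and decays toward the borders.
    centerness = math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
    return (l, t, r, b), centerness


def decode_box(px, py, distances):
    """Invert fcos_targets: rebuild (x0, y0, x1, y1) from predicted distances."""
    l, t, r, b = distances
    return (px - l, py - t, px + r, py + b)
```

Because every location inside a ground-truth box regresses its own four distances, no anchor scales or aspect ratios have to be tuned; the centerness score is what down-weights low-quality boxes predicted far from the object center.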
Funding: Supported by the project "Research and application of key technologies of safe production management and control of substation operation and maintenance based on video semantic analysis" (5700-202133259A-0-0-00) of the State Grid Corporation of China.
Abstract: This paper proposes a novel image sequence-based method for high-precision detection of risk behavior among power maintenance personnel. In this method, the original image sequence data is first separated into foreground and background. Then, the free anchor frame detection method is applied to the foreground image to detect the personnel and correct their orientation. Finally, human posture nodes are extracted from each frame of the image sequence and used to identify abnormal human behavior. Simulation results demonstrate that the proposed algorithm has significant advantages in the accuracy of human posture node detection and risk behavior identification.
Funding: Supported by the Natural Science Foundation of Liaoning Province (No. 2022-MS-353) and the Basic Scientific Research Project of the Education Department of Liaoning Province (Nos. 2020LNZD06 and LJKMZ20220640).
Abstract: Surface defects can affect the quality of steel plate. Many computer vision methods are currently applied to surface defect detection of steel plate. However, their real-time performance and their detection of small defects are still unsatisfactory. An improved object detection network based on You Only Look One-level Feature (YOLOF), called DLF-YOLOF, is proposed for surface defect detection of steel plate. First, an anchor-free detector is used to reduce the number of network hyperparameters. Second, a deformable convolution network and a local spatial attention module are introduced into the feature extraction network to increase the contextual information in the feature maps. In addition, soft non-maximum suppression is used to improve detection accuracy. Finally, data augmentation is performed for small defect objects during training to improve detection accuracy. Experiments show that the average precision and the average precision for small objects are 42.7% and 33.5%, respectively, at a detection speed of 62 frames per second on a single GPU. This shows that DLF-YOLOF performs well enough to meet the needs of industrial real-time detection.
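The soft non-maximum suppression step mentioned above can be sketched as follows, using the Gaussian variant in which an overlapping box has its score decayed by exp(-IoU² / sigma) instead of being discarded outright. This is a minimal illustration, not the DLF-YOLOF code; the abstract does not say which Soft-NMS variant or parameters were used, so `sigma`, `score_threshold`, and the function names are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns (box, score) pairs in the order they were kept."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the highest-scoring box that has not been kept yet.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        kept.append((best_box, best_score))

        # Decay the scores of the other boxes according to their overlap
        # with the selected box, and drop those that fall below threshold.
        rescored = []
        for box, score in remaining:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score))
        remaining = rescored
    return kept
```

Compared with hard NMS, a heavily overlapped box survives with a reduced score rather than disappearing, which tends to help when true objects (here, neighboring defects) are close together.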
Funding: Supported by the Natural Science Foundation of Shandong Province in China (Grant No. ZR2020MF076); the Focus on Research and Development Plan in Shandong Province (Grant No. 2019GNC106115); the National Natural Science Foundation of China (Grant No. 62072289); the Shandong Province Higher Educational Science and Technology Program (Grant No. J18KA308); and the Taishan Scholar Program of Shandong Province of China.
Abstract: In complex orchard environments, efficient and accurate detection of target fruit is the basic requirement for orchard yield measurement and automatic harvesting. Target fruits are sometimes hard to distinguish from the background because of their similar color, and detection is further complicated by the ambient light and the camera angle at which the photos were taken. These problems make it hard to detect green fruits in orchard environments. In this study, a two-stage dense to detection framework (D2D) was proposed to detect green fruits in orchard environments. The proposed model performs multi-scale feature extraction of target fruit using a MobileNetV2 + FPN (feature pyramid network) structure and generates region proposals of target fruit using a Region Proposal Network (RPN) structure. In the regression branch, the offset of each local feature was calculated, and the positive and negative samples of the region proposals were predicted by a binary mask prediction to reduce the interference of the background with the predicted box. In the classification branch, features were extracted from each sub-region of the region proposal, and features with distinguishing information were obtained through adaptive weighted pooling to achieve accurate classification. The proposed model adopts an anchor-free design, which improves generalization ability, makes the model more robust, and reduces storage requirements. Experimental results on persimmon and green apple datasets show that the new model has the best detection performance, which can provide a theoretical reference for the detection of other green objects.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61976144), the Stable Support Plan Program of Shenzhen Natural Science Fund, China (No. 20200925155017002), and the National Key Research and Development Program of China (No. 2020AAA0140000).
Abstract: Great progress has been made toward accurate face detection in recent years. However, heavy models and expensive computation make it difficult to deploy many detectors on mobile and embedded devices, where model size and latency are highly constrained. In this paper, we present a millisecond-level anchor-free face detector, YuNet, which is specifically designed for edge devices. There are several key contributions to improving the efficiency-accuracy trade-off. First, we analyse the influential state-of-the-art face detectors of recent years and summarize rules for reducing model size. Then, a lightweight face detector, YuNet, is introduced. Our detector contains a tiny and efficient feature extraction backbone and a simplified pyramid feature fusion neck. To the best of our knowledge, YuNet has the best trade-off between accuracy and speed. It has only 75,856 parameters, less than 1/5 of the parameter count of other small-size detectors. In addition, a training strategy is presented for the tiny face detector, and it can effectively train models with the same distribution of the training set. The proposed YuNet achieves 81.1% mAP (single-scale) on the WIDER FACE validation hard track with high inference efficiency (Intel i7-12700K: 1.6 ms per frame at 320×320). Because of its unique advantages, the repository for YuNet and its predecessors has been popular on GitHub and has gained more than 11K stars at https://github.com/ShiqiYu/libfacedetection. Keywords: Face detection, object detection, computer vision, lightweight, inference efficiency, anchor-free mechanism.
Funding: Supported by the National Key R&D Program of China (Grant No. 2018YFC0807500) and the National Natural Science Foundation of China (Grant Nos. U20B2070 and 61832016).
Abstract: Object detection is widely used in object tracking; anchor-free object tracking provides an end-to-end single-object-tracking approach. In this study, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the referenced object features in the initial frame, we directly predict the center point and size of the object in subsequent frames in a Siamese-structure network, without the need for per-frame post-processing operations. Unlike other anchor-free tracking approaches that are based on semantic segmentation and achieve anchor-free tracking by pixel-level prediction, SiamCPN directly obtains all information required for tracking, greatly simplifying the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the experience of different branches of the Siamese net. The model can accurately predict object location, implement appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient, as it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on the GOT-10K and UAV123 benchmarks and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100, while improving inference speed by 1.5 to 2 times.
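The idea of predicting a center point and a size directly, rather than scoring anchor boxes, can be illustrated with a small decoding routine: take the peak of a center heatmap, read the size predicted at that cell, and convert the result to a box in image coordinates. This is a generic center-prediction sketch, not SiamCPN's implementation; the map layout, the `stride` value, and the cell-center convention are all assumptions made for this example.

```python
def decode_center_prediction(heatmap, size_map, stride=8):
    """Turn a center heatmap and a per-cell size map into one bounding box.

    heatmap:  2D list of center scores, indexed [row][col].
    size_map: 2D list of (width, height) tuples in image pixels.
    stride:   assumed downsampling factor between image and feature map.
    Returns ((x0, y0, x1, y1), score).
    """
    # Find the cell with the highest center score.
    best_score = float("-inf")
    best_rc = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_score:
                best_score, best_rc = value, (r, c)

    r, c = best_rc
    w, h = size_map[r][c]
    # Map the cell index back to image coordinates (center of the cell).
    cx, cy = (c + 0.5) * stride, (r + 0.5) * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), best_score
```

A single argmax followed by one lookup yields the tracked box, which is the kind of simplification that avoids anchor matching and heavy per-frame post-processing.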