Copy-Move Forgery Detection (CMFD) is a technique designed to identify image tampering and locate suspicious areas. However, the practicality of CMFD is impeded by the scarcity of datasets, their inadequate quality and quantity, and a narrow range of applicable tasks. These limitations significantly restrict the capacity and applicability of CMFD. To overcome the limitations of existing methods, a novel solution called IMTNet is proposed for CMFD by employing a feature decoupling approach. Firstly, this study formulates the relationship between the objective task and the network as an optimization problem using transfer learning. Furthermore, it thoroughly discusses and analyzes the relationship between CMFD and deep network architecture by employing ResNet-50 during the optimization solving phase. Secondly, a quantitative comparison between fine-tuning and feature decoupling is conducted with the enhanced ResNet-50 to evaluate the degree of similarity between the image classification and CMFD domains. Finally, suspicious regions are localized using a feature pyramid network with bottom-up path augmentation. Experimental results demonstrate that IMTNet achieves faster convergence, shorter training times, and favorable generalization performance compared to existing methods. Moreover, it is shown that IMTNet significantly outperforms fine-tuning based approaches in terms of accuracy and F1.
Effective small object detection is crucial in various applications, including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses an attention module to focus the network on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information. This alleviates the problem of small object information being submerged due to semantic gaps between different layers. At the same time, SODEM is introduced to strengthen the local features of the object, suppress background noise, enhance the details of small objects, and fuse the enhanced features into other feature layers, so that each layer is rich in small object information and small object detection accuracy improves. Our extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MSCOCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
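The multi-scale feature fusion that FPN-based methods rely on can be illustrated with a minimal sketch of the standard FPN top-down pathway. This is not the TA-FPN method itself: it assumes single-channel feature maps stored as nested lists, each level exactly half the size of the previous one, and it omits the 1x1 lateral convolutions and 3x3 smoothing convolutions of a real FPN. The function names `upsample2x` and `top_down_merge` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def top_down_merge(levels):
    """Fuse pyramid levels ordered fine -> coarse.

    The coarsest level is kept as is; every finer level receives the
    upsampled, already-merged coarser level added element-wise.
    """
    merged = [None] * len(levels)
    merged[-1] = [list(r) for r in levels[-1]]
    for i in range(len(levels) - 2, -1, -1):
        up = upsample2x(merged[i + 1])
        merged[i] = [
            [a + b for a, b in zip(fine_row, up_row)]
            for fine_row, up_row in zip(levels[i], up)
        ]
    return merged


if __name__ == "__main__":
    pyramid = [
        [[1] * 4 for _ in range(4)],  # fine level, 4x4
        [[10, 20], [30, 40]],         # middle level, 2x2
        [[100]],                      # coarse level, 1x1
    ]
    for level in top_down_merge(pyramid):
        print(level)
```

In this sketch the coarse semantic value propagates down to every finer level, which is the behaviour the abstract refers to when it says that each layer should contain similar semantic information.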
Railway turnouts often develop defects such as chipping, cracks, and wear during use. If not detected and addressed promptly, these defects can pose significant risks to train operation safety and passenger security. Despite advances in defect detection technologies, research specifically targeting railway turnout defects remains limited. To address this gap, we collected images from railway inspectors and constructed a dataset of railway turnout defects in complex environments. To enhance detection accuracy, we propose an improved YOLOv8 model named YOLO-VSS-SOUP-Inner-CIoU (YOLO-VSI). The model employs a state-space model (SSM) to enhance the C2f module in the YOLOv8 backbone, yielding the proposed C2f-VSS module, which better captures long-range dependencies and contextual features and thus improves feature extraction in complex environments. In the network's neck layer, we integrate SPDConv and Omni-Kernel Network (OKM) modules to improve the original PAFPN (Path Aggregation Feature Pyramid Network) structure, and propose the Small Object Upgrade Pyramid (SOUP) structure to enhance small object detection capabilities. Additionally, the Inner-CIoU loss function with a scale factor is applied to further enhance the model's detection capabilities. Compared to the baseline model, YOLO-VSI demonstrates a 3.5% improvement in average precision on our railway turnout dataset, showing increased accuracy and robustness. Experiments on the public NEU-DET dataset reveal a 2.3% increase in average precision over the baseline, indicating that YOLO-VSI generalizes well.
In this paper, based on a bidirectional parallel multi-branch feature pyramid network (BPMFPN), a novel one-stage object detector called BPMFPN Det is proposed for real-time detection of ground multi-scale targets by swarm unmanned aerial vehicles (UAVs). First, bidirectional parallel multi-branch convolution modules are used to construct the feature pyramid to enhance the feature expression abilities of feature layers at different scales. Next, the feature pyramid is integrated into the single-stage object detection framework to ensure real-time performance. To validate the effectiveness of the proposed algorithm, experiments are conducted on four datasets. On the PASCAL VOC dataset, the proposed algorithm achieves a mean average precision (mAP) of 85.4 on the VOC 2007 test set. On the detection in optical remote sensing (DIOR) dataset, the proposed algorithm achieves 73.9 mAP. On the vehicle detection in aerial imagery (VEDAI) dataset, the detection accuracy for small land vehicle (slv) targets reaches 97.4 mAP. On the unmanned aerial vehicle detection and tracking (UAVDT) dataset, the proposed BPMFPN Det achieves an mAP of 48.75. These results are competitive with those of previous state-of-the-art methods. The experimental results demonstrate that the proposed algorithm can effectively solve the problem of real-time detection of ground multi-scale targets in aerial images from swarm UAVs.
To improve the detection accuracy of small objects, a neighborhood fusion-based hierarchical parallel feature pyramid network (NFPN) is proposed. Unlike the layer-by-layer structure adopted in the feature pyramid network (FPN) and the deconvolutional single shot detector (DSSD), where the bottom layer of the feature pyramid relies on the top layer, NFPN builds the feature pyramid with no connections between the upper and lower layers. That is, it only fuses shallow features on similar scales. NFPN is highly portable and can be embedded in many models to further boost performance. Extensive experiments on the PASCAL VOC 2007, PASCAL VOC 2012, and COCO datasets demonstrate that the NFPN-based SSD, without intricate tricks, can exceed the DSSD model in terms of detection accuracy and inference speed, especially for small objects, e.g., 4% to 5% higher mAP (mean average precision) than SSD and 2% to 3% higher mAP than DSSD. On the VOC 2007 test set, the NFPN-based SSD with 300×300 input reaches 79.4% mAP at 34.6 frames/s, and the mAP rises to 82.9% with the multi-scale testing strategy.
Object detection is an essential part of research on scenarios such as autonomous driving and pedestrian detection. Among the many types of target objects, the identification of small-scale objects faces significant challenges. We introduce a new feature pyramid framework called the Dual Attention based Feature Pyramid Network (DAFPN), which is designed to address the difficulty of multi-scale object recognition. In DAFPN, the attention mechanism is introduced when computing the top-down pathway and the lateral pathway, in which spatial attention and channel attention participate, respectively. The pyramidal feature maps are thus generated with enhanced spatial and channel interdependencies, which bring more semantic information to the feature pyramid. Experiments are carried out on the COCO dataset, which contains a considerable number of small-scale objects. The results verify the improved performance of DAFPN compared with the original Feature Pyramid Network (FPN), specifically for the identification of small-scale objects. The proposed DAFPN is promising for object detection in an era full of intelligent machines that need to detect multi-scale objects.
Deep learning for topology optimization has been extensively studied in recent years to reduce the cost of calculation. However, the loss functions of such methods are mainly based on pixel-wise errors from the image perspective, which cannot embed the physical knowledge of topology optimization. Therefore, this paper presents an improved deep learning model to alleviate this difficulty. The feature pyramid network (FPN), a kind of deep learning model, is trained to learn the inherent physical law of topology optimization itself, with a loss function composed of pixel-wise errors and physical constraints. Since the calculation of physical constraints requires finite element analysis (FEA), which is computationally expensive, a strategy of adjusting the time at which physical constraints are added is proposed to balance training cost against training effect. Then, two classical topology optimization problems are investigated to verify the effectiveness of the proposed method. The results show that the developed model, using a small number of samples, can quickly obtain the optimized structure without any iteration, with not only high pixel-wise accuracy but also good physical performance.
Detecting the helmets of non-motor vehicle drivers has significant implications for traffic control. Currently, most helmet detection methods are susceptible to complex backgrounds and lack the accuracy and robustness needed for small object detection, which makes them unsuitable for practical application scenarios. Therefore, this paper proposes a new helmet-wearing detection algorithm based on You Only Look Once version 5 (YOLOv5). First, the Dilated convolution In Coordinate Attention (DICA) layer is added to the backbone network. DICA combines the coordinate attention mechanism with atrous convolution to replace the original convolution layer, which enlarges the receptive field of the network so that it captures more contextual information. It also reduces the network's learning of unnecessary background features and directs attention to small objects. Second, the Rebuild Bidirectional Feature Pyramid Network (Re-BiFPN) is used as a feature extraction network. Re-BiFPN uses cross-scale feature fusion to combine high-level semantic features with low-level spatial features, which helps the model learn object features at different scales. Verified on the proposed "Helmet Wearing dataset for Non-motor Drivers (HWND)," the results show that the proposed model is superior to current detection algorithms, with a mean average precision (mAP) of 94.3% under complex backgrounds.
With the increasing demand for power in society, substations contain a large amount of live equipment, and the safety and standardization of live working are facing challenges. To address the problems of scene complexity and object diversity in the real-time detection of live working safety for substation workers, an adaptive multihead structure and lightweight feature pyramid-based network (AHLNet), built on YOLOv3, is proposed in this study. First, we take AH-Darknet53 as the backbone network of YOLOv3, which introduces an adaptive multihead (AMH) structure, reduces the number of network parameters, and improves the feature extraction ability of the backbone network. Second, to reduce the number of convolution layers of the deeper feature map, a lightweight feature pyramid network (LFPN) is proposed, which performs feature fusion in advance to alleviate the problems of feature imbalance and vanishing gradients. Finally, the proposed AHLNet is evaluated on datasets covering 16 categories of substation safety operation scenarios, and the average prediction accuracy mAP_50 reaches 82.10%. Compared with YOLOv3, mAP_50 is increased by 2.43%, and the number of parameters is 90 M, which is only 38% of the number of parameters of YOLOv3. In addition, the detection speed is basically the same as that of YOLOv3, which meets the real-time and accurate detection requirements for the safe operation of substation staff.
Mature soybean phenotyping is an important process in soybean breeding; however, the manual process is time-consuming and labor-intensive. Therefore, a novel approach that is rapid, accurate and highly precise is required to obtain the phenotypic data of soybean stems, pods and seeds. In this research, we propose a mature soybean phenotype measurement algorithm called Soybean Phenotype Measure-instance Segmentation (SPM-IS). SPM-IS is based on a feature pyramid network, Principal Component Analysis (PCA) and instance segmentation. We also propose a new method that uses PCA to locate and measure the length and width of a target object via image instance segmentation. After 60,000 iterations, the maximum mean Average Precision (mAP) of the mask and box reached 95.7%. The correlation coefficients R^2 between the manual measurement and the SPM-IS measurement of the pod length, pod width, stem length, complete main stem length, seed length and seed width were 0.9755, 0.9872, 0.9692, 0.9803, 0.9656, and 0.9716, respectively. The correlation coefficients R^2 between the manual counting and the SPM-IS counting of pods, stems and seeds were 0.9733, 0.9872, and 0.9851, respectively. The above results show that SPM-IS is a robust measurement and counting algorithm that can reduce labor intensity, improve efficiency and speed up the soybean breeding process.
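The idea of using PCA to measure the length and width of a segmented object can be sketched as follows. The abstract does not give the details of SPM-IS, so this is only an illustration under stated assumptions: the instance mask is given as a list of (x, y) pixel coordinates, the length is taken as the extent of the mask along its first principal axis and the width as the extent along the second, and both are reported in pixel units between extreme pixel centres. The function name `pca_length_width` is hypothetical.

```python
import math


def pca_length_width(points):
    """Estimate (length, width) of a 2-D point set from its principal axes.

    points: list of (x, y) coordinates of the pixels in one instance mask.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n

    # Entries of the 2x2 covariance matrix of the centred coordinates.
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n

    # Closed-form orientation of the first principal axis of a 2x2
    # symmetric matrix (avoids a general eigen-solver).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Project every pixel onto the major and minor axes.
    major = [(p[0] - mx) * c + (p[1] - my) * s for p in points]
    minor = [-(p[0] - mx) * s + (p[1] - my) * c for p in points]
    return max(major) - min(major), max(minor) - min(minor)


if __name__ == "__main__":
    # An axis-aligned 10 x 3 pixel block standing in for a pod mask.
    mask = [(x, y) for x in range(10) for y in range(3)]
    print(pca_length_width(mask))
```

Because the axes come from the data, the same call works for a rotated object; converting the pixel extents to physical units would additionally require a calibration factor that the abstract does not specify.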
Object detection models based on convolutional neural networks (CNN) have achieved state-of-the-art performance by relying heavily on large-scale training samples. They are insufficient in specific applications, such as the detection of military objects, where a large number of samples is hard to obtain. To solve this problem, this paper proposes the use of Gabor-CNN for object detection based on a small number of samples. First, a feature extraction convolution kernel library composed of multi-shape Gabor and color Gabor kernels is constructed, and the optimal Gabor convolution kernel group is obtained by means of training and screening. This group is convolved with the input image to obtain feature information of objects with a strong auxiliary function. Then, the k-means clustering algorithm is adopted to construct anchor boxes of several different sizes, which improves the quality of the region proposals. We call this region proposal process the Gabor-assisted Region Proposal Network (Gabor-assisted RPN). Finally, the Deeply-Utilized Feature Pyramid Network (DU-FPN) method is proposed to strengthen the feature expression of objects in the image. A bottom-up and a top-down feature pyramid are constructed in ResNet-50, and feature information of objects is deeply utilized through the lateral connection and integration of features at various scales. Experimental results show that the proposed method achieves better results than state-of-the-art comparison models on datasets with small samples in terms of accuracy and recall rate, and thus has strong application prospects.
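The anchor-box construction step can be illustrated with a small k-means sketch. The abstract only states that k-means clustering is used to build anchor boxes of several sizes; the choices below are assumptions made for illustration: boxes are clustered by (width, height) only, similarity is measured by IoU (a common convention for anchor clustering), and the initial centres are picked deterministically from the area-sorted boxes. The names `iou_wh` and `kmeans_anchors` are hypothetical.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (w, h), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs into k anchor sizes, returned sorted by area."""
    # Deterministic initialisation: spread the seeds over the area-sorted boxes.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]

    for _ in range(iters):
        # Assignment step: each box joins the centre it overlaps most.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda j: iou_wh(b, centers[j]))
            clusters[best].append(b)

        # Update step: each centre becomes the mean (w, h) of its members;
        # an empty cluster keeps its previous centre.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centers.append(centers[j])

        if new_centers == centers:  # converged
            break
        centers = new_centers

    return sorted(centers, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    labelled = [(10, 10), (12, 11), (11, 12), (100, 50), (98, 52), (102, 48)]
    print(kmeans_anchors(labelled, k=2))
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is why it is the usual choice for this step.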
Soybean leaf morphology is one of the most important morphological and biological characteristics of soybean. Genetic differences in soybean germplasm can lead to different phenotypic traits, among which leaf morphology is an important parameter that directly reflects differences in soybean germplasm. To realize the morphological classification of soybean leaves, a deep learning based method was proposed to automatically detect soybean leaves and classify leaf morphology. The morphologies of soybean leaves included lanceolate, oval, ellipse and round. First, an image collection platform was designed to collect images of soybean leaves. Then, the feature pyramid networks–single shot multibox detector (FPN-SSD) model was proposed to detect the top leaflets of soybean leaves in the collected images. Finally, a classification model based on knowledge distillation was proposed to classify the different morphologies of soybean leaves. The results indicated an overall classification accuracy of 0.956 on a private dataset of 3200 soybean leaf images, and the classification accuracy for each morphology was 1.00, 0.97, 0.93 and 0.94. The results showed that this method could effectively classify soybean leaf morphology and had great application potential in analyzing other phenotypic traits of soybean.
Anchor-based detectors are widely used in object detection. To improve the accuracy of object detection, multiple anchor boxes are densely placed on the input image, yet most of them are invalid. Although anchor-free methods can reduce the number of useless anchor boxes, the invalid ones still occupy a high proportion. On this basis, this paper proposes a multiscale center point object detection method based on a parallel network to further reduce the number of useless anchor boxes. This study adopts a parallel network architecture of Hourglass-104 and Darknet-53, in which the former outputs heatmaps to generate the center points used for object feature location on the output attribute feature map of Darknet-53. Combining a feature pyramid with the CIoU loss function, the algorithm is trained and tested on the MSCOCO dataset, increasing the detection rate of target location and the accuracy of small object detection. While matching state-of-the-art two-stage detectors in overall object detection accuracy, the algorithm is superior in speed.
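For readers unfamiliar with the CIoU loss mentioned above, the following is a minimal sketch of its commonly used definition: one minus IoU, plus the squared centre distance normalised by the squared diagonal of the smallest enclosing box, plus an aspect-ratio consistency term. How the paper integrates the loss into training is not described in the abstract; the box format (x1, y1, x2, y2), the `eps` stabiliser, and the function name `ciou_loss` are assumptions of this sketch.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss between a predicted box and a ground-truth box.

    Both boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centres.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw * cw + ch * ch + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # close to 0 for a perfect match
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))      # above 1 for disjoint boxes
```

Unlike plain IoU, the centre-distance term still gives a useful signal when the boxes do not overlap at all, which is one reason the loss is popular for small objects.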
Floating waste in rivers has specific characteristics such as small scale, low pixel density and complex backgrounds. These characteristics make it prone to false and missed detections during image analysis, resulting in degraded detection performance. To tackle these challenges, a floating waste detection algorithm based on YOLOv7 is proposed, which combines an improved GFPN (Generalized Feature Pyramid Network) with a long-range attention mechanism. Firstly, we introduce the improved GFPN to replace the neck of YOLOv7, providing more effective information transmission that can scale to deeper networks. Secondly, a convolution-based and hardware-friendly long-range attention mechanism is introduced, allowing the algorithm to rapidly generate an attention map with a global receptive field. Finally, the algorithm adopts the WiseIoU loss function to achieve adaptive gradient gain allocation and alleviate the negative impact of low-quality samples on the gradient. The simulation results reveal that the proposed algorithm achieves a favorable average accuracy of 86.3% in real-time scene detection tasks. This marks a significant enhancement of approximately 6.3% compared with the baseline, indicating the algorithm's good performance in floating waste detection.
The area of the pig's face contains rich biological information, such as the eyes, nose, and ears. High-precision detection of pig face postures is crucial to the identification of pigs, and it can also provide fundamental archival information for the study of abnormal behavioral characteristics and regularities. In this study, a series of attention blocks were embedded in the Feature Pyramid Network (FPN) for automatic detection of pig face posture in group-breeding environments. Firstly, the Channel Attention Block (CAB) and Position Attention Block (PAB) were proposed to capture the channel dependencies and the pixel-level long-range relationships, respectively. Secondly, a variety of attention modules were proposed to effectively combine the two kinds of attention information, specifically Parallel Channel Position (PCP), Cascade Position Channel (CPC), and Cascade Channel Position (CCP), which fuse the channel and position attention information in either parallel or cascade fashion. Finally, verification experiments on three task networks with two backbone networks were conducted for the different attention blocks and modules. A total of 45 pigs in 8 pigpens were used as the research objects. Experimental results show that attention-based models perform better. In particular, with the Faster Region Convolutional Neural Network (Faster R-CNN) as the task network and ResNet101 as the backbone network, after the introduction of the PCP module, the Average Precision (AP) values for the face poses Downward with head-on face (D-O), Downward with lateral face (D-L), Level with head-on face (L-O), Level with lateral face (L-L), Upward with head-on face (U-O), and Upward with lateral face (U-L) reach 91.55%, 90.36%, 90.10%, 90.05%, 85.96%, and 87.92%, respectively. Ablation experiments show that the PAB attention block is not as effective as the CAB attention block, and that the parallel combination is better than the cascade one. With Faster R-CNN as the task network and ResNet101 as the backbone network, heatmap visualization of different FPN layers before and after adding PCP shows that, compared with the model without PCP, the PCP module more readily aggregates denser and richer contextual information, which in turn enhances long-range dependencies and improves feature representation. At the same time, the model based on PCP attention can effectively detect pig face posture across different ages, scenes, and light intensities, which can help lay the foundation for subsequent individual identification and behavior analysis of pigs.
Face anti-spoofing assists a face recognition system in judging whether a detected face is real or fake. In traditional face anti-spoofing methods, hand-crafted features are used to describe the difference between living faces and fraudulent faces. However, these hand-crafted features do not cope with the variations found in an unconstrained environment. Convolutional neural networks (CNN) achieve considerable results for face spoofing detection. However, most existing neural network-based methods simply use neural networks to extract single-scale features from single-modal data, while ignoring multi-scale and multi-modal information. To address this problem, a novel face anti-spoofing method based on multi-modal and multi-scale feature fusion (MMFF) is proposed. Specifically, a residual network (ResNet-34) is first adopted to extract features of different scales from each modality; these features are then fused by a feature pyramid network (FPN); finally, a squeeze-and-excitation fusion (SEF) module and a self-attention network (SAN) are combined to fuse features from different modalities for classification. Experiments on the CASIA-SURF dataset show that the new method based on MMFF achieves better performance than most existing methods.
Due to illumination, complex backgrounds, and occlusion of the litchi fruits, accurate detection of litchi in the field is extremely challenging. To solve the problem of the low recognition rate of litchi-picking robots in field conditions, this study drew on the ideas of ResNet and dense convolution and proposed an improved feature-extraction network model named "YOLOv3_Litchi", which combines dense connections and residuals for the detection of litchis. Firstly, based on the traditional YOLOv3 deep convolutional neural network and regression detection, residual connections were introduced into the feature-extraction network to avoid the drop in detection accuracy caused by excessive network depth. Secondly, while maintaining a good receptive field and high detection accuracy, large convolution kernels were replaced by small convolution kernels in the shallow layers of the network, thereby effectively reducing the model parameters. Finally, the feature pyramid idea was used to design the network for identifying small litchi targets, ensuring that shallow features were not lost while also reducing the model parameters. Experimental results show that the improved YOLOv3_Litchi model achieved better results than the classic YOLOv3_DarkNet-53 model and the YOLOv3_Tiny model. The mean average precision (mAP) score was 97.07%, which was higher than the 95.18% mAP of the YOLOv3_DarkNet-53 model and the 94.48% mAP of the YOLOv3_Tiny model. The frame rate was 58 fps, which was higher than the 29 fps of the YOLOv3_DarkNet-53 model. Compared with the classic Faster R-CNN model with the VGG16 feature-extraction network, the mAP was increased by 1%, and the FPS advantage was obvious. Compared with the classic single shot multibox detector (SSD) model, both the accuracy and the running efficiency were improved. The results show that the improved YOLOv3_Litchi model had stronger robustness, higher detection accuracy, and lower computational complexity for the identification of litchi in field conditions, which should be helpful for precision management of litchi orchards.
Scene text detection plays a significant role in various applications, such as object recognition, document management, and visual navigation. Instance segmentation based methods have been widely used in existing research due to their advantages in dealing with multi-oriented texts. However, a large number of non-text pixels exist in the labels during model training, leading to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework, which includes two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary directions. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments are conducted on the ICDAR2013, ICDAR2015, MSRA-TD500 and MLT datasets to demonstrate the effectiveness of our approach. The F-measures are 92.62%, 88.02%, 83.69% and 77.81%, respectively.
Detection efficiency plays an increasingly important role in object detection tasks. One-stage methods are widely adopted in practice because of their high efficiency, especially in real-time detection tasks such as face recognition and self-driving cars. RetinaMask achieves significant progress in the field of one-stage detectors by adding a semantic segmentation branch, but it is limited in detecting multi-scale objects. To solve this problem, this paper proposes the RetinaMask with Gate (RMG) model, consisting of four main modules. It extends RetinaMask with a gate mechanism, which extracts and combines features at different levels more effectively according to the size of objects. First, it extracts multi-level features from the input image with ResNet. Second, it constructs a fused feature pyramid through a feature pyramid network, and then employs the gate mechanism to adaptively enhance and integrate features at various scales with respect to object size. Finally, three prediction heads are added for classification, localization and mask prediction, driving the model to learn with mask prediction. The predictions of all levels are integrated during post-processing. The augmented network shows better performance in object detection, especially for small objects, without increasing computation cost or inference time.
Funding: supported and funded by the Guizhou Provincial Science and Technology Project under Grant No. QKH-Basic-ZK[2021]YB311; the Youth Science and Technology Talent Growth Project of Guizhou Provincial Education Department under Grant No. QJH-KY-ZK[2021]132; the Guizhou Provincial Science and Technology Project under Grant No. QKH-Basic-ZK[2021]YB319; the National Natural Science Foundation of China (NSFC) under Grant 61902085; and the Key Laboratory Program of Blockchain and Fintech of Department of Education of Guizhou Province (2023-014).
Abstract: Copy-Move Forgery Detection (CMFD) is a technique designed to identify image tampering and locate suspicious areas. However, the practicality of CMFD is impeded by the scarcity of datasets, their inadequate quality and quantity, and a narrow range of applicable tasks. These limitations significantly restrict the capacity and applicability of CMFD. To overcome the limitations of existing methods, a novel solution called IMTNet is proposed for CMFD, employing a feature decoupling approach. First, this study formulates the relationship between the objective task and the network as an optimization problem using transfer learning. It also discusses and analyzes the relationship between CMFD and deep network architecture, employing ResNet-50 during the optimization solving phase. Second, a quantitative comparison between fine-tuning and feature decoupling is conducted with the enhanced ResNet-50 to evaluate the degree of similarity between the image classification and CMFD domains. Finally, suspicious regions are localized using a feature pyramid network with bottom-up path augmentation. Experimental results demonstrate that IMTNet achieves faster convergence, shorter training times, and favorable generalization performance compared to existing methods. Moreover, IMTNet significantly outperforms fine-tuning based approaches in terms of accuracy and F1 score.
Abstract: Effective small object detection is crucial in various applications, including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses an attention module to make the network focus more on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information. This alleviates the problem of small object information being submerged due to semantic gaps between different layers. At the same time, SODEM is introduced to strengthen the local features of the object, suppress background noise, enhance the details of small objects, and fuse the enhanced features into other feature layers, so that each layer is rich in small object information and small object detection accuracy improves. Our extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MSCOCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
Abstract: Railway turnouts often develop defects such as chipping, cracks, and wear during use. If not detected and addressed promptly, these defects can pose significant risks to train operation safety and passenger security. Despite advances in defect detection technologies, research specifically targeting railway turnout defects remains limited. To address this gap, we collected images from railway inspectors and constructed a dataset of railway turnout defects in complex environments. To enhance detection accuracy, we propose an improved YOLOv8 model named YOLO-VSS-SOUP-Inner-CIoU (YOLO-VSI). The model employs a state-space model (SSM) to enhance the C2f module in the YOLOv8 backbone, yielding the C2f-VSS module, which better captures long-range dependencies and contextual features and thus improves feature extraction in complex environments. In the network's neck layer, we integrate SPDConv and Omni-Kernel Network (OKM) modules to improve the original PAFPN (Path Aggregation Feature Pyramid Network) structure, yielding the Small Object Upgrade Pyramid (SOUP) structure, which enhances small object detection capabilities. Additionally, the Inner-CIoU loss function with a scale factor is applied to further enhance the model's detection capabilities. Compared to the baseline model, YOLO-VSI demonstrates a 3.5% improvement in average precision on our railway turnout dataset, showing increased accuracy and robustness. Experiments on the public NEU-DET dataset reveal a 2.3% increase in average precision over the baseline, indicating that YOLO-VSI generalizes well.
Abstract: In this paper, based on a bidirectional parallel multi-branch feature pyramid network (BPMFPN), a novel one-stage object detector called BPMFPN Det is proposed for real-time detection of ground multi-scale targets by swarm unmanned aerial vehicles (UAVs). First, bidirectional parallel multi-branch convolution modules are used to construct the feature pyramid, enhancing the feature expression abilities of feature layers at different scales. Next, the feature pyramid is integrated into the single-stage object detection framework to ensure real-time performance. To validate the effectiveness of the proposed algorithm, experiments are conducted on four datasets. On the PASCAL VOC dataset, the proposed algorithm achieves a mean average precision (mAP) of 85.4 on the VOC 2007 test set. On the detection in optical remote sensing (DIOR) dataset, it achieves 73.9 mAP. On the vehicle detection in aerial imagery (VEDAI) dataset, the detection accuracy for small land vehicle (slv) targets reaches 97.4 mAP. On the unmanned aerial vehicle detection and tracking (UAVDT) dataset, the proposed BPMFPN Det achieves an mAP of 48.75. These results are competitive with previous state-of-the-art methods. The experimental results demonstrate that the proposed algorithm can effectively address real-time detection of ground multi-scale targets in aerial images from swarm UAVs.
Funding: The National Natural Science Foundation of China (No. 61603091).
Abstract: To improve the detection accuracy of small objects, a neighborhood fusion-based hierarchical parallel feature pyramid network (NFPN) is proposed. Unlike the layer-by-layer structure adopted in the feature pyramid network (FPN) and the deconvolutional single shot detector (DSSD), where the bottom layer of the feature pyramid relies on the top layer, NFPN builds the feature pyramid with no connections between the upper and lower layers. That is, it only fuses shallow features on similar scales. NFPN is highly portable and can be embedded in many models to further boost performance. Extensive experiments on the PASCAL VOC 2007, PASCAL VOC 2012, and COCO datasets demonstrate that the NFPN-based SSD, without intricate tricks, can exceed the DSSD model in terms of detection accuracy and inference speed, especially for small objects, e.g., 4% to 5% higher mAP (mean average precision) than SSD and 2% to 3% higher mAP than DSSD. On the VOC 2007 test set, the NFPN-based SSD with 300×300 input reaches 79.4% mAP at 34.6 frames/s, and the mAP rises to 82.9% with the multi-scale testing strategy.
Funding: Supported by the National Natural Science Foundation of China (No. 61901016) and the special fund for basic scientific research in central colleges and universities, Youth talent support program of Beihang University.
Abstract: Object detection is an essential part of research on scenarios such as automatic driving and pedestrian detection. Among the many types of target objects, small-scale objects are particularly challenging to identify. We introduce a new feature pyramid framework called the Dual Attention based Feature Pyramid Network (DAFPN), which is designed to address the difficulties of multi-scale object recognition. In DAFPN, the attention mechanism is introduced into the computation of the top-down pathway and the lateral pathway, where spatial attention and channel attention participate, respectively. The pyramidal feature maps are thus generated with enhanced spatial and channel interdependencies, which bring more semantic information to the feature pyramid. The experiments are carried out on the COCO dataset, which contains a considerable quantity of small-scale objects. The results verify the improved performance of DAFPN compared with the original Feature Pyramid Network (FPN), specifically for identification at small scales. The proposed DAFPN is promising for object detection in an era full of intelligent machines that need to detect multi-scale objects.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 11725211, 52005505, and 62001502, and the Post-graduate Scientific Research Innovation Project of Hunan Province under Grant No. CX20200023.
Abstract: Deep learning for topology optimization has been extensively studied in recent years to reduce the cost of calculation. However, the loss function of such methods is mainly based on pixel-wise errors from the image perspective, which cannot embed the physical knowledge of topology optimization. Therefore, this paper presents an improved deep learning model to alleviate this difficulty. A feature pyramid network (FPN), a kind of deep learning model, is trained to learn the inherent physical law of topology optimization itself, with a loss function composed of pixel-wise errors and physical constraints. Since the calculation of physical constraints requires finite element analysis (FEA) with high computational cost, a strategy of adjusting the time at which physical constraints are added is proposed to balance training cost and training effect. Two classical topology optimization problems are then investigated to verify the effectiveness of the proposed method. The results show that the developed model, using a small number of samples, can quickly obtain the optimized structure without any iteration, with both high pixel-wise accuracy and good physical performance.
Funding: Funded by the Natural Science Foundation of Hunan Province under Grant No. 2021JJ31142, author F.J., http://kjt.hunan.gov.cn/.
Abstract: Detecting non-motor drivers' helmets has significant implications for traffic control. Currently, most helmet detection methods are susceptible to complex backgrounds and lack the accuracy and robustness needed for small object detection, which makes them unsuitable for practical application scenarios. Therefore, this paper proposes a new helmet-wearing detection algorithm based on You Only Look Once version 5 (YOLOv5). First, the Dilated convolution In Coordinate Attention (DICA) layer is added to the backbone network. DICA combines the coordinate attention mechanism with atrous convolution to replace the original convolution layer, which enlarges the receptive field of the network to capture more contextual information. It also reduces the network's learning of unnecessary background features and directs attention to small objects. Second, the Rebuild Bidirectional Feature Pyramid Network (Re-BiFPN) is used as the feature extraction network. Re-BiFPN uses cross-scale feature fusion to combine the semantic features at the high level with the spatial features at the bottom level, which helps the model learn object features at different scales. Verified on the proposed "Helmet Wearing dataset for Non-motor Drivers (HWND)," the results show that the proposed model is superior to current detection algorithms, with a mean average precision (mAP) of 94.3% under complex backgrounds.
Funding: Supported by the General Scientific Research Project of the Education Department of Zhejiang Province, China (No. Y202146060).
Abstract: With society's increasing demand for power, substations contain much live equipment, and the safety and standardization of live working face challenges. To address the scene complexity and object diversity involved in real-time detection of the live working safety of substation workers, this study proposes an adaptive multihead structure and lightweight feature pyramid-based network (AHLNet), which is based on YOLOV3. First, we take AH-Darknet53 as the backbone network of YOLOV3, which introduces an adaptive multihead (AMH) structure, reduces the number of network parameters, and improves the feature extraction ability of the backbone network. Second, to reduce the number of convolution layers of the deeper feature map, a lightweight feature pyramid network (LFPN) is proposed, which performs feature fusion in advance to alleviate the problems of feature imbalance and vanishing gradients. Finally, the proposed AHLNet is evaluated on datasets covering 16 categories of substation safety operation scenarios, and the average prediction accuracy mAP50 reaches 82.10%. Compared with YOLOV3, mAP50 is increased by 2.43%, and the number of parameters is 90 M, which is only 38% of that of YOLOV3. In addition, the detection speed is basically the same as that of YOLOV3, which meets the real-time and accuracy requirements for detecting the safe operation of substation staff.
Funding: Supported by the National Natural Science Foundation of China (31400074, 31471516, 31271747, and 30971809), the Natural Science Foundation of Heilongjiang Province of China (ZD201213), and the Heilongjiang Postdoctoral Science Foundation (LBH-Q18025).
Abstract: Mature soybean phenotyping is an important process in soybean breeding; however, the manual process is time-consuming and labor-intensive. Therefore, a rapid, accurate, and highly precise approach is required to obtain the phenotypic data of soybean stems, pods, and seeds. In this research, we propose a mature soybean phenotype measurement algorithm called Soybean Phenotype Measure-instance Segmentation (SPM-IS). SPM-IS is based on a feature pyramid network, Principal Component Analysis (PCA), and instance segmentation. We also propose a new method that uses PCA to locate and measure the length and width of a target object via image instance segmentation. After 60,000 iterations, the maximum mean Average Precision (mAP) of the mask and box reached 95.7%. The correlation coefficients R² between the manual measurement and the SPM-IS measurement of the pod length, pod width, stem length, complete main stem length, seed length, and seed width were 0.9755, 0.9872, 0.9692, 0.9803, 0.9656, and 0.9716, respectively. The correlation coefficients R² between the manual counting and the SPM-IS counting of pods, stems, and seeds were 0.9733, 0.9872, and 0.9851, respectively. These results show that SPM-IS is a robust measurement and counting algorithm that can reduce labor intensity, improve efficiency, and speed up the soybean breeding process.
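The abstract above does not spell out how PCA yields a length and a width, so the following is only a minimal sketch of one common reading: treat the pixel coordinates of a segmented instance as a 2-D point cloud, take the principal axis of its covariance matrix as the length direction, and report the extent of the points along and across that axis. The function name `pca_length_width` and the input format (a list of `(x, y)` pixel coordinates) are assumptions for illustration, not the SPM-IS implementation.

```python
import math

def pca_length_width(points):
    """Return (length, width) in pixels of a 2-D point set along its principal axes.

    `points` is a list of (x, y) mask pixel coordinates (hypothetical input format).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix of the centered points.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the major eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project the centered points onto the major axis and its perpendicular.
    major = [(x - mx) * ux + (y - my) * uy for x, y in points]
    minor = [-(x - mx) * uy + (y - my) * ux for x, y in points]
    return max(major) - min(major), max(minor) - min(minor)
```

The result is in pixel units; converting to physical units would need a camera calibration factor that the abstract does not give.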
Funding: Supported by the National Natural Science Foundation of China (grant number: 61671470), the National Key Research and Development Program of China (grant number: 2016YFC0802904), and the Postdoctoral Science Foundation Funded Project of China (grant number: 2017M623423).
Abstract: Object detection models based on convolutional neural networks (CNN) have achieved state-of-the-art performance by relying heavily on large-scale training samples. They are insufficient in specific applications, such as the detection of military objects, where a large number of samples is hard to obtain. To solve this problem, this paper proposes the use of Gabor-CNN for object detection based on a small number of samples. First, a feature extraction convolution kernel library composed of multi-shape Gabor and color Gabor kernels is constructed, and the optimal Gabor convolution kernel group is obtained by training and screening. This group is convolved with the input image to obtain object feature information with a strong auxiliary function. Then, the k-means clustering algorithm is adopted to construct anchor boxes of several different sizes, which improves the quality of the region proposals. We call this region proposal process the Gabor-assisted Region Proposal Network (Gabor-assisted RPN). Finally, the Deeply-Utilized Feature Pyramid Network (DU-FPN) method is proposed to strengthen the feature expression of objects in the image. A bottom-up and a top-down feature pyramid are constructed in ResNet-50, and the feature information of objects is deeply utilized through the lateral connection and integration of features at various scales. Experimental results show that the proposed method achieves better accuracy and recall than state-of-the-art comparison models on datasets with small samples, and thus has strong application prospects.
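As an illustration of how k-means can produce anchor box sizes, here is a minimal sketch. It assumes the YOLOv2-style variant that clusters ground-truth `(width, height)` pairs and uses IoU as the similarity measure; the abstract above does not state which distance the authors use, and the names `iou_wh` and `kmeans_anchors` are hypothetical.

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor sizes, sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # start from k distinct ground-truth boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the center it overlaps most (assumed IoU similarity).
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[best].append(b)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])
```

Calling `kmeans_anchors(boxes, k)` on the `(width, height)` pairs of a training set returns `k` anchor sizes ordered from smallest to largest area.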
Funding: Supported by the Heilongjiang Province Philosophy and Social Science Research Planning Project (17TQB059).
Abstract: Soybean leaf morphology is one of the most important morphological and biological characteristics of soybean. Differences in soybean germplasm genes can lead to different phenotypic traits, among which leaf morphology is an important parameter that directly reflects the difference in soybean germplasm. To realize the morphological classification of soybean leaves, a deep learning based method was proposed to automatically detect soybean leaves and classify leaf morphology. The soybean leaf morphologies considered were lanceolate, oval, ellipse, and round. First, an image collection platform was designed to collect images of soybean leaves. Then, the feature pyramid networks-single shot multibox detector (FPN-SSD) model was proposed to detect the top leaflets of soybean leaves in the collected images. Finally, a classification model based on knowledge distillation was proposed to classify the different morphologies of soybean leaves. The results indicated an overall classification accuracy of 0.956 on a private dataset of 3200 soybean leaf images, and the classification accuracy for each morphology was 1.00, 0.97, 0.93, and 0.94. The results showed that this method could effectively classify soybean leaf morphology and had great application potential in analyzing other phenotypic traits of soybean.
Abstract: Anchor-based detectors are widely used in object detection. To improve the accuracy of object detection, multiple anchor boxes are densely placed on the input image, yet most of them are invalid. Although anchor-free methods can reduce the number of useless anchor boxes, the invalid ones still account for a high proportion. On this basis, this paper proposes a multiscale center point object detection method based on a parallel network to further reduce the number of useless anchor boxes. This study adopts a parallel network architecture of hourglass-104 and darknet-53, in which the former outputs heatmaps that generate the center points used to locate object features on the output attribute feature map of darknet-53. Combining the feature pyramid and the CIoU loss function, the algorithm is trained and tested on the MSCOCO dataset, increasing the detection rate of target locations and the accuracy of small object detection. While comparable to state-of-the-art two-stage detectors in overall object detection accuracy, the algorithm is superior in speed.
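For readers unfamiliar with the CIoU loss mentioned above, here is a minimal sketch of its commonly published form: one minus IoU, plus a normalized center-distance penalty, plus an aspect-ratio consistency term. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions for illustration; the abstract does not describe how the authors implement or weight the loss.

```python
import math

def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two boxes in (x1, y1, x2, y2) format with x1 < x2 and y1 < y2."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)
    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(w / h)) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v
```

Unlike plain IoU loss, the center-distance term still provides a signal when the two boxes do not overlap at all.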
Funding: Supported by the Science Foundation of the Shaanxi Provincial Department of Science and Technology, General Program-Youth Program (2022JQ-695); the Scientific Research Program Funded by Education Department of Shaanxi Provincial Government (22JK0378); the Talent Program of Weinan Normal University (2021RC20); and the Educational Reform Research Project (JG202342).
Abstract: Floating wastes in rivers have specific characteristics such as small scale, low pixel density, and complex backgrounds. These characteristics make image analysis prone to false and missed detections, degrading detection performance. To tackle these challenges, a floating waste detection algorithm based on YOLOv7 is proposed, which combines an improved GFPN (Generalized Feature Pyramid Network) and a long-range attention mechanism. First, we introduce the improved GFPN to replace the neck of YOLOv7, providing more effective information transmission that can scale to deeper networks. Second, a convolution-based, hardware-friendly long-range attention mechanism is introduced, allowing the algorithm to rapidly generate an attention map with a global receptive field. Finally, the algorithm adopts the WiseIoU loss function to achieve adaptive gradient gain allocation and alleviate the negative impact of low-quality samples on the gradient. The simulation results show that the proposed algorithm achieves an average accuracy of 86.3% in real-time scene detection tasks. This is an improvement of approximately 6.3% over the baseline, indicating the algorithm's good performance in floating waste detection.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31671571), the Shanxi Province Basic Research Program Project (Free Exploration) (Grant No. 20210302124523, 20210302123408, 202103021224149, 202103021223141), and the Youth Agricultural Science and Technology Innovation Fund of Shanxi Agricultural University (Grant No. 2019027).
Abstract: The area of a pig's face contains rich biological information, such as the eyes, nose, and ears. High-precision detection of pig face postures is crucial to the identification of pigs, and it can also provide fundamental archival information for the study of abnormal behavioral characteristics and regularities. In this study, a series of attention blocks were embedded in a Feature Pyramid Network (FPN) for automatic detection of pig face posture in group-breeding environments. First, the Channel Attention Block (CAB) and the Position Attention Block (PAB) were proposed to capture channel dependencies and pixel-level long-range relationships, respectively. Second, several attention modules were proposed to effectively combine the two kinds of attention information, specifically Parallel Channel Position (PCP), Cascade Position Channel (CPC), and Cascade Channel Position (CCP), which fuse the channel and position attention information in parallel or cascade ways. Finally, verification experiments on three task networks with two backbone networks were conducted for the different attention blocks and modules. A total of 45 pigs in 8 pigpens were used as the research objects. Experimental results show that attention-based models perform better. In particular, with Faster Region Convolutional Neural Network (Faster R-CNN) as the task network and ResNet101 as the backbone network, after the introduction of the PCP module, the Average Precision (AP) for the face poses Downward with head-on face (D-O), Downward with lateral face (D-L), Level with head-on face (L-O), Level with lateral face (L-L), Upward with head-on face (U-O), and Upward with lateral face (U-L) reaches 91.55%, 90.36%, 90.10%, 90.05%, 85.96%, and 87.92%, respectively. Ablation experiments show that the PAB attention block is not as effective as the CAB attention block, and that the parallel combination is better than the cascade combinations. With Faster R-CNN as the task network and ResNet101 as the backbone network, heatmap visualization of different FPN layers before and after adding PCP shows that, compared with the network without PCP, the PCP module more readily aggregates denser and richer contextual information, which in turn enhances long-range dependencies and improves feature representation. At the same time, the model based on PCP attention can effectively detect pig face posture across different ages, scenes, and light intensities, which helps lay the foundation for subsequent individual identification and behavior analysis of pigs.
Funding: Supported by the National Natural Science Foundation of China (61962010, 62262005) and the Natural Science Foundation of Guizhou Province (QianKeHeJichu[2019]1425).
Abstract: Face anti-spoofing helps a face recognition system judge whether a detected face is real or fake. Traditional face anti-spoofing methods use hand-crafted features to describe the difference between live and fraudulent faces, but these features do not hold up under the variations found in unconstrained environments. Convolutional neural networks (CNN) achieve considerable results for face spoofing detection. However, most existing neural network-based methods simply extract single-scale features from single-modal data, ignoring multi-scale and multi-modal information. To address this problem, a novel face anti-spoofing method based on multi-modal and multi-scale feature fusion (MMFF) is proposed. Specifically, a residual network (ResNet-34) is first adopted to extract features of different scales from each modality; these features are then fused by a feature pyramid network (FPN); finally, a squeeze-and-excitation fusion (SEF) module and a self-attention network (SAN) are combined to fuse features from different modalities for classification. Experiments on the CASIA-SURF dataset show that the new MMFF-based method achieves better performance than most existing methods.
Funding: This work was financially supported by the National Natural Science Foundation of China (Grant No. 32071912, No. 61863011, No. 31701325, No. 31571568, No. 31570180); the Guangzhou Science and Technology Project (Grant No. 202002020016, No. 202102080337); the Natural Science Foundation of Guangdong Province (Grant No. 2018A030313330, No. 2020A1515010793); the Second Batch of Industry-Education Cooperation Collaborative Projects in 2019, Ministry of Education (Grant No. 201902062040); the Guangzhou Key Laboratory of Intelligent Agriculture (Grant No. 201902010081); the Project of Rural Revitalization Strategy in Guangdong Province (Grant No. 2020KJ261); and the Applied Science and Technology Special Fund Project, Meizhou, China (Grant No. 2019B0201005).
Abstract: Due to illumination, complex backgrounds, and occlusion of the litchi fruits, accurate detection of litchi in the field is extremely challenging. To address the low recognition rate of litchi-picking robots in field conditions, this study, inspired by the ideas of ResNet and dense convolution, proposes an improved feature-extraction network model named “YOLOv3_Litchi”, which combines dense connections and residuals for the detection of litchis. First, building on the traditional YOLOv3 deep convolutional neural network and regression detection, residuals were introduced into the feature-extraction network to avoid the decrease in detection accuracy caused by excessive network depth. Second, while maintaining a good receptive field and high detection accuracy, the large convolution kernel in the shallow layers of the network was replaced by a small convolution kernel, effectively reducing the model parameters. Finally, the feature pyramid idea was used to design the network for identifying small litchi targets, ensuring that shallow features were not lost while also reducing the model parameters. Experimental results show that the improved YOLOv3_Litchi model achieved better results than the classic YOLOv3_DarkNet-53 model and the YOLOv3_Tiny model. The mean average precision (mAP) was 97.07%, higher than the 95.18% mAP of the YOLOv3_DarkNet-53 model and the 94.48% mAP of the YOLOv3_Tiny model. The frame rate was 58 fps, higher than the 29 fps of the YOLOv3_DarkNet-53 model. Compared with the classic Faster R-CNN model with the VGG16 feature-extraction network, the mAP increased by 1%, and the FPS advantage was obvious. Compared with the classic single shot multibox detector (SSD) model, both the accuracy and the running efficiency were improved. The results show that the improved YOLOv3_Litchi model had stronger robustness, higher detection accuracy, and lower computational complexity for the identification of litchi in field conditions, which should be helpful for precision management of litchi orchards.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61902435, the National Science and Technology Major Project of China under Grant No. 2018AAA0102102, the 111 Project under Grant No. B18059, and the Hunan Provincial Natural Science Foundation of China under Grant No. 2019JJ50808.
Abstract: Scene text detection plays a significant role in various applications, such as object recognition, document management, and visual navigation. Instance segmentation based methods have been widely used in existing research due to their advantages in dealing with multi-oriented text. However, a large number of non-text pixels exist in the labels during model training, leading to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework, which includes two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary directions. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments are conducted on the ICDAR2013, ICDAR2015, MSRA-TD500, and MLT datasets to demonstrate the effectiveness of our approach. The F-measures are 92.62%, 88.02%, 83.69%, and 77.81%, respectively.
Funding: The National Natural Science Foundation of China under Grant No. 61672181.
Abstract: Detection efficiency plays an increasingly important role in object detection tasks. One-stage methods are widely adopted in practice because of their high efficiency, especially in real-time detection tasks such as face recognition and self-driving cars. RetinaMask achieves significant progress in the field of one-stage detectors by adding a semantic segmentation branch, but it has limitations in detecting multi-scale objects. To solve this problem, this paper proposes the RetinaMask with Gate (RMG) model, consisting of four main modules. It extends RetinaMask with a gate mechanism, which extracts and combines features at different levels more effectively according to the size of objects. First, it extracts multi-level features from the input image with ResNet. Second, it constructs a fused feature pyramid through a feature pyramid network; a gate mechanism is then employed to adaptively enhance and integrate features at various scales with respect to the size of the object. Finally, three prediction heads are added for classification, localization, and mask prediction, driving the model to learn with mask prediction. The predictions of all levels are integrated during post-processing. The augmented network shows better performance in object detection, especially for small objects, without increasing computation cost or inference time.