Funding: This research is partially supported by grants from the National Natural Science Foundation of China (No. 72071019), the Natural Science Foundation of Chongqing (No. cstc2021jcyj-msxmX0185), and the Chongqing Graduate Education and Teaching Reform Research Project (No. yjg193096).
Abstract: Bone age assessment (BAA) helps doctors determine how a child’s bones grow and develop in clinical medicine. Traditional BAA methods rely on clinician expertise, leading to time-consuming predictions and inaccurate results. Most deep learning-based BAA methods feed extracted critical points of images into the network, which requires additional annotations. This operation is costly and subjective. To address these problems, we propose a multi-scale attentional densely connected network (MSADCN) in this paper. MSADCN constructs a multi-scale dense connectivity mechanism, which can avoid overfitting, obtain local features effectively, and prevent gradient vanishing even with limited training data. First, MSADCN designs multi-scale structures in the densely connected network to extract fine-grained features at different scales. Then, coordinate attention is embedded to focus on critical features and automatically locate the regions of interest (ROI) without additional annotation. In addition, to improve the model’s generalization, transfer learning is applied: the proposed MSADCN is trained on the public dataset IMDB-WIKI, and the obtained pre-trained weights are loaded for training on the Radiological Society of North America (RSNA) dataset. Finally, label distribution learning (LDL) and expectation regression techniques are introduced into our model to exploit the correlation between hand bone images of different ages, which yields stable age estimates. Extensive experiments confirm that our model converges more efficiently and obtains a mean absolute error (MAE) of 4.64 months, outperforming some state-of-the-art BAA methods.
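The combination of label distribution learning and expectation regression mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the details, so everything here is an assumption made for illustration: the Gaussian shape of the target distribution, the spread `sigma`, the 0 to 228 month age grid, and the function names are all hypothetical and are not taken from the MSADCN paper.

```python
import math

def gaussian_label_distribution(true_age, ages, sigma=6.0):
    """Soft target: a discretised Gaussian centred on the true age (assumed form)."""
    weights = [math.exp(-((a - true_age) ** 2) / (2 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Turn raw network outputs into a probability distribution over age bins."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_age(probs, ages):
    """Expectation regression: the age estimate is the mean of the distribution."""
    return sum(p * a for p, a in zip(probs, ages))

# Hypothetical age grid in months (one bin per month).
ages = list(range(0, 229))

# A 120-month-old hand image gets a soft label spread over neighbouring ages.
target = gaussian_label_distribution(120, ages)
estimate_from_target = expected_age(target, ages)

# With uninformative (all-zero) logits the estimate falls back to the grid mean.
estimate_from_flat_logits = expected_age(softmax([0.0] * len(ages)), ages)
```

The soft target assigns probability to neighbouring ages, which is one way to encode the correlation between images of similar bone age; taking the expectation then returns a single continuous estimate.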
Funding: Funded by the National Natural Science Foundation of China (Grant No. 61905201).
Abstract: The field of finance relies heavily on cybersecurity to safeguard its systems and clients from harmful software. The identification of malevolent code within financial software is vital for protecting both the financial system and individual clients. Nevertheless, present detection models are limited in their ability to identify malevolent code and its variations, and they also carry a large number of parameters. To overcome these obstacles, we introduce a lean model for classifying families of malevolent code, built on Ghost-DenseNet-SE. This model integrates the Ghost module, DenseNet, and the squeeze-and-excitation (SE) channel domain attention mechanism. It substitutes the standard convolutional layer in DenseNet with the Ghost module, thereby reducing the model’s size and increasing recognition speed. Additionally, the channel domain attention mechanism assigns distinct weights to feature channels, facilitating the extraction of pivotal characteristics of malevolent code and improving detection precision. Experimental outcomes on the Malimg dataset indicate that the model attained an accuracy of 99.14% in discerning families of malevolent code, surpassing AlexNet (97.8%) and the Visual Geometry Group network (VGGNet) (96.16%). The proposed model has fewer parameters, leading to decreased model complexity alongside enhanced classification accuracy, which makes it a valuable asset for categorizing malevolent code.
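The squeeze-and-excitation step described in this abstract (per-channel weights applied to feature channels) can be sketched in plain Python. This is a generic SE block, not the Ghost-DenseNet-SE implementation: the nested-list tensor layout, the weight shapes, and the names `squeeze`, `dense`, and `se_block` are assumptions chosen for illustration.

```python
import math

def squeeze(feature_maps):
    """Global average pooling: one scalar per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]

def dense(vector, weights, bias):
    """Fully connected layer: weights is a list of rows, one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]

def se_block(feature_maps, w1, b1, w2, b2):
    """Squeeze, excite (FC -> ReLU -> FC -> sigmoid), then rescale each channel."""
    pooled = squeeze(feature_maps)
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]
    scaled = [
        [[gate * x for x in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates

# Two 2x2 channels; zero weights give sigmoid(0) = 0.5 for every channel gate.
features = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 4.0], [4.0, 4.0]]]
w1, b1 = [[0.0, 0.0]], [0.0]          # 2 channels -> 1 hidden unit (reduction)
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]   # 1 hidden unit -> 2 channel gates
scaled, gates = se_block(features, w1, b1, w2, b2)
```

In a trained network the two small fully connected layers learn which channels to emphasise; the zero weights above only make the arithmetic easy to follow.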
Funding: Supported by the School Doctoral Fund of Zhengzhou University of Light Industry (No. 2015BSJJ051).
Abstract: Automatic road damage detection using image processing is an important aspect of road maintenance. It is also a challenging problem due to the inhomogeneity of road damage and the complicated backgrounds in road images. In recent years, methods based on deep convolutional neural networks have been used to address the challenges of road damage detection and classification. In this paper, we propose a new approach to address those challenges. This approach uses densely connected convolutional networks as the backbone of Mask R-CNN to effectively extract image features, a feature pyramid network to combine multi-scale features, a region proposal network to generate road damage regions, and a fully convolutional neural network to classify the road damage region and refine the region bounding box. This method can not only detect and classify road damage, but also create a mask of the road damage. Experimental results show that the proposed approach achieves better results than other existing methods.
Funding: National Natural Science Foundation of China, Grant/Award Number: 62071039; Beijing Natural Science Foundation, Grant/Award Number: L223033.
Abstract: End-to-end separation algorithms, which perform well in speech separation, have not been used effectively in music separation. Moreover, since music signals are often dual-channel data with a high sampling rate, how to model long-sequence data and make rational use of the relevant information between channels is also an urgent problem. To solve these problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions include the following: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as main melody, tone and so on. (2) On this basis, multi-head attention and a dual-path transformer are introduced in the separation module. Channel attention units are applied recursively on the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that after the introduction of channel attention, the performance of the proposed algorithm improves consistently over the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency domain (T-F domain).
Abstract: BACKGROUND: The nature of input data is an essential factor when training neural networks. Research concerning magnetic resonance imaging (MRI)-based diagnosis of liver tumors using deep learning has been rapidly advancing. Still, evidence to support the utilization of multi-dimensional and multi-parametric image data is lacking. Due to higher information content, three-dimensional input should presumably result in higher classification precision. Also, the differentiation between focal liver lesions (FLLs) can only be plausible with simultaneous analysis of multi-sequence MRI images. AIM: To compare the diagnostic efficiency of two-dimensional (2D) and three-dimensional (3D) densely connected convolutional neural networks (DenseNet) for FLLs on multi-sequence MRI. METHODS: We retrospectively collected T2-weighted, gadoxetate disodium-enhanced arterial phase, portal venous phase, and hepatobiliary phase MRI scans from patients with focal nodular hyperplasia (FNH), hepatocellular carcinomas (HCC) or liver metastases (MET). Our search identified 71 FNH, 69 HCC and 76 MET. After volume registration, the same three most representative axial slices from all sequences were combined into four-channel images to train the 2D-DenseNet264 network. Identical bounding boxes were selected on all scans and stacked into 4D volumes to train the 3D-DenseNet264 model. The test set consisted of 10-10-10 tumors. The performance of the models was compared using area under the receiver operating characteristic curve (AUROC), specificity, sensitivity, positive predictive values (PPV), negative predictive values (NPV), and F1 scores. RESULTS: The average AUC value of the 2D model (0.98) was slightly higher than that of the 3D model (0.94). Mean PPV, sensitivity, NPV, specificity and F1 scores (0.94, 0.93, 0.97, 0.97, and 0.93) of the 2D model were also superior to the metrics of the 3D model (0.84, 0.83, 0.92, 0.92, and 0.83). The classification metrics of FNH were 0.91, 1.00, 1.00, 0.95, and 0.95 using the 2D and 0.90, 0.90, 0.95, 0.95, and 0.90 using the 3D models. The 2D and 3D networks' performance in the diagnosis of HCC was 1.00, 0.80, 0.91, 1.00, and 0.89 and 0.88, 0.70, 0.86, 0.95, and 0.78, respectively, while the evaluation of MET lesions resulted in 0.91, 1.00, 1.00, 0.95, and 0.95 and 0.75, 0.90, 0.94, 0.85, and 0.82 using the 2D and 3D networks, respectively. CONCLUSION: Both 2D and 3D-DenseNets can differentiate FNH, HCC and MET with good accuracy when trained on hepatocyte-specific contrast-enhanced multi-sequence MRI volumes.
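The per-class metrics listed in this abstract (PPV, sensitivity, NPV, specificity, F1) all follow from one-vs-rest confusion counts. The sketch below shows the arithmetic. The counts used are hypothetical: the abstract does not report a confusion matrix, and these values were only chosen because, on a 10-10-10 test set, they round to the figures quoted for FNH with the 2D model.

```python
def one_vs_rest_metrics(tp, fp, fn, tn):
    """Standard binary metrics for one class against all others."""
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    sensitivity = tp / (tp + fn)          # recall
    npv = tn / (tn + fn)                  # negative predictive value
    specificity = tn / (tn + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "npv": npv,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts: all 10 lesions of the class found, 1 of the 20 others mislabelled.
metrics = one_vs_rest_metrics(tp=10, fp=1, fn=0, tn=19)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```

With only ten lesions per class, a single misclassification moves sensitivity by 0.10, which is worth keeping in mind when comparing the 2D and 3D figures.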
Funding: The authors would like to acknowledge the financial support from the National Science Foundation of China (Grant No. 41875027).
Abstract: Radar bases and ground observation stations in Tibet are sparsely distributed and cannot achieve large-scale precipitation monitoring. To address this problem, U-Net, an advanced machine learning (ML) method, is used to develop a robust and rapid algorithm for precipitating cloud detection based on the new-generation geostationary satellite FengYun-4A (FY-4A). First, in this algorithm, the real-time multi-band infrared brightness temperature from FY-4A combined with Digital Elevation Model (DEM) data is used as the predictor variables for our model. Second, the efficiency of the features was improved by changing the traditional serial connection of convolution layers in U-Net to residual mapping. Then, to solve the problem of semantic differences arising when low-level and high-level features are directly concatenated, we use dense skip pathways to reuse feature maps of different layers as inputs and concatenate feature layers from different depths. Finally, according to the characteristics of precipitation clouds, the pooling layer of U-Net was replaced by a convolution operation to enable the detection of small precipitation clouds. Experiments show that the Pixel Accuracy (PA) and Mean Intersection over Union (MIoU) of the improved U-Net on the test set reach 0.916 and 0.928, respectively, and that precipitation clouds over Tibet are detected well.
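Pixel Accuracy and Mean Intersection over Union, the two scores reported in this abstract, can both be computed from a pixel-level confusion matrix. The following is a minimal sketch of the standard definitions; the 2x2 matrix (no precipitation vs. precipitating cloud) contains made-up counts and is unrelated to the FY-4A results.

```python
def pixel_accuracy(confusion):
    """Share of pixels whose predicted class equals the true class."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def mean_iou(confusion):
    """Average over classes of TP / (TP + FP + FN); rows are true classes."""
    n = len(confusion)
    ious = []
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                          # true class i, predicted other
        fp = sum(confusion[r][i] for r in range(n)) - tp     # predicted i, true class other
        denominator = tp + fp + fn
        if denominator > 0:                                  # skip classes absent from both
            ious.append(tp / denominator)
    return sum(ious) / len(ious)

# Made-up counts: rows = true class, columns = predicted class.
confusion = [
    [50, 10],   # true "no precipitation"
    [5, 35],    # true "precipitating cloud"
]
pa = pixel_accuracy(confusion)
miou = mean_iou(confusion)
```

Because IoU penalises both false positives and false negatives for each class, it is usually the stricter of the two scores on imbalanced scenes.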
Abstract: This work focuses on a novel lightweight machine learning approach to the task of plant disease classification, which serves as a core component of a larger grow-light smart monitoring system. To the best of our knowledge, this work is the first to implement lightweight convolutional neural network architectures leveraging down-scaled versions of inception blocks, residual connections, and dense residual connections applied without pre-training to the PlantVillage dataset. The novel contributions of this work include the proposal of a smart monitoring framework outline that is responsible for detecting and classifying ailments via the devised lightweight networks, as well as interfacing with LED grow-light fixtures to optimize environmental parameters and lighting control for the growth of plants in a greenhouse system. The lightweight adaptation of dense residual connections achieved the best balance of minimizing model parameters and maximizing performance metrics, with accuracy, precision, recall, and F1-scores of 96.75%, 97.62%, 97.59%, and 97.58%, respectively, while consisting of only 228,479 model parameters. These results are further compared against various full-scale state-of-the-art model architectures trained on the PlantVillage dataset; the proposed down-scaled lightweight models performed as well as, if not better than, many large-scale counterparts with drastically lower computational requirements.
Funding: Supported by the open project of the National Local Joint Engineering Research Center for Agro-Ecological Big Data Analysis and Application Technology, “Adaptive Agricultural Machinery Motion Detection and Recognition in Natural Scenes”, No. AE202210; by the school-level key discipline of Suzhou University in China, No. 2019xjzdxk1; and by the 2022 Anhui Province College Research Program Project of the Suzhou Vocational College of Civil Aviation, No. 2022AH053155.
Abstract: A vehicle recognition algorithm implemented with a convolutional neural network needs to compute and store a large number of parameters. Too many parameters occupy substantial computational resources, making the model difficult to run on low-performance computers. Therefore, obtaining more efficient feature information from a target image or video with better accuracy on computers with limited computing power is the main goal of this research. In this paper, a lightweight densely connected and deeply separable convolutional network (DCDSNet) algorithm is proposed to achieve this goal. The Visual Geometry Group (VGG) model is improved by using convolution in place of the fully connected module, a deeply separable convolution module, and a densely connected network module, with the first two modules reducing the parameters and the third module allowing the algorithm to obtain more features with a limited number of parameters. The algorithm achieves better results on the mine vehicle recognition dataset. Experiments show that, compared to VGG19, the recognition accuracy is improved by 4.41% and the number of parameters is reduced by 71%.
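The parameter saving that this abstract attributes to separable convolution can be checked with simple arithmetic. The sketch below assumes that "deeply separable" refers to the usual depthwise separable convolution (a per-channel k x k filter followed by a 1 x 1 pointwise convolution) and ignores bias terms; the layer sizes are hypothetical and are not taken from DCDSNet.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights of an ordinary k x k convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels

def depthwise_separable_params(kernel, in_channels, out_channels):
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard
```

For this layer the separable form needs roughly 12% of the weights, which matches the general rule that the ratio is about 1/out_channels + 1/k^2.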
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61772319, 62002200, and 62202268), the Shandong Natural Science Foundation of China (Nos. ZR2021QF134 and ZR2021MF107), the Shandong Provincial Science and Technology Support Program for Youth Innovation Team in Colleges (Nos. 2021KJ069 and 2019KJN042), and the Yantai Science and Technology Innovation Development Plan (No. 2022JCYJ031).
Abstract: The poor quality of images recorded in low-light environments affects their further applications. To improve the visibility of low-light images, we propose a recurrent network based on filter-cluster attention (FCA), the main body of which consists of three units: difference concern, gate recurrent, and iterative residual. The network performs multi-stage recursive learning on low-light images and then extracts deeper feature information. To compute more accurate dependence, we design a novel FCA that focuses on the saliency of feature channels. FCA and self-attention are used to highlight the low-light regions and important channels of the feature. We also design a dense connection pyramid (DenCP) to extract the color features of the low-light inversion image, to compensate for the loss of the image's color information. Experimental results on six public datasets show that our method has outstanding performance in subjective and quantitative comparisons.
Funding: This work was financially supported by the National Natural Science Foundation of China (Grant No. 32071912, No. 61863011, No. 31701325, No. 31571568, No. 31570180), the Guangzhou Science and Technology Project (Grant No. 202002020016, No. 202102080337), the Natural Science Foundation of Guangdong Province (Grant No. 2018A030313330, No. 2020A1515010793), the Second Batch of Industry-Education Cooperation Collaborative Projects in 2019, Ministry of Education (Grant No. 201902062040), the Guangzhou Key Laboratory of Intelligent Agriculture (Grant No. 201902010081), the Project of Rural Revitalization Strategy in Guangdong Province (Grant No. 2020KJ261), and the Applied Science and Technology Special Fund Project, Meizhou, China (Grant No. 2019B0201005).
Abstract: Due to illumination, complex backgrounds, and occlusion of the litchi fruits, the accurate detection of litchi in the field is extremely challenging. To solve the problem of the low recognition rate of litchi-picking robots in field conditions, this study was inspired by the ideas of ResNet and dense convolution and proposed an improved feature-extraction network model named “YOLOv3_Litchi”, combining dense connections and residuals for the detection of litchis. Firstly, based on the traditional YOLOv3 deep convolutional neural network and regression detection, residuals were introduced into the feature-extraction network to effectively avoid the decrease in detection accuracy caused by excessive network depth. Secondly, while maintaining a good receptive field and high detection accuracy, the large convolution kernel was replaced by a small convolution kernel in the shallow layers of the network, thereby effectively reducing the model parameters. Finally, the idea of a feature pyramid was used to design the network to identify small-target litchis, ensuring that the shallow features were not lost while also reducing the model parameters. Experimental results show that the improved YOLOv3_Litchi model achieved better results than the classic YOLOv3_DarkNet-53 model and the YOLOv3_Tiny model. The mean average precision (mAP) score was 97.07%, which was higher than the 95.18% mAP of the YOLOv3_DarkNet-53 model and the 94.48% mAP of the YOLOv3_Tiny model. The frame rate was 58 fps, which was higher than the 29 fps of the YOLOv3_DarkNet-53 model. Compared with the classic Faster R-CNN model with the feature-extraction network VGG16, the mAP was increased by 1%, and the FPS advantage was obvious. Compared with the classic single shot multibox detector (SSD) model, both the accuracy and the running efficiency were improved. The results show that the improved YOLOv3_Litchi model had stronger robustness, higher detection accuracy, and lower computational complexity for the identification of litchi in field conditions, which should be helpful for litchi orchard precision management.