Abstract: Segmenting dark-field images of laser-induced damage on large-aperture optics in high-power laser facilities is challenged by complicated damage morphology, uneven illumination and stray light interference. Fully supervised semantic segmentation algorithms have achieved state-of-the-art performance but rely on a large number of pixel-level labels, which are time-consuming and labor-intensive to produce. LayerCAM, an advanced weakly supervised semantic segmentation algorithm, can generate pixel-accurate results using only image-level labels, but its scattered and partially underactivated class activation regions degrade segmentation performance. In this paper, we propose a weakly supervised semantic segmentation method, continuous gradient class activation mapping (CAM) and its nonlinear multiscale fusion (continuous gradient fusion CAM). The method redesigns backpropagating gradients and nonlinearly activates multiscale fused heatmaps to generate more fine-grained class activation maps with an appropriate activation degree for different damage site sizes. Experiments on our dataset show that the proposed method can achieve segmentation performance comparable to that of fully supervised algorithms.
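For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of the standard LayerCAM weighting, in which each activation is multiplied by its own positive gradient before the channels are summed. It does not reproduce the proposed continuous gradient CAM or its multiscale fusion, which the abstract does not specify. The function names and toy values are illustrative assumptions, and plain nested lists are used so the sketch runs without dependencies.

```python
def relu(x):
    """Rectified linear unit for a single float."""
    return x if x > 0.0 else 0.0


def layercam(activations, gradients):
    """LayerCAM-style map for one layer.

    activations, gradients: K channels, each an H x W nested list.
    Each location is weighted by its own positive gradient, the
    weighted channels are summed, and the sum is passed through ReLU.
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        for i in range(height):
            for j in range(width):
                cam[i][j] += relu(grad[i][j]) * act[i][j]
    return [[relu(v) for v in row] for row in cam]


# Toy example (hypothetical values): two 2 x 2 channels.
acts = [[[1.0, 2.0], [0.0, 1.0]], [[0.5, 0.0], [2.0, 1.0]]]
grads = [[[0.5, -1.0], [1.0, 0.0]], [[2.0, 1.0], [-1.0, 1.0]]]
cam = layercam(acts, grads)
```

Because the weights are per-location rather than per-channel, locations with negative gradients contribute nothing, which is one plausible source of the scattered activation regions the abstract mentions.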
Funding: Supported by a Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure, and Transport (Grant 22CTAP-C163951-02).
Abstract: Recently, convolutional neural network (CNN)-based visual inspection has been developed to detect defects on building surfaces automatically. The CNN model demonstrates remarkable accuracy in image data analysis; however, the predicted results carry uncertainty for users because of the "black box" problem in deep learning models. Therefore, this study proposes a visual explanation method to overcome the uncertainty limitation of CNN-based defect identification. The gradient-weighted class activation mapping (Grad-CAM) method is adopted to provide visually explainable information. A visualizing evaluation index is proposed to quantitatively analyze visual representations; this index reflects a rough estimate of the concordance rate between the visualized heat map and intended defects. In addition, an ablation study, adopting three-branch combinations with the VGG16, is implemented to identify performance variations by visualizing predicted results. Experiments reveal that the proposed model, combined with hybrid pooling, batch normalization, and multi-attention modules, achieves the best performance with an accuracy of 97.77%, corresponding to an improvement of 2.49% compared with the baseline model. Consequently, this study demonstrates that reliable results from an automatic defect classification model can be provided to an inspector through the visual representation of the predicted results using CNN models.
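The abstract does not define its visualizing evaluation index. One common way to estimate a concordance rate between a heat map and an annotated defect region is intersection over union (IoU) on the thresholded heat map. The sketch below uses IoU as an assumed stand-in, not the authors' index; the function name, the 0.5 threshold and the toy values are hypothetical.

```python
def concordance_iou(heatmap, defect_mask, threshold=0.5):
    """IoU between the thresholded heat map and a binary defect mask.

    heatmap: H x W nested list of floats in [0, 1].
    defect_mask: H x W nested list of 0/1 values.
    Returns 1.0 when both regions are empty.
    """
    intersection = 0
    union = 0
    for heat_row, mask_row in zip(heatmap, defect_mask):
        for heat, mask in zip(heat_row, mask_row):
            hot = heat >= threshold
            defect = bool(mask)
            intersection += hot and defect
            union += hot or defect
    return intersection / union if union else 1.0


# Toy example (hypothetical values): three hot pixels, two defect pixels.
heatmap = [[0.9, 0.6, 0.1], [0.2, 0.7, 0.0]]
defect_mask = [[1, 1, 0], [0, 0, 0]]
score = concordance_iou(heatmap, defect_mask)
```

A score near 1 means the highlighted region and the defect largely coincide; the result depends on the chosen threshold, so any comparison between models should fix it in advance.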
Funding: The research is funded by the Researchers Supporting Project at King Saud University (Project# RSP-2021/305).
Abstract: COVID-19 is a growing problem worldwide with a high mortality rate. As a result, the World Health Organization (WHO) declared it a pandemic. In order to limit the spread of the disease, a fast and accurate diagnosis is required. A reverse transcription polymerase chain reaction (RT-PCR) test is often used to detect the disease. However, since this test is time-consuming, a chest computed tomography (CT) or plain chest X-ray (CXR) is sometimes indicated. The value of automated diagnosis is that it saves time and money by minimizing human effort. Our research makes three significant contributions. First, the essential fine-tuning methodology is used to test the behavior and efficiency of a variety of vision models, ranging from Inception to Neural Architecture Search (NAS) networks. Second, by plotting class activation maps (CAMs) for individual networks and assessing classification efficiency with AUC-ROC curves, the behavior of these models is visually analyzed. Finally, stacked ensemble techniques are used to provide greater generalization by combining fine-tuned models with six ensemble neural networks. Using stacked ensembles, the generalization of the models improved. Furthermore, the ensemble model created by combining all of the fine-tuned networks obtained a state-of-the-art COVID-19 detection accuracy of 99.17%. The precision and recall rates were 99.99% and 89.79%, respectively, highlighting the robustness of stacked ensembles. According to the experimental results, the proposed ensemble approach performed well in the classification of COVID-19 lesions on CXR.
Abstract: In the field of medical images, pixel-level labels are time-consuming and expensive to acquire, while image-level labels are relatively easier to obtain. Therefore, it makes sense to learn more information (knowledge) from a small number of hard-to-get pixel-level annotated images and apply it to different tasks, to maximize their usefulness and save time and training costs. In this paper, using Pixel-Level Labeled Images for Multi-Task Learning (PLDMLT), we focus on grading the severity of fundus images for Diabetic Retinopathy (DR). This is because, for the segmentation task, there is a finely labeled mask, while the severity grading task is without classification labels. To this end, we propose a two-stage multi-label learning weakly supervised algorithm, which generates initial classification pseudo labels in the first stage and visualizes heat maps at all levels of severity using Grad-CAM to further provide medical interpretability for the classification task. A multitask model framework with U-Net as the baseline is proposed in the second stage. A label update network is designed to balance the gradients between the classification and segmentation tasks. Extensive experimental results show that our PLDMLT method significantly outperforms other state-of-the-art methods in DR segmentation on two public datasets, achieving up to 98.897% segmentation accuracy. In addition, our method is competitive with single-task fully supervised learning in the DR severity grading task.
Funding: This research has been funded by the Research General Direction at Universidad Santiago de Cali under call No. 01-2021.
Abstract: Early Breast Cancer Detection (EBCD), which helps people live longer lives, is a salient issue that needs to be addressed universally. The Computer-Aided Detection (CADs)/Computer-Aided Diagnosis (CADx) system is a software automation tool developed to assist health professionals in Breast Cancer Detection and Diagnosis (BCDD) and to minimise mortality by classifying medical histopathological images in much less time. This paper examines the accuracy of the Convolutional Neural Network (CNN), which can be used to detect breast malignancies at an early stage, to determine which strategy is efficient for the early identification of breast cell malignancies, mass formation and breast microcalcifications on the mammogram. Because data for the new domain are insufficient, a pre-trained Residual Network (ResNet50) CNN is used for Breast Cancer Detection and Diagnosis; to obtain discriminative localization, a CNN with Class Activation Map (CAM) is also used to detect breast microcalcifications and find a specific class in the histopathological image. The test results indicate that this method performed almost 225.15% better at determining the exact location of disease (discriminative localization) in breast microcalcification images. ResNet50 has the highest level of accuracy for images of Benign Tumour (BT)/Malignant Tumour (MT) cases, at 97.11%. The average accuracy of ResNet50 as a pre-trained Convolutional Neural Network is 94.17%.
Abstract: This paper proposes an accurate, efficient and explainable method for the classification of the surrounding rock based on a convolutional neural network (CNN). The state-of-the-art robust CNN model (EfficientNet) is applied to tunnel wall image recognition. Gaussian filtering, data augmentation and other data pre-processing techniques are used to improve the data quality and quantity. Combined with transfer learning, the generality, accuracy and efficiency of the deep learning (DL) model are further improved, and finally we achieve 89.96% accuracy. Compared with other state-of-the-art CNN architectures, such as ResNet and Inception-ResNet-V2 (IRV2), the presented deep transfer learning model is more stable, accurate and efficient. To reveal the rock classification mechanism of the proposed model, Gradient-weighted Class Activation Map (Grad-CAM) visualizations are integrated into the model to enable its explainability and accountability. The developed deep transfer learning model has been applied to support the tunneling of the Xingyi City Bypass in the high mountain area of Guizhou, China, with great results.
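As a reminder of how the Grad-CAM visualization mentioned above is computed, the following minimal sketch applies the standard formulation: each channel weight is the spatial mean of the class-score gradient, and the map is the ReLU of the weighted sum of feature maps. It is not the authors' pipeline; the function name and toy values are assumptions, and the activations and gradients would normally come from a framework's backward pass rather than hand-written lists.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map for one convolutional layer.

    activations, gradients: K channels, each an H x W nested list.
    Channel weight = mean gradient over all spatial positions.
    Map = ReLU(sum over channels of weight * activation).
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        weight = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * act[i][j]
    # Keep only evidence that supports the class of interest.
    return [[v if v > 0.0 else 0.0 for v in row] for row in cam]


# Toy example (hypothetical values): channel weights are 1.0 and -0.5.
acts = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, 0.0], [0.0, 0.0]]]
heat = grad_cam(acts, grads)
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image, here a tunnel wall photograph, to show which regions drove the predicted rock class.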
Funding: This research is partially supported by the Programme for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, and also partially supported by JSPS KAKENHI Grant No. 15K00159.
Abstract: Scene recognition is a fundamental task in computer vision, which generally includes three vital stages, namely feature extraction, feature transformation and classification. Early research mainly focused on feature extraction, but with the rise of Convolutional Neural Networks (CNNs), more and more feature transformation methods have been proposed based on CNN features. In this work, a novel feature transformation algorithm called Graph Encoded Local Discriminative Region Representation (GEDRR) is proposed to find discriminative local representations for scene images and explore the relationship between the discriminative regions. In addition, we propose a method using the multi-head attention module to enhance and fuse convolutional feature maps. Combining the two methods and the global representation, a scene recognition framework called Global and Graph Encoded Local Discriminative Region Representation (G2ELDR2) is proposed. The experimental results on three scene datasets demonstrate the effectiveness of our model, which outperforms many state-of-the-art methods.
Abstract: Modern leather industries are focused on producing high-quality leather products to sustain market competitiveness. However, various leather defects are introduced during various stages of the manufacturing process, such as material handling, tanning and dyeing. Manual inspection of leather surfaces is subjective and inconsistent in nature; hence machine vision systems have been widely adopted for the automated inspection of leather defects. It is necessary to develop suitable image processing algorithms to localize leather defects such as folding marks, growth marks, grain off, loose grain, and pinholes, because of the ambiguous texture pattern and tiny size of the localized regions of the leather. This paper presents a deep learning neural network-based approach for automatic localization and classification of leather defects using a machine vision system. In this work, popular convolutional neural networks are trained using leather images of different leather defects, and a class activation mapping technique is followed to locate the region of interest for the class of leather defect. Convolutional neural networks such as GoogLeNet, SqueezeNet and ResNet are found to provide better classification accuracy than the state-of-the-art neural network architectures, and the results are presented.
Funding: Supported by the National Key Research & Development Plan of China (No. 2017YFB1002804), the National Natural Science Foundation of China (Nos. 61425017, 61773379, 61332017, 61603390 and 61771472) and the Major Program for the 325 National Social Science Fund of China (No. 13&ZD189).
Abstract: Facial emotion recognition is an essential aspect of the field of human-machine interaction. Past research on facial emotion recognition focuses on the laboratory environment. However, it faces many challenges in real-world conditions, such as illumination changes, large pose variations and partial or full occlusions. Those challenges lead to different face areas with different degrees of sharpness and completeness. Inspired by this fact, we focus on the authenticity of predictions generated by different <emotion, region> pairs. For example, if only the mouth areas are available and the emotion classifier predicts happiness, then there is a question of how to judge the authenticity of predictions. This problem can be converted into the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: nose areas, mouth areas, eyes areas, nose to mouth areas, nose to eyes areas and mouth to eyes areas. To obtain more convincing results, our experiments are conducted on three different databases: facial expression recognition+ (FER+), real-world affective faces database (RAF-DB) and expression in-the-wild (ExpW) dataset. Through analysis of the classification accuracy, the confusion matrix and the class activation map (CAM), we can establish convincing results. To sum up, the contributions of this paper lie in two areas: 1) We visualize concerned areas of human faces in emotion recognition; 2) We analyze the contribution of different face areas to different emotions in real-world conditions through experimental analysis. Our findings can be combined with findings in psychology to promote the understanding of emotional expressions.
Abstract: Current methods for radar target detection usually work on the basis of high signal-to-clutter ratios. In this paper we propose a novel convolutional neural network based dual-activated clutter suppression algorithm to solve the problem caused by low signal-to-clutter ratios in actual situations on the sea surface. Dual activation has two steps. First, we multiply the activated weights of the last dense layer with the activated feature maps from the upsample layer. Through this, we can obtain the class activation maps (CAMs), which correspond to the positive region of the sea clutter. Second, we obtain the suppression coefficients by mapping the CAM inversely to the sea clutter spectrum. Then, we obtain the activated range-Doppler maps by multiplying the coefficients with the raw range-Doppler maps. In addition, we propose a sampling-based data augmentation method and an effective multiclass coding method to improve the prediction accuracy. Measurements on real datasets verified the effectiveness of the proposed method.
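The two dual-activation steps described above can be sketched as follows. The abstract does not give the exact activation function or the inverse mapping, so this sketch assumes ReLU for both activations and takes the suppression coefficient to be one minus the peak-normalized CAM. All function names and toy values are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def class_activation_map(feature_maps, class_weights):
    """Step 1: weighted sum of activated feature maps (classic CAM).

    Assumption: 'activated' means ReLU applied to weights and features.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += relu(weight) * relu(fmap[i][j])
    return cam


def suppression_coefficients(cam):
    """Step 2: map the CAM inversely, so strong clutter gets a low gain.

    Assumption: coefficient = 1 - cam / max(cam).
    """
    peak = max(max(row) for row in cam)
    if peak == 0.0:
        return [[1.0 for _ in row] for row in cam]
    return [[1.0 - v / peak for v in row] for row in cam]


def suppress(range_doppler, coefficients):
    """Multiply the raw range-Doppler map by the coefficients."""
    return [[rd * c for rd, c in zip(rd_row, c_row)]
            for rd_row, c_row in zip(range_doppler, coefficients)]


# Toy example (hypothetical values); the negative weight is zeroed by ReLU.
features = [[[2.0, 0.0], [1.0, 0.0]], [[0.0, 4.0], [0.0, 0.0]]]
weights = [1.0, -3.0]
raw_map = [[10.0, 8.0], [6.0, 4.0]]
cam = class_activation_map(features, weights)
cleaned = suppress(raw_map, suppression_coefficients(cam))
```

In this toy case the cell with the strongest clutter response is fully suppressed, while cells the CAM does not flag keep their original amplitude.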