Cross entropy is a measure used in machine learning and deep learning to assess the difference between predicted and actual probability distributions. In this study, we propose cross entropy as a performance evaluation metric for image classifier models and apply it to the CT image classification of lung cancer. A convolutional neural network is employed as the deep neural network (DNN) image classifier, with the residual network (ResNet) 50 chosen as the DNN architecture. The image data comprise a lung CT image set. Two classification models are built from datasets with varying amounts of data, and lung cancer is categorized into four classes using 10-fold cross-validation. Furthermore, we employ t-distributed stochastic neighbor embedding to visualize the data distribution after classification. Experimental results demonstrate that cross entropy is a highly useful metric for evaluating the reliability of image classifier models. For a more comprehensive evaluation of model performance, combining it with other evaluation metrics is considered essential.
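As a minimal sketch of why cross entropy can separate models that a plain accuracy count treats as equal, the snippet below computes H(p, q) = -sum_i p_i log q_i for two predictions that both pick the correct class. The function name `cross_entropy`, the clipping constant `eps`, and the two four-class probability vectors are illustrative assumptions, not values or code from the study.

```python
import math

def cross_entropy(true_dist, pred_dist, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i), in nats.

    pred_dist is clipped at eps (an assumed safeguard) to avoid log(0).
    """
    return -sum(
        p * math.log(max(q, eps))
        for p, q in zip(true_dist, pred_dist)
        if p > 0
    )

# Hypothetical 4-class softmax outputs; the true class is index 0 in both cases.
one_hot = [1.0, 0.0, 0.0, 0.0]
confident = [0.97, 0.01, 0.01, 0.01]
hesitant = [0.40, 0.30, 0.20, 0.10]

ce_confident = cross_entropy(one_hot, confident)
ce_hesitant = cross_entropy(one_hot, hesitant)
print(f"confident: {ce_confident:.4f}  hesitant: {ce_hesitant:.4f}")
```

Both predictions count as correct under accuracy, yet the hesitant one carries a much larger cross entropy, which is the kind of reliability signal the abstract refers to.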
This study evaluates the performance and reliability of a vision transformer (ViT) compared with a convolutional neural network (CNN), the ResNet50 model, in classifying lung cancer from CT images into four categories: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), large cell carcinoma (LULC), and normal. Although CNNs have made significant advancements in medical imaging, their limited capacity to capture long-range dependencies has led to the exploration of ViTs, which leverage self-attention mechanisms for a more comprehensive global understanding of images. The study used a dataset of 748 lung CT images to train both models with standardized input sizes, assessing their performance through conventional metrics (accuracy, precision, recall, F1 score, specificity, and AUC) as well as cross entropy, a novel metric for evaluating prediction uncertainty. Both models achieved similar accuracy rates (95%), with ViT demonstrating a slight edge over ResNet50 in precision and F1 scores for specific classes. However, ResNet50 exhibited higher recall for LULC, indicating fewer missed cases. Cross entropy analysis showed that the ViT model had lower average uncertainty than ResNet50, particularly in the LUAD, Normal, and LUSC classes. This finding suggests that ViT predictions are generally more reliable, though ResNet50 performed better for LULC. The study underscores that accuracy alone is insufficient for model comparison, as cross entropy offers deeper insights into the reliability and confidence of model predictions. The results highlight the importance of incorporating cross entropy alongside traditional metrics for a more comprehensive evaluation of deep learning models in medical image classification. While the ViT outperformed the CNN-based ResNet50 in lung cancer classification based on cross-entropy values, the performance differences were minor and may not hold clinical significance. Therefore, it may be premature to consider replacing CNNs with ViTs in this specific application.
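The comparison described above can be sketched as a per-class report that pairs accuracy with mean cross entropy. This is only an illustration: the helper `per_class_report` and the probability rows for `model_a` and `model_b` are hypothetical and are not the study's data; only the four class names come from the abstract.

```python
import math
from collections import defaultdict

CLASSES = ["LUAD", "LUSC", "LULC", "Normal"]

def per_class_report(labels, probs, eps=1e-12):
    """Return (accuracy, {class name: mean cross entropy}).

    labels: integer class indices; probs: one softmax row per sample.
    With one-hot targets the cross entropy of a sample is -log(q_true).
    """
    correct = 0
    losses = defaultdict(list)
    for y, row in zip(labels, probs):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(pred == y)
        losses[CLASSES[y]].append(-math.log(max(row[y], eps)))
    mean_ce = {name: sum(v) / len(v) for name, v in losses.items()}
    return correct / len(labels), mean_ce

# Hypothetical outputs: one sample per class, both models classify all correctly.
labels = [0, 1, 2, 3]
model_a = [[0.90, 0.05, 0.03, 0.02], [0.05, 0.85, 0.05, 0.05],
           [0.20, 0.15, 0.55, 0.10], [0.02, 0.02, 0.01, 0.95]]
model_b = [[0.70, 0.15, 0.10, 0.05], [0.10, 0.65, 0.15, 0.10],
           [0.10, 0.05, 0.80, 0.05], [0.05, 0.05, 0.10, 0.80]]

acc_a, ce_a = per_class_report(labels, model_a)
acc_b, ce_b = per_class_report(labels, model_b)
for name in CLASSES:
    print(f"{name}: A={ce_a[name]:.3f}  B={ce_b[name]:.3f}")
```

In this toy setup the two models have identical accuracy, while the per-class cross entropy favors one model for three classes and the other for LULC, mirroring the shape (not the numbers) of the reported finding.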
In a convolutional neural network (CNN) classification model for diagnosing medical images, transparency and interpretability of the model’s behavior are crucial in addition to high classification accuracy, and it is important to demonstrate them explicitly. In this study, we constructed an interpretable CNN-based model for breast density classification using spectral information from mammograms. We used a reliability diagram to evaluate whether the model’s prediction scores provided reliable probability values, and we visualized the basis for the final prediction. In constructing the classification model, we modified ResNet50 and introduced algorithms for extracting and inputting image spectra, visualizing network behavior, and quantifying prediction ambiguity. In our experiments, the proposed model demonstrated not only high classification accuracy but also higher reliability and interpretability than conventional CNN models that use pixel information from images. Furthermore, the proposed model can detect misclassified data and indicate an explicit basis for its predictions. The results demonstrate the effectiveness and usefulness of the proposed model from the perspective of credibility and transparency.
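A reliability diagram groups predictions by confidence and compares each group's mean confidence with its observed accuracy. The sketch below shows one common way to compute those bins, assuming equal-width bins; the function names, the bin count, and the sample data are hypothetical, and the expected calibration error (ECE) summary is an addition that the abstract does not mention.

```python
def reliability_bins(confidences, correct, n_bins=5):
    """Bin predictions by confidence; return (bin index, count, mean confidence, accuracy)."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((conf, ok))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue  # empty bins are skipped
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(ok for _, ok in items) / len(items)
        rows.append((idx, len(items), mean_conf, accuracy))
    return rows

def expected_calibration_error(rows, total):
    """Count-weighted mean gap between confidence and accuracy across bins."""
    return sum(n / total * abs(conf - acc) for _, n, conf, acc in rows)

# Hypothetical top-class confidences and whether each prediction was correct (1) or not (0).
confidences = [0.95, 0.92, 0.88, 0.65, 0.62, 0.55]
correct = [1, 1, 1, 1, 0, 0]

rows = reliability_bins(confidences, correct)
ece = expected_calibration_error(rows, len(confidences))
for idx, n, conf, acc in rows:
    print(f"bin {idx}: n={n} mean_conf={conf:.3f} accuracy={acc:.3f}")
```

A well-calibrated model has accuracy close to mean confidence in every bin, so the plotted points lie near the diagonal.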
Computer-aided detection and diagnosis (CAD) systems are increasingly used by clinicians as an aid for the detection and interpretation of diseases. In general, a CAD system employs a classifier to detect or distinguish between abnormal and normal tissues on images. In the classification phase, a set of image features and/or texture features extracted from the images is commonly used. In this article, we investigated the characteristics of the output entropy of an image and demonstrated the usefulness of the output entropy as a texture feature in CAD systems. To validate the effectiveness and superiority of the output-entropy-based texture feature, two well-known texture features, mean and standard deviation, were used for comparison. The database used in this study comprised 50 CT images obtained from 10 patients with pulmonary nodules and 50 CT images obtained from 5 normal subjects. We used a support vector machine for classification, with a leave-one-out method for training and classification. Three combinations of texture features (mean and entropy, standard deviation and entropy, and standard deviation and mean) were used as the inputs to the classifier. Three region of interest (ROI) sizes (11 × 11, 9 × 9, and 7 × 7 pixels) were selected from the database for computation of the feature values. Our experimental results show that, for the ROI size of 11 × 11 pixels, the combination of entropy and standard deviation is significantly better than both the combination of mean and entropy and that of standard deviation and mean (p < 0.05). These results suggest that the information entropy of an image can be used as an effective feature for CAD applications.
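The three texture features named above can be sketched for a single ROI as follows. The abstract does not state how the output entropy is computed, so this example assumes the Shannon entropy of the ROI's gray-level histogram; the function `roi_features` and the two synthetic 7 × 7 ROIs are hypothetical.

```python
import math
from collections import Counter

def roi_features(roi):
    """Return (mean, standard deviation, Shannon entropy in bits) of a 2D ROI.

    Entropy is taken over the gray-level histogram of the ROI (an assumption).
    """
    pixels = [value for row in roi for value in row]
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / n)  # population std
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mean, std, entropy

# Synthetic 7 x 7 ROIs: one uniform patch and one cycling through 8 gray levels.
flat = [[100] * 7 for _ in range(7)]
textured = [[((r * 7 + c) % 8) * 16 for c in range(7)] for r in range(7)]

flat_mean, flat_std, flat_entropy = roi_features(flat)
tex_mean, tex_std, tex_entropy = roi_features(textured)
print(f"flat: entropy={flat_entropy:.3f}  textured: entropy={tex_entropy:.3f}")
```

A uniform patch has zero entropy, while a patch spread over eight gray levels approaches 3 bits; feature pairs such as (standard deviation, entropy) would then be passed to a classifier like the support vector machine mentioned in the abstract.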
In recent years, with the development of numerous convolutional neural network (CNN) classification models for medical diagnosis, the issue of misrecognition/misclassification has become increasingly important, and research on it has been progressing. This study focuses on the problem of misrecognition/misclassification in CNN classification models for coronavirus disease (COVID-19) using chest X-ray images. We construct two models for COVID-19 pneumonia classification by fine-tuning the ResNet-50 architecture: a model retrained with full-sized original images and a model retrained with segmented images. The study demonstrates the uncertainty (misrecognition/misclassification) in model performance caused by a discrepancy between the shapes of images used during model construction and those used in clinical applications. To this end, we apply three XAI methods to demonstrate and explain the uncertainty of the classification results obtained from the two models under assumed clinical application conditions. Experimental results indicate that classification performance cannot be maintained when the type of constructed model and the geometric shape of the input images do not match, which may lead to misrecognition in clinical applications. We also observe that an effect similar to an adversarial attack might be induced if image segmentation is not performed properly. The results suggest that the best approach to obtaining highly reliable predictions in the classification of COVID-19 pneumonia is to construct the model using full-sized original images as training data and to use full-sized original images as the input in clinical applications.