As the importance of email increases,the amount of malicious email is also increasing,so the need for malicious email filtering is growing.Since it is more economical to combine commodity hardware consisting of a medi...As the importance of email increases,the amount of malicious email is also increasing,so the need for malicious email filtering is growing.Since it is more economical to combine commodity hardware consisting of a medium server or PC with a virtual environment to use as a single server resource and filter malicious email using machine learning techniques,we used a Hadoop MapReduce framework and Naïve Bayes among machine learning methods for malicious email filtering.Naïve Bayes was selected because it is one of the top machine learning methods(Support Vector Machine(SVM),Naïve Bayes,K-Nearest Neighbor(KNN),and Decision Tree)in terms of execution time and accuracy.Malicious email was filtered with MapReduce programming using the Naïve Bayes technique,which is a supervised machine learning method,in a Hadoop framework with optimized performance and also with the Python program technique with the Naïve Bayes technique applied in a bare metal server environment with the Hadoop environment not applied.According to the results of a comparison of the accuracy and predictive error rates of the two methods,the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification 1.11 times and the prediction error rate 14.13 times compared to the non-Hadoop Python Naïve Bayes method.展开更多
The naïve Bayes classifier is one of the commonly used data mining methods for classification.Despite its simplicity,naïve Bayes is effective and computationally efficient.Although the strong attribute indep...The naïve Bayes classifier is one of the commonly used data mining methods for classification.Despite its simplicity,naïve Bayes is effective and computationally efficient.Although the strong attribute independence assumption in the naïve Bayes classifier makes it a tractable method for learning,this assumption may not hold in real-world applications.Many enhancements to the basic algorithm have been proposed in order to alleviate the violation of attribute independence assumption.While these methods improve the classification performance,they do not necessarily retain the mathematical structure of the naïve Bayes model and some at the expense of computational time.One approach to reduce the naïvetéof the classifier is to incorporate attribute weights in the conditional probability.In this paper,we proposed a method to incorporate attribute weights to naïve Bayes.To evaluate the performance of our method,we used the public benchmark datasets.We compared our method with the standard naïve Bayes and baseline attribute weighting methods.Experimental results show that our method to incorporate attribute weights improves the classification performance compared to both standard naïve Bayes and baseline attribute weighting methods in terms of classification accuracy and F1,especially when the independence assumption is strongly violated,which was validated using the Chi-square test of independence.展开更多
<span style="font-family:Verdana;">The presence of bearing faults reduces the efficiency of rotating machines and thus increases energy consumption or even the total stoppage of the machine. </span&...<span style="font-family:Verdana;">The presence of bearing faults reduces the efficiency of rotating machines and thus increases energy consumption or even the total stoppage of the machine. </span><span style="font-family:Verdana;">It becomes essential to correctly diagnose the fault caused by the bearing.</span><span style="font-family:Verdana;"> Hence the importance of determining an effective features extraction method that best describes the fault. The vision of this paper is to merge the features selection methods in order to define the most relevant featuresin the texture </span><span style="font-family:Verdana;">of the vibration signal images. In this study, the Gray Level Co-occurrence </span><span style="font-family:Verdana;">Matrix (GLCM) in texture analysis is applied on the vibration signal represented in images. Features</span><span><span><span style="font-family:;" "=""> </span></span></span><span><span><span style="font-family:;" "=""><span style="font-family:Verdana;">selection based on the merge of PCA (Principal component Analysis) method and SFE (Sequential Features Extraction) method is </span><span style="font-family:Verdana;">done to obtain the most relevant features. The multiclass-Na<span style="white-space:nowrap;">?</span>ve Bayesclassifi</span><span style="font-family:Verdana;">er is used to test the proposed approach. The success rate of this classification is 98.27%. The relevant features obtained give promising results and are more efficient than the methods observed in the literature.</span></span></span></span>展开更多
AIM: To characterize the prevalence of subpopulations of CD4+ cells along with that of major inhibitor or stimulator cell types in therapy-nave childhood Crohn's disease (CD) and to test whether abnormalities of...AIM: To characterize the prevalence of subpopulations of CD4+ cells along with that of major inhibitor or stimulator cell types in therapy-nave childhood Crohn's disease (CD) and to test whether abnormalities of immune phenotype are normalized with the improvement of clinical signs and symptoms of disease. METHODS: We enrolled 26 pediatric patients with CD. 14 therapy-nave CD children; of those, 10 children remitted on conventional therapy and formed the remission group. We also tested another group of 12 chil-dren who relapsed with conventional therapy and were given infliximab; and 15 healthy children who served as controls. The prevalence of Th1 and Th2, nave and memory, activated and regulatory T cells, along with the members of innate immunity such as natural killer (NK), NK-T, myeloid and plasmocytoid dendritic cells (DCs), monocytes and Toll-like receptor (TLR)-2 and TLR-4 expression were determined in peripheral blood samples. RESULTS: Children with therapy-nave CD and those in relapse showed a decrease in Th1 cell prevalence. Simultaneously, an increased prevalence of memory and activated lymphocytes along with that of DCs and monocytes was observed. In addition, the ratio of myeloid /plasmocytoid DCs and the prevalence of TLR-2 or TLR-4 positive DCs and monocytes were also higher in therapy-nave CD than in controls. The majority of alterations diminished in remitted CD irrespective of whether remission was obtained by conventional or biological therapy. CONCLUSION: The finding that immune phenotype is normalized in remission suggests a link between immune phenotype and disease activity in childhood CD. Our observations support the involvement of members of the adaptive and innate immune systems in childhood CD.展开更多
Intrusion detection is the investigation process of information about the system activities or its data to detect any malicious behavior or unauthorized activity.Most of the IDS implement K-means clustering technique ...Intrusion detection is the investigation process of information about the system activities or its data to detect any malicious behavior or unauthorized activity.Most of the IDS implement K-means clustering technique due to its linear complexity and fast computing ability.Nonetheless,it is Naïve use of the mean data value for the cluster core that presents a major drawback.The chances of two circular clusters having different radius and centering at the same mean will occur.This condition cannot be addressed by the K-means algorithm because the mean value of the various clusters is very similar together.However,if the clusters are not spherical,it fails.To overcome this issue,a new integrated hybrid model by integrating expectation maximizing(EM)clustering using a Gaussian mixture model(GMM)and naïve Bays classifier have been proposed.In this model,GMM give more flexibility than K-Means in terms of cluster covariance.Also,they use probabilities function and soft clustering,that’s why they can have multiple cluster for a single data.In GMM,we can define the cluster form in GMM by two parameters:the mean and the standard deviation.This means that by using these two parameters,the cluster can take any kind of elliptical shape.EM-GMM will be used to cluster data based on data activity into the corresponding category.展开更多
This paper proposed an improved Naïve Bayes Classifier for sentimental analysis from a large-scale dataset such as in YouTube.YouTube contains large unstructured and unorganized comments and reactions,which carry...This paper proposed an improved Naïve Bayes Classifier for sentimental analysis from a large-scale dataset such as in YouTube.YouTube contains large unstructured and unorganized comments and reactions,which carry important information.Organizing large amounts of data and extracting useful information is a challenging task.The extracted information can be considered as new knowledge and can be used for deci sion-making.We extract comments from YouTube on videos and categorized them in domain-specific,and then apply the Naïve Bayes classifier with improved techniques.Our method provided a decent 80%accuracy in classifying those comments.This experiment shows that the proposed method provides excellent adaptability for large-scale text classification.展开更多
Social media networks are becoming essential to our daily activities,and many issues are due to this great involvement in our lives.Cyberbullying is a social media network issue,a global crisis affecting the victims a...Social media networks are becoming essential to our daily activities,and many issues are due to this great involvement in our lives.Cyberbullying is a social media network issue,a global crisis affecting the victims and society as a whole.It results from a misunderstanding regarding freedom of speech.In this work,we proposed a methodology for detecting such behaviors(bullying,harassment,and hate-related texts)using supervised machine learning algo-rithms(SVM,Naïve Bayes,Logistic regression,and random forest)and for predicting a topic associated with these text data using unsupervised natural language processing,such as latent Dirichlet allocation.In addition,we used accuracy,precision,recall,and F1 score to assess prior classifiers.Results show that the use of logistic regression,support vector machine,random forest model,and Naïve Bayes has 95%,94.97%,94.66%,and 93.1%accuracy,respectively.展开更多
Classification model has received great attention in any domain of research and also a reliable tool for medical disease diagnosis. The domain of classification model is used in disease diagnosis, disease prediction, ...Classification model has received great attention in any domain of research and also a reliable tool for medical disease diagnosis. The domain of classification model is used in disease diagnosis, disease prediction, bio informatics, crime prediction and so on. However, an efficient disease diagnosis model was compromised the disease prediction. In this paper, a Rough Set Rule-based Multitude Classifier (RS-RMC) is developed to improve the disease prediction rate and enhance the class accuracy of disease being diagnosed. The RS-RMC involves two steps. Initially, a Rough Set model is used for Feature Selection aiming at minimizing the execution time for obtaining the disease feature set. A Multitude Classifier model is presented in second step for detection of heart disease and for efficient classification. The Na?ve Bayes Classifier algorithm is designed for efficient identification of classes to measure the relationship between disease features and improving disease prediction rate. Experimental analysis shows that RS-RMC is used to reduce the execution time for extracting the disease feature with minimum false positive rate compared to the state-of-the-art works.展开更多
The exponential pace of the spread of the digital world has served as one of the assisting forces to generate an enormous amount of informationflow-ing over the network.The data will always remain under the threat of t...The exponential pace of the spread of the digital world has served as one of the assisting forces to generate an enormous amount of informationflow-ing over the network.The data will always remain under the threat of technolo-gical suffering where intruders and hackers consistently try to breach the security systems by gaining personal information insights.In this paper,the authors pro-posed the HDTbNB(Hybrid Decision Tree-based Naïve Bayes)algorithm tofind the essential features without data scaling to maximize the model’s performance by reducing the false alarm rate and training period to reduce zero frequency with enhanced accuracy of IDS(Intrusion Detection System)and to further analyze the performance execution of distinct machine learning algorithms as Naïve Bayes,Decision Tree,K-Nearest Neighbors and Logistic Regression over KDD 99 data-set.The performance of algorithm is evaluated by making a comparative analysis of computed parameters as accuracy,macro average,and weighted average.Thefindings were concluded as a percentage increase in accuracy,precision,sensitiv-ity,specificity,and a decrease in misclassification as 9.3%,6.4%,12.5%,5.2%and 81%.展开更多
Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to early detection and treatment of these diseases. As a malignant tumor whose primary focus is located in the bronchial mucosal e...Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to early detection and treatment of these diseases. As a malignant tumor whose primary focus is located in the bronchial mucosal epithelium, lung cancer has the highest mortality and morbidity among cancer types, threatening health and life of patients suffering from the disease. Machine learning algorithms such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) have been used for lung cancer prediction. However they still face challenges such as high dimensionality of the feature space, over-fitting, high computational complexity, noise and missing data, low accuracies, low precision and high error rates. Ensemble learning, which combines classifiers, may be helpful to boost prediction on new data. However, current ensemble ML techniques rarely consider comprehensive evaluation metrics to evaluate the performance of individual classifiers. The main purpose of this study was to develop an ensemble classifier that improves lung cancer prediction. An ensemble machine learning algorithm is developed based on RF, SVM, NB, and KNN. Feature selection is done based on Principal Component Analysis (PCA) and Analysis of Variance (ANOVA). This algorithm is then executed on lung cancer data and evaluated using execution time, true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), false positive rate (FPR), recall (R), precision (P) and F-measure (FM). Experimental results show that the proposed ensemble classifier has the best classification of 0.9825% with the lowest error rate of 0.0193. This is followed by SVM in which the probability of having the best classification is 0.9652% at an error rate of 0.0206. On the other hand, NB had the worst performance of 0.8475% classification at 0.0738 error rate.展开更多
The freshness of fruits is considered to be one of the essential characteristics for consumers in determining their quality,flavor and nutritional value.The primary need for identifying rotten fruits is to ensure that...The freshness of fruits is considered to be one of the essential characteristics for consumers in determining their quality,flavor and nutritional value.The primary need for identifying rotten fruits is to ensure that only fresh and high-quality fruits are sold to consumers.The impact of rotten fruits can foster harmful bacteria,molds and other microorganisms that can cause food poisoning and other illnesses to the consumers.The overall purpose of the study is to classify rotten fruits,which can affect the taste,texture,and appearance of other fresh fruits,thereby reducing their shelf life.The agriculture and food industries are increasingly adopting computer vision technology to detect rotten fruits and forecast their shelf life.Hence,this research work mainly focuses on the Convolutional Neural Network’s(CNN)deep learning model,which helps in the classification of rotten fruits.The proposed methodology involves real-time analysis of a dataset of various types of fruits,including apples,bananas,oranges,papayas and guavas.Similarly,machine learningmodels such as GaussianNaïve Bayes(GNB)and random forest are used to predict the fruit’s shelf life.The results obtained from the various pre-trained models for rotten fruit detection are analysed based on an accuracy score to determine the best model.In comparison to other pre-trained models,the visual geometry group16(VGG16)obtained a higher accuracy score of 95%.Likewise,the random forest model delivers a better accuracy score of 88% when compared with GNB in forecasting the fruit’s shelf life.By developing an accurate classification model,only fresh and safe fruits reach consumers,reducing the risks associated with contaminated produce.Thereby,the proposed approach will have a significant impact on the food industry for efficient fruit distribution and also benefit customers to purchase fresh fruits.展开更多
文摘As the importance of email increases,the amount of malicious email is also increasing,so the need for malicious email filtering is growing.Since it is more economical to combine commodity hardware consisting of a medium server or PC with a virtual environment to use as a single server resource and filter malicious email using machine learning techniques,we used a Hadoop MapReduce framework and Naïve Bayes among machine learning methods for malicious email filtering.Naïve Bayes was selected because it is one of the top machine learning methods(Support Vector Machine(SVM),Naïve Bayes,K-Nearest Neighbor(KNN),and Decision Tree)in terms of execution time and accuracy.Malicious email was filtered with MapReduce programming using the Naïve Bayes technique,which is a supervised machine learning method,in a Hadoop framework with optimized performance and also with the Python program technique with the Naïve Bayes technique applied in a bare metal server environment with the Hadoop environment not applied.According to the results of a comparison of the accuracy and predictive error rates of the two methods,the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification 1.11 times and the prediction error rate 14.13 times compared to the non-Hadoop Python Naïve Bayes method.
文摘The naïve Bayes classifier is one of the commonly used data mining methods for classification.Despite its simplicity,naïve Bayes is effective and computationally efficient.Although the strong attribute independence assumption in the naïve Bayes classifier makes it a tractable method for learning,this assumption may not hold in real-world applications.Many enhancements to the basic algorithm have been proposed in order to alleviate the violation of attribute independence assumption.While these methods improve the classification performance,they do not necessarily retain the mathematical structure of the naïve Bayes model and some at the expense of computational time.One approach to reduce the naïvetéof the classifier is to incorporate attribute weights in the conditional probability.In this paper,we proposed a method to incorporate attribute weights to naïve Bayes.To evaluate the performance of our method,we used the public benchmark datasets.We compared our method with the standard naïve Bayes and baseline attribute weighting methods.Experimental results show that our method to incorporate attribute weights improves the classification performance compared to both standard naïve Bayes and baseline attribute weighting methods in terms of classification accuracy and F1,especially when the independence assumption is strongly violated,which was validated using the Chi-square test of independence.
文摘<span style="font-family:Verdana;">The presence of bearing faults reduces the efficiency of rotating machines and thus increases energy consumption or even the total stoppage of the machine. </span><span style="font-family:Verdana;">It becomes essential to correctly diagnose the fault caused by the bearing.</span><span style="font-family:Verdana;"> Hence the importance of determining an effective features extraction method that best describes the fault. The vision of this paper is to merge the features selection methods in order to define the most relevant featuresin the texture </span><span style="font-family:Verdana;">of the vibration signal images. In this study, the Gray Level Co-occurrence </span><span style="font-family:Verdana;">Matrix (GLCM) in texture analysis is applied on the vibration signal represented in images. Features</span><span><span><span style="font-family:;" "=""> </span></span></span><span><span><span style="font-family:;" "=""><span style="font-family:Verdana;">selection based on the merge of PCA (Principal component Analysis) method and SFE (Sequential Features Extraction) method is </span><span style="font-family:Verdana;">done to obtain the most relevant features. The multiclass-Na<span style="white-space:nowrap;">?</span>ve Bayesclassifi</span><span style="font-family:Verdana;">er is used to test the proposed approach. The success rate of this classification is 98.27%. The relevant features obtained give promising results and are more efficient than the methods observed in the literature.</span></span></span></span>
文摘AIM: To characterize the prevalence of subpopulations of CD4+ cells along with that of major inhibitor or stimulator cell types in therapy-nave childhood Crohn's disease (CD) and to test whether abnormalities of immune phenotype are normalized with the improvement of clinical signs and symptoms of disease. METHODS: We enrolled 26 pediatric patients with CD. 14 therapy-nave CD children; of those, 10 children remitted on conventional therapy and formed the remission group. We also tested another group of 12 chil-dren who relapsed with conventional therapy and were given infliximab; and 15 healthy children who served as controls. The prevalence of Th1 and Th2, nave and memory, activated and regulatory T cells, along with the members of innate immunity such as natural killer (NK), NK-T, myeloid and plasmocytoid dendritic cells (DCs), monocytes and Toll-like receptor (TLR)-2 and TLR-4 expression were determined in peripheral blood samples. RESULTS: Children with therapy-nave CD and those in relapse showed a decrease in Th1 cell prevalence. Simultaneously, an increased prevalence of memory and activated lymphocytes along with that of DCs and monocytes was observed. In addition, the ratio of myeloid /plasmocytoid DCs and the prevalence of TLR-2 or TLR-4 positive DCs and monocytes were also higher in therapy-nave CD than in controls. The majority of alterations diminished in remitted CD irrespective of whether remission was obtained by conventional or biological therapy. CONCLUSION: The finding that immune phenotype is normalized in remission suggests a link between immune phenotype and disease activity in childhood CD. Our observations support the involvement of members of the adaptive and innate immune systems in childhood CD.
文摘Intrusion detection is the investigation process of information about the system activities or its data to detect any malicious behavior or unauthorized activity.Most of the IDS implement K-means clustering technique due to its linear complexity and fast computing ability.Nonetheless,it is Naïve use of the mean data value for the cluster core that presents a major drawback.The chances of two circular clusters having different radius and centering at the same mean will occur.This condition cannot be addressed by the K-means algorithm because the mean value of the various clusters is very similar together.However,if the clusters are not spherical,it fails.To overcome this issue,a new integrated hybrid model by integrating expectation maximizing(EM)clustering using a Gaussian mixture model(GMM)and naïve Bays classifier have been proposed.In this model,GMM give more flexibility than K-Means in terms of cluster covariance.Also,they use probabilities function and soft clustering,that’s why they can have multiple cluster for a single data.In GMM,we can define the cluster form in GMM by two parameters:the mean and the standard deviation.This means that by using these two parameters,the cluster can take any kind of elliptical shape.EM-GMM will be used to cluster data based on data activity into the corresponding category.
文摘This paper proposed an improved Naïve Bayes Classifier for sentimental analysis from a large-scale dataset such as in YouTube.YouTube contains large unstructured and unorganized comments and reactions,which carry important information.Organizing large amounts of data and extracting useful information is a challenging task.The extracted information can be considered as new knowledge and can be used for deci sion-making.We extract comments from YouTube on videos and categorized them in domain-specific,and then apply the Naïve Bayes classifier with improved techniques.Our method provided a decent 80%accuracy in classifying those comments.This experiment shows that the proposed method provides excellent adaptability for large-scale text classification.
文摘Social media networks are becoming essential to our daily activities,and many issues are due to this great involvement in our lives.Cyberbullying is a social media network issue,a global crisis affecting the victims and society as a whole.It results from a misunderstanding regarding freedom of speech.In this work,we proposed a methodology for detecting such behaviors(bullying,harassment,and hate-related texts)using supervised machine learning algo-rithms(SVM,Naïve Bayes,Logistic regression,and random forest)and for predicting a topic associated with these text data using unsupervised natural language processing,such as latent Dirichlet allocation.In addition,we used accuracy,precision,recall,and F1 score to assess prior classifiers.Results show that the use of logistic regression,support vector machine,random forest model,and Naïve Bayes has 95%,94.97%,94.66%,and 93.1%accuracy,respectively.
文摘Classification model has received great attention in any domain of research and also a reliable tool for medical disease diagnosis. The domain of classification model is used in disease diagnosis, disease prediction, bio informatics, crime prediction and so on. However, an efficient disease diagnosis model was compromised the disease prediction. In this paper, a Rough Set Rule-based Multitude Classifier (RS-RMC) is developed to improve the disease prediction rate and enhance the class accuracy of disease being diagnosed. The RS-RMC involves two steps. Initially, a Rough Set model is used for Feature Selection aiming at minimizing the execution time for obtaining the disease feature set. A Multitude Classifier model is presented in second step for detection of heart disease and for efficient classification. The Na?ve Bayes Classifier algorithm is designed for efficient identification of classes to measure the relationship between disease features and improving disease prediction rate. Experimental analysis shows that RS-RMC is used to reduce the execution time for extracting the disease feature with minimum false positive rate compared to the state-of-the-art works.
文摘The exponential pace of the spread of the digital world has served as one of the assisting forces to generate an enormous amount of informationflow-ing over the network.The data will always remain under the threat of technolo-gical suffering where intruders and hackers consistently try to breach the security systems by gaining personal information insights.In this paper,the authors pro-posed the HDTbNB(Hybrid Decision Tree-based Naïve Bayes)algorithm tofind the essential features without data scaling to maximize the model’s performance by reducing the false alarm rate and training period to reduce zero frequency with enhanced accuracy of IDS(Intrusion Detection System)and to further analyze the performance execution of distinct machine learning algorithms as Naïve Bayes,Decision Tree,K-Nearest Neighbors and Logistic Regression over KDD 99 data-set.The performance of algorithm is evaluated by making a comparative analysis of computed parameters as accuracy,macro average,and weighted average.Thefindings were concluded as a percentage increase in accuracy,precision,sensitiv-ity,specificity,and a decrease in misclassification as 9.3%,6.4%,12.5%,5.2%and 81%.
文摘Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to early detection and treatment of these diseases. As a malignant tumor whose primary focus is located in the bronchial mucosal epithelium, lung cancer has the highest mortality and morbidity among cancer types, threatening health and life of patients suffering from the disease. Machine learning algorithms such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) have been used for lung cancer prediction. However they still face challenges such as high dimensionality of the feature space, over-fitting, high computational complexity, noise and missing data, low accuracies, low precision and high error rates. Ensemble learning, which combines classifiers, may be helpful to boost prediction on new data. However, current ensemble ML techniques rarely consider comprehensive evaluation metrics to evaluate the performance of individual classifiers. The main purpose of this study was to develop an ensemble classifier that improves lung cancer prediction. An ensemble machine learning algorithm is developed based on RF, SVM, NB, and KNN. Feature selection is done based on Principal Component Analysis (PCA) and Analysis of Variance (ANOVA). This algorithm is then executed on lung cancer data and evaluated using execution time, true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), false positive rate (FPR), recall (R), precision (P) and F-measure (FM). Experimental results show that the proposed ensemble classifier has the best classification of 0.9825% with the lowest error rate of 0.0193. This is followed by SVM in which the probability of having the best classification is 0.9652% at an error rate of 0.0206. On the other hand, NB had the worst performance of 0.8475% classification at 0.0738 error rate.
文摘The freshness of fruits is considered to be one of the essential characteristics for consumers in determining their quality,flavor and nutritional value.The primary need for identifying rotten fruits is to ensure that only fresh and high-quality fruits are sold to consumers.The impact of rotten fruits can foster harmful bacteria,molds and other microorganisms that can cause food poisoning and other illnesses to the consumers.The overall purpose of the study is to classify rotten fruits,which can affect the taste,texture,and appearance of other fresh fruits,thereby reducing their shelf life.The agriculture and food industries are increasingly adopting computer vision technology to detect rotten fruits and forecast their shelf life.Hence,this research work mainly focuses on the Convolutional Neural Network’s(CNN)deep learning model,which helps in the classification of rotten fruits.The proposed methodology involves real-time analysis of a dataset of various types of fruits,including apples,bananas,oranges,papayas and guavas.Similarly,machine learningmodels such as GaussianNaïve Bayes(GNB)and random forest are used to predict the fruit’s shelf life.The results obtained from the various pre-trained models for rotten fruit detection are analysed based on an accuracy score to determine the best model.In comparison to other pre-trained models,the visual geometry group16(VGG16)obtained a higher accuracy score of 95%.Likewise,the random forest model delivers a better accuracy score of 88% when compared with GNB in forecasting the fruit’s shelf life.By developing an accurate classification model,only fresh and safe fruits reach consumers,reducing the risks associated with contaminated produce.Thereby,the proposed approach will have a significant impact on the food industry for efficient fruit distribution and also benefit customers to purchase fresh fruits.