Abstract: The naïve Bayes classifier is one of the commonly used data mining methods for classification. Despite its simplicity, naïve Bayes is effective and computationally efficient. Although the strong attribute independence assumption makes naïve Bayes a tractable method for learning, this assumption may not hold in real-world applications. Many enhancements to the basic algorithm have been proposed to alleviate violations of the attribute independence assumption. While these methods improve classification performance, they do not necessarily retain the mathematical structure of the naïve Bayes model, and some come at the expense of computational time. One approach to reducing the naïveté of the classifier is to incorporate attribute weights in the conditional probability. In this paper, we propose a method to incorporate attribute weights into naïve Bayes. To evaluate the performance of our method, we used public benchmark datasets and compared it with standard naïve Bayes and baseline attribute weighting methods. Experimental results show that our method improves classification performance over both standard naïve Bayes and the baseline attribute weighting methods in terms of classification accuracy and F1, especially when the independence assumption is strongly violated, which was validated using the chi-square test of independence.
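As a rough illustration of what "attribute weights in the conditional probability" can look like, the sketch below raises each conditional probability to a per-attribute weight, so the score for class c is log P(c) + Σ w_i · log P(x_i | c). This is one common form of weighted naïve Bayes, not necessarily the method proposed in the paper. The function names, the Laplace smoothing parameter `alpha`, and the toy data are assumptions, and the weights are simply supplied by the caller because the abstract does not say how they are derived.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels, alpha=1.0):
    """Collect the counts needed by a categorical naive Bayes model.

    alpha is an assumed Laplace smoothing constant.
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains, alpha


def predict(model, row, weights):
    """Return (best class, log-scores) using weight-exponentiated conditionals.

    With every weight equal to 1.0 this reduces to standard naive Bayes.
    """
    class_counts, value_counts, domains, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for i, value in enumerate(row):
            p = (value_counts[(i, c)][value] + alpha) / (n_c + alpha * len(domains[i]))
            score += weights[i] * math.log(p)  # weight acts as an exponent on P(x_i | c)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: attribute 0 determines the class, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
label, scores = predict(model, ("sunny", "hot"), weights=[1.0, 1.0])
```

Setting a weight to 0 removes that attribute's influence entirely, which is why weighting can soften the effect of attributes that violate the independence assumption.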
Funding: This work was supported by a Ulucu PhD studentship. Y. Jin is funded by an Alexander von Humboldt Professorship for Artificial Intelligence endowed by the German Federal Ministry of Education and Research.
Abstract: Neural architecture search (NAS) has recently become increasingly popular in the deep learning community, mainly because it allows interested users without rich expertise to benefit from the success of deep neural networks (DNNs). However, NAS is still laborious and time-consuming because a large number of performance estimations are required during the search process, and training DNNs is computationally intensive. To address this major limitation, improving computational efficiency is essential in the design of NAS. However, a systematic overview of computationally efficient NAS (CE-NAS) methods is still lacking. To fill this gap, we provide a comprehensive survey of the state of the art in CE-NAS by categorizing the existing work into proxy-based and surrogate-assisted NAS methods, together with a thorough discussion of their design principles and a quantitative comparison of their performance and computational complexity. The remaining challenges and open research questions are also discussed, and promising research topics in this emerging field are suggested.
Abstract: The presence of bearing faults reduces the efficiency of rotating machines and thus increases energy consumption, or can even cause total stoppage of the machine. It is therefore essential to correctly diagnose the fault caused by the bearing, hence the importance of determining an effective feature extraction method that best describes the fault. The aim of this paper is to merge feature selection methods in order to identify the most relevant features in the texture of the vibration signal images. In this study, the Gray Level Co-occurrence Matrix (GLCM) from texture analysis is applied to the vibration signal represented as images. Feature selection based on merging the PCA (Principal Component Analysis) method and the SFE (Sequential Features Extraction) method is performed to obtain the most relevant features. A multiclass naïve Bayes classifier is used to test the proposed approach. The success rate of this classification is 98.27%. The relevant features obtained give promising results and are more efficient than the methods observed in the literature.
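To make the GLCM step concrete, here is a minimal pure-Python sketch that builds a normalized co-occurrence matrix for a single pixel offset and derives three standard texture features from it. The abstract does not say how the vibration signal is converted to an image, which offsets are used, or which features are extracted, so the 4x4 test image, the horizontal offset, and the feature set (contrast, angular second moment, homogeneity) are all assumptions for illustration.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    image is a list of rows holding integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, angular second moment, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Hypothetical 4-level image standing in for a vibration-signal image.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(image, levels=4))
```

In practice several offsets and angles are usually combined, and the resulting feature vectors would then go through the selection step described above before classification.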
Abstract: As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Because it is more economical to combine commodity hardware, consisting of a medium server or PC, with a virtual environment and use it as a single server resource for filtering malicious email with machine learning techniques, we used the Hadoop MapReduce framework and, among machine learning methods, naïve Bayes for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programming using the naïve Bayes technique, a supervised machine learning method, in a Hadoop framework with optimized performance; and with a Python program applying the naïve Bayes technique in a bare-metal server environment without Hadoop. According to a comparison of the accuracy and prediction error rates of the two methods, the Hadoop MapReduce naïve Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 compared to the non-Hadoop Python naïve Bayes method.
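Naïve Bayes training fits the MapReduce model because it reduces to counting. The sketch below simulates that idea in a single Python process: a mapper emits per-class document and token counts, a reducer sums them, and a multinomial naïve Bayes scorer uses the totals. It does not use the Hadoop API, and the function names, whitespace tokenization, Laplace smoothing, and sample emails are assumptions rather than details taken from the study.

```python
import math
from collections import defaultdict


def map_phase(label, text):
    """Mapper: emit one document marker and one count per token for an email."""
    yield ("doc", label), 1
    for token in text.lower().split():
        yield ("tok", label, token), 1


def reduce_phase(pairs):
    """Reducer: sum the values sharing a key (the shuffle step is folded in)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def classify(counts, text, alpha=1.0):
    """Multinomial naive Bayes over the reduced counts, with Laplace smoothing."""
    labels = sorted(key[1] for key in counts if key[0] == "doc")
    vocab = {key[2] for key in counts if key[0] == "tok"}
    n_docs = sum(counts[("doc", c)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        n_tokens = sum(v for k, v in counts.items() if k[0] == "tok" and k[1] == c)
        score = math.log(counts[("doc", c)] / n_docs)  # log prior
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            seen = counts.get(("tok", c, token), 0)
            score += math.log((seen + alpha) / (n_tokens + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Hypothetical training emails.
emails = [
    ("spam", "win money now"),
    ("spam", "free money offer"),
    ("ham", "meeting schedule tomorrow"),
    ("ham", "project meeting notes"),
]
counts = reduce_phase(pair for label, text in emails for pair in map_phase(label, text))
prediction = classify(counts, "free money")
```

On a real cluster the mapper and reducer would run on separate nodes over partitions of the mail corpus; only the small table of summed counts is needed at classification time.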
Abstract: AIM: To characterize the prevalence of subpopulations of CD4+ cells along with that of major inhibitor or stimulator cell types in therapy-naïve childhood Crohn's disease (CD) and to test whether abnormalities of immune phenotype are normalized with the improvement of clinical signs and symptoms of disease. METHODS: We enrolled 26 pediatric patients with CD. Fourteen were therapy-naïve CD children; of those, 10 children remitted on conventional therapy and formed the remission group. We also tested another group of 12 children who relapsed with conventional therapy and were given infliximab, and 15 healthy children who served as controls. The prevalence of Th1 and Th2, naïve and memory, activated and regulatory T cells, along with members of innate immunity such as natural killer (NK), NK-T, myeloid and plasmacytoid dendritic cells (DCs), monocytes, and Toll-like receptor (TLR)-2 and TLR-4 expression, were determined in peripheral blood samples. RESULTS: Children with therapy-naïve CD and those in relapse showed a decrease in Th1 cell prevalence. Simultaneously, an increased prevalence of memory and activated lymphocytes along with that of DCs and monocytes was observed. In addition, the ratio of myeloid to plasmacytoid DCs and the prevalence of TLR-2 or TLR-4 positive DCs and monocytes were also higher in therapy-naïve CD than in controls. The majority of alterations diminished in remitted CD irrespective of whether remission was obtained by conventional or biological therapy. CONCLUSION: The finding that immune phenotype is normalized in remission suggests a link between immune phenotype and disease activity in childhood CD. Our observations support the involvement of members of the adaptive and innate immune systems in childhood CD.
Abstract: Intrusion detection is the process of investigating information about system activities or data to detect any malicious behavior or unauthorized activity. Most intrusion detection systems (IDS) implement the K-means clustering technique because of its linear complexity and fast computation. Nonetheless, its naïve use of the mean data value as the cluster center is a major drawback. Two circular clusters with different radii can be centered at the same mean, and the K-means algorithm cannot handle this condition because the mean values of the clusters are very similar. K-means also fails if the clusters are not spherical. To overcome this issue, a new hybrid model that integrates expectation-maximization (EM) clustering using a Gaussian mixture model (GMM) with a naïve Bayes classifier has been proposed. In this model, GMMs give more flexibility than K-means in terms of cluster covariance. They also use probability functions and soft clustering, which is why a single data point can belong to multiple clusters. In a GMM, the cluster shape is defined by two parameters: the mean and the standard deviation. By using these two parameters, a cluster can take any kind of elliptical shape. EM-GMM is used to cluster data into the corresponding category based on data activity.
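The same-mean, different-radius case can be illustrated in one dimension. The sketch below computes only the E-step of EM (the soft assignment of a point to each component) for a mixture whose parameters are assumed to be known already, and contrasts it with a nearest-mean rule of the kind K-means uses. The component values (equal weights, mean 0, standard deviations 1 and 5) and the function names are hypothetical; a full EM fit would alternate this step with parameter updates.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, components):
    """E-step for one point: posterior probability of each mixture component.

    components is a list of (weight, mean, std) tuples.
    """
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(joint)
    return [j / total for j in joint]


def nearest_mean(x, means):
    """Hard assignment by distance to the cluster mean, as in K-means."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


# Two assumed clusters sharing the same mean but with different spreads.
components = [(0.5, 0.0, 1.0), (0.5, 0.0, 5.0)]
near = responsibilities(0.5, components)  # point close to the shared center
far = responsibilities(8.0, components)   # point far from the shared center
```

Because both means coincide, the nearest-mean rule cannot tell the two clusters apart, whereas the responsibilities shift toward the narrow component near the center and toward the wide component far from it.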
Abstract: This paper proposes an improved naïve Bayes classifier for sentiment analysis on large-scale datasets such as YouTube. YouTube contains a large volume of unstructured and unorganized comments and reactions, which carry important information. Organizing large amounts of data and extracting useful information is a challenging task. The extracted information can be considered new knowledge and can be used for decision-making. We extract comments on YouTube videos, categorize them by domain, and then apply the naïve Bayes classifier with improved techniques. Our method achieved 80% accuracy in classifying those comments. This experiment shows that the proposed method adapts well to large-scale text classification.
Abstract: Classification models have received great attention in many research domains and are a reliable tool for medical disease diagnosis. Classification models are used in disease diagnosis, disease prediction, bioinformatics, crime prediction, and so on. However, efficient disease diagnosis models have compromised disease prediction. In this paper, a Rough Set Rule-based Multitude Classifier (RS-RMC) is developed to improve the disease prediction rate and enhance the class accuracy of the disease being diagnosed. The RS-RMC involves two steps. First, a Rough Set model is used for feature selection, aiming to minimize the execution time for obtaining the disease feature set. Second, a Multitude Classifier model is presented for detection of heart disease and for efficient classification. The naïve Bayes classifier algorithm is designed to identify classes efficiently by measuring the relationship between disease features, improving the disease prediction rate. Experimental analysis shows that RS-RMC reduces the execution time for extracting the disease features, with a minimal false positive rate, compared to state-of-the-art works.
Funding: This research is funded by Taif University, TURSP-2020/115.
Abstract: Social media data are rapidly increasing and constitute a source of user opinions and tips on a wide range of products and services. The increasing availability of such big data in biased reviews and blogs creates challenges for customers and businesses in reviewing all content in their decision-making process. Extracting suggestions from opinionated text is a possible solution to this challenge. In this study, the characteristics of suggestions are analyzed and a suggestion mining extraction process is presented for classifying suggestive sentences from online customers' reviews. Classification is performed with the XGBoost classifier using a word-embedding approach. The two datasets used in this experiment relate to online hotel reviews and Microsoft Windows App Studio discussion reviews. F1, precision, recall, and accuracy scores are calculated. The results demonstrate that the XGBoost classifier outperforms the other classifiers compared, with an accuracy of more than 80%. Moreover, the results reveal that suggestion keywords and phrases are the predominant features for suggestion extraction. Thus, this study contributes to knowledge and practice by comparing feature extraction classifiers and identifying XGBoost as a better suggestion mining process for identifying online reviews.
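For reference, the four reported scores can be computed from a confusion matrix as in the sketch below. The function name, the convention of returning 0.0 when a denominator is zero, and the label vectors are illustrative assumptions and are not data from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1, and accuracy for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 marks a suggestive sentence, 0 a non-suggestive one.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = binary_metrics(y_true, y_pred)
```

Reporting F1 alongside accuracy matters when suggestive sentences are a small share of all review sentences, since accuracy alone can look high for a classifier that rarely predicts the positive class.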
Funding: This work is supported by the National Key Research and Development Program of China under Grant 2017YFB1002304 and the China Scholarship Council under Grant 201906465021.
Abstract: Syndrome differentiation is the core diagnosis method of Traditional Chinese Medicine (TCM). We propose a method that simulates syndrome differentiation through deductive reasoning on a knowledge graph to achieve automated diagnosis in TCM. We analyze the reasoning path patterns from symptoms to syndromes on the knowledge graph. There are two kinds of path patterns in the knowledge graph: one-hop and two-hop. The one-hop path pattern maps a symptom to syndromes directly. The two-hop path pattern maps a symptom to syndromes through the nature of disease, etiology, and pathomechanism to support the diagnostic reasoning. Considering that knowledge paths provide different strengths of support in reasoning, we design a dynamic weight mechanism. We use naïve Bayes and TF-IDF to implement the reasoning method and the weighted score calculation. The proposed method infers syndromes by calculating their likelihood from the weighted scores of paths in the knowledge graph, based on the reasoning path patterns. We evaluate the method with clinical records and clinical practice in hospitals. The preliminary results suggest that the method achieves high performance and can help TCM doctors make better diagnostic decisions in practice. Moreover, the method is robust and explainable under the guidance of the knowledge graph. It could help TCM physicians, especially primary physicians in rural areas, and provide clinical decision support in clinical practice.