Classification models have received great attention in many domains of research and are a reliable tool for medical disease diagnosis. Classification models are used in disease diagnosis, disease prediction, bioinformatics, crime prediction, and so on. However, inefficient disease diagnosis models compromise disease prediction. In this paper, a Rough Set Rule-based Multitude Classifier (RS-RMC) is developed to improve the disease prediction rate and enhance the class accuracy of the disease being diagnosed. RS-RMC involves two steps. Initially, a Rough Set model is used for feature selection, aiming to minimize the execution time for obtaining the disease feature set. In the second step, a Multitude Classifier model is presented for the detection of heart disease and for efficient classification. The Naïve Bayes Classifier algorithm is designed to identify classes efficiently, measure the relationships between disease features, and improve the disease prediction rate. Experimental analysis shows that RS-RMC reduces the execution time for extracting disease features with a lower false positive rate compared to state-of-the-art works.
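The Naïve Bayes step described above can be sketched in a few lines. The symptom features and toy data below are hypothetical illustrations, not the paper's heart-disease feature set, and the rough-set feature-selection stage is omitted:

```python
from collections import Counter, defaultdict
from math import log

def train_nb(rows, labels):
    """Estimate class priors and per-feature value counts
    (a generic stand-in for the paper's classifier)."""
    priors = Counter(labels)
    cond = defaultdict(Counter)   # (feature_index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return priors, cond

def predict_nb(priors, cond, row):
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for y, n in priors.items():
        score = log(n / total)
        for i, v in enumerate(row):
            counts = cond[(i, y)]
            # Laplace smoothing: add 1 to every value count
            score += log((counts[v] + 1) / (n + len(counts) + 1))
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical training data: (chest_pain, high_bp) -> diagnosis
X = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "yes")]
y = ["disease", "disease", "healthy", "healthy"]
priors, cond = train_nb(X, y)
print(predict_nb(priors, cond, ("yes", "yes")))   # -> disease
```

The log-space sum avoids numeric underflow when many features are multiplied, which matters once the rough-set stage hands over a larger feature set.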
Roman Urdu has been used for text messaging over the Internet for years, especially in the Indo-Pak Subcontinent. People from the subcontinent may speak the same Urdu language but may use different scripts for writing. Communication using Roman characters to write Urdu on social media is now among the most common forms of communication in the subcontinent, making it an extensive information source. English text classification is a solved problem, but there have been only a few efforts to examine the rich information source of Roman Urdu in the past, owing to the numerous complexities involved in processing Roman Urdu data. These complexities include the non-availability of a tagged corpus, the lack of a set of rules, and the lack of standardized spellings. A large amount of Roman Urdu news data is available on mainstream news websites and on social media websites like Facebook and Twitter, but meaningful information can only be extracted if the data is in a structured format. We have developed a Roman Urdu news headline classifier, which will help classify news into relevant categories on which further analysis and modeling can be done. This research aims to develop a Roman Urdu news classifier that classifies news into five categories (health, business, technology, sports, international). First, we develop the news dataset using scraping tools; then, after preprocessing, we compare the results of different machine learning algorithms: Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). After this, we use a phonetic algorithm to control lexical variation and test news from different websites. The preliminary results suggest that a more accurate classification can be accomplished by monitoring noise inside the data while classifying the news. After applying the machine learning algorithms mentioned above, the results show that the Multinomial Naïve Bayes classifier gives the best accuracy, 90.17%, given the noisy lexical variation in the data.
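The phonetic step for controlling lexical variation can be sketched with a classic Soundex-style code. The abstract does not name the exact algorithm used, and the Roman Urdu spelling variants below are hypothetical examples:

```python
def soundex(word: str) -> str:
    """Classic Soundex: first letter plus up to three digit codes,
    dropping vowels/h/w/y and collapsing adjacent duplicate codes."""
    groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
              "l": "4", "mn": "5", "r": "6"}
    code = {c: d for letters, d in groups.items() for c in letters}
    word = word.lower()
    out, prev = word[0].upper(), code.get(word[0], "")
    for ch in word[1:]:
        d = code.get(ch, "")
        if d and d != prev:
            out += d
        if ch not in "hw":        # h/w do not break a run of duplicates
            prev = d
    return (out + "000")[:4]

# Hypothetical Roman Urdu spelling variants map to the same code:
print(soundex("zindagi"), soundex("zindagee"))   # Z532 Z532
```

Mapping headlines through such a code before vectorization lets "zindagi" and "zindagee" land in the same feature, which is one way to tame the lack of standardized spellings.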
The Washington, DC crash statistics report for the period from 2013 to 2015 shows that the city recorded about 41,789 crashes at unsignalized intersections, which resulted in 14,168 injuries and 51 fatalities. The economic cost of these fatalities has been estimated to be in the millions of dollars. It is therefore necessary to investigate the predictability of the occurrence of these crashes, based on pertinent factors, in order to provide mitigating measures. This research focused on the development of models to predict the injury severity of crashes using support vector machines (SVMs) and Gaussian naïve Bayes classifiers (GNBCs). The models were developed based on 3,307 crashes that occurred from 2008 to 2015. Eight SVM models and a GNBC model were developed. The most accurate model was the SVM with a radial basis kernel function, which predicted the severity of an injury sustained in a crash with an accuracy of approximately 83.2%. The GNBC produced the worst-performing model, with an accuracy of 48.5%. These models will enable transport officials to identify crash-prone unsignalized intersections and provide the necessary countermeasures beforehand.
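A Gaussian naïve Bayes classifier of the kind compared here models each feature per class with a mean and variance. A minimal stdlib sketch on hypothetical crash features (not the DC dataset) follows:

```python
from math import log, pi

def fit_gnb(X, y):
    """Per-class prior plus mean/variance for each feature."""
    params = {}
    for c in set(y):
        rows = [x for x, yy in zip(X, y) if yy == c]
        stats = []
        for col in zip(*rows):
            m = sum(col) / len(col)
            v = sum((t - m) ** 2 for t in col) / len(col) + 1e-9  # avoid zero variance
            stats.append((m, v))
        params[c] = (len(rows) / len(X), stats)
    return params

def predict_gnb(params, x):
    def loglik(prior, stats):
        s = log(prior)
        for t, (m, v) in zip(x, stats):
            s += -0.5 * log(2 * pi * v) - (t - m) ** 2 / (2 * v)
        return s
    return max(params, key=lambda c: loglik(*params[c]))

# Hypothetical crash features: (speed_kmh, vehicle_count) -> severity
X = [(30, 1), (35, 2), (80, 3), (90, 4)]
y = ["minor", "minor", "severe", "severe"]
params = fit_gnb(X, y)
print(predict_gnb(params, (85, 3)))   # -> severe
```

The independence assumption that makes this model cheap is also a plausible reason it trailed the RBF-kernel SVM, which can capture interactions between crash factors.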
A new naïve Bayesian network model for remote-sensing imagery that embeds a Gaussian Mixture Model (GMM), called GMM-NBC (GMM-based Naïve Bayesian Classifier), is proposed. To address the shortcoming of continuous naïve Bayesian network classifiers, which assume that ground objects follow a single Gaussian distribution, the method models the distribution of ground objects in feature space with a Gaussian mixture model, whose parameters are obtained automatically with an improved EM algorithm. The Gaussian mixture model as a whole is embedded in the naïve Bayesian network as a child node, and its output serves as the node's (feature's) intermediate class posterior probability; the final class posterior probability is obtained by fusing these within the naïve Bayesian network framework. Classification experiments on multispectral and hyperspectral data show that the method performs better than traditional Bayesian classifiers and has strong robustness.
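The core idea of replacing a single Gaussian with a mixture as the class-conditional density can be illustrated with hand-set parameters. In the paper the parameters come from an improved EM algorithm; all numbers below are hypothetical:

```python
from math import exp, pi, sqrt

def gauss(x, m, v):
    """Univariate normal density N(x; mean m, variance v)."""
    return exp(-(x - m) ** 2 / (2 * v)) / sqrt(2 * pi * v)

def gmm_pdf(x, components):
    """Likelihood under a Gaussian mixture: sum of weight * N(x; m, v)."""
    return sum(w * gauss(x, m, v) for w, m, v in components)

# Hypothetical bimodal spectral band for one land-cover class,
# as (weight, mean, variance) triples; a single Gaussian could not
# represent the two forest sub-populations.
forest_band1 = [(0.6, 0.20, 0.01), (0.4, 0.45, 0.02)]
water_band1  = [(1.0, 0.05, 0.01)]

# Naïve Bayes fusion over one feature: score = prior * mixture likelihood
x = 0.22
scores = {"forest": 0.5 * gmm_pdf(x, forest_band1),
          "water":  0.5 * gmm_pdf(x, water_band1)}
print(max(scores, key=scores.get))   # -> forest
```

With more bands, the per-band mixture likelihoods are multiplied under the naïve-independence assumption before normalizing into the final class posterior, matching the fusion step described above.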
The naïve Bayes classifier is a simple and efficient classification algorithm, but its attribute-independence assumption limits its classification performance. Relaxing the naïve Bayes assumption can improve classification, but usually at a greatly increased computational cost. To address this, a rough-set-based feature-weighted naïve Bayes (FWNB) algorithm is proposed, in which the weighting parameters are learned directly from the training data and can be interpreted as the degree of influence a feature has on a class when computing a posterior probability. The algorithm is experimentally compared with the naïve Bayes classifier (NB), Bayesian networks, and the NBTree classifier. The results show that on most datasets the FWNB classifier achieves higher classification accuracy at a lower computational cost.
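The feature weighting typically enters such a classifier as an exponent on each conditional probability, so the log-posterior becomes log P(c) + Σᵢ wᵢ log P(xᵢ|c). A minimal sketch with hypothetical weights follows (the paper learns its weights from training data via rough sets):

```python
from math import log

def fwnb_score(prior, cond_probs, weights):
    """Feature-weighted naïve Bayes score:
    log P(c) + sum_i w_i * log P(x_i | c).
    Weights of 1.0 recover plain naïve Bayes; the values used here
    are hypothetical stand-ins for rough-set-derived weights."""
    return log(prior) + sum(w * log(p) for p, w in zip(cond_probs, weights))

weights = [1.4, 0.3]   # feature 1 deemed more relevant than feature 2
score_a = fwnb_score(0.5, [0.8, 0.2], weights)  # conditionals for class A
score_b = fwnb_score(0.5, [0.3, 0.6], weights)  # conditionals for class B
print("A" if score_a > score_b else "B")   # -> A
```

Down-weighting feature 2 lets its unfavorable conditional for class A matter less, which is exactly how weighting softens the independence assumption without the cost of learning full attribute dependencies.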
Hazards and disasters have always had negative impacts on ways of life. Landslides are overwhelming natural as well as man-made disasters that cause loss of natural resources and human property throughout the world. The present study aimed to assess and compare the prediction efficiency of different models of landslide susceptibility in the Kysuca river basin, Slovakia. In this regard, the fuzzy decision-making trial and evaluation laboratory combined with the analytic network process (FDEMATEL-ANP), the Naïve Bayes (NB) classifier, and the random forest (RF) classifier were considered. Initially, a landslide inventory map was produced with 2000 landslide and non-landslide points, randomly divided in a ratio of 70%:30% for training and testing, respectively. The geospatial database for assessing landslide susceptibility was generated from 16 landslide conditioning factors, allowing for topographical, hydrological, lithological, and land-cover factors. The ReliefF method was used to determine the significance of the selected conditioning factors for inclusion in model building. Consequently, the landslide susceptibility maps (LSMs) were generated using the FDEMATEL-ANP, Naïve Bayes (NB), and random forest (RF) models. Finally, the area under the curve (AUC) and several arithmetic evaluation measures were used to validate and compare the results and models. The results revealed that the random forest (RF) classifier is a promising and optimal model for landslide susceptibility in the study area, with a very high area under the curve (AUC = 0.954), lower mean absolute error (MAE = 0.1238) and root mean square error (RMSE = 0.2555), and higher Kappa index (K = 0.8435) and overall accuracy (OAC = 92.2%).
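The validation measures used here (AUC, MAE, RMSE, Kappa) can be computed directly from predicted susceptibility scores. A stdlib sketch on toy scores (not the study's data) follows:

```python
from math import sqrt

def auc(y_true, scores):
    """Rank-based AUC: probability a positive point outranks a negative one."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae_rmse(y_true, scores):
    """Mean absolute and root mean square error of the scores."""
    errs = [abs(t - s) for t, s in zip(y_true, scores)]
    return sum(errs) / len(errs), sqrt(sum(e * e for e in errs) / len(errs))

def kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: (observed - chance) / (1 - chance)."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pe = sum((sum(t == c for t in y_true) / n) * (sum(p == c for p in y_pred) / n)
             for c in (0, 1))
    return (po - pe) / (1 - pe)

# Toy scores for 3 landslide (1) and 3 non-landslide (0) points
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(round(auc(y_true, scores), 3))   # 0.889 — one pos/neg pair misordered
```

Reporting error measures (MAE, RMSE) alongside rank measures (AUC) and agreement measures (Kappa), as the study does, guards against a model that ranks well but is poorly calibrated.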
Funding: This work is supported by the KIAS (Research Number: CG076601) and in part by the Sejong University Faculty Research Fund.