The sharp increase of the amount of Internet Chinese text data has significantly prolonged the processing time of classification on these data.In order to solve this problem,this paper proposes and implements a parall...The sharp increase of the amount of Internet Chinese text data has significantly prolonged the processing time of classification on these data.In order to solve this problem,this paper proposes and implements a parallel naive Bayes algorithm(PNBA)for Chinese text classification based on Spark,a parallel memory computing platform for big data.This algorithm has implemented parallel operation throughout the entire training and prediction process of naive Bayes classifier mainly by adopting the programming model of resilient distributed datasets(RDD).For comparison,a PNBA based on Hadoop is also implemented.The test results show that in the same computing environment and for the same text sets,the Spark PNBA is obviously superior to the Hadoop PNBA in terms of key indicators such as speedup ratio and scalability.Therefore,Spark-based parallel algorithms can better meet the requirement of large-scale Chinese text data mining.展开更多
Landslide hazard mapping is a fundamental tool for disaster management activities in Loess terrains. Aiming at major issues with these landslide hazard assessment methods based on Naive Bayesian classification techniq...Landslide hazard mapping is a fundamental tool for disaster management activities in Loess terrains. Aiming at major issues with these landslide hazard assessment methods based on Naive Bayesian classification technique, which is difficult in quantifying those uncertain triggering factors, the main purpose of this work is to evaluate the predictive power of landslide spatial models based on uncertain Naive Bayesian classification method in Baota district of Yan'an city in Shaanxi province, China. Firstly, thematic maps representing various factors that are related to landslide activity were generated. Secondly, by using field data and GIS techniques, a landslide hazard map was performed. To improve the accuracy of the resulting landslide hazard map, the strategies were designed, which quantified the uncertain triggering factor to design landslide spatial models based on uncertain Naive Bayesian classification method named NBU algorithm. The accuracies of the area under relative operating characteristics curves(AUC) in NBU and Naive Bayesian algorithm are 87.29% and 82.47% respectively. Thus, NBU algorithm can be used efficiently for landslide hazard analysis and might be widely used for the prediction of various spatial events based on uncertain classification technique.展开更多
基金Project(KC18071)supported by the Application Foundation Research Program of Xuzhou,ChinaProjects(2017YFC0804401,2017YFC0804409)supported by the National Key R&D Program of China
文摘The sharp increase of the amount of Internet Chinese text data has significantly prolonged the processing time of classification on these data.In order to solve this problem,this paper proposes and implements a parallel naive Bayes algorithm(PNBA)for Chinese text classification based on Spark,a parallel memory computing platform for big data.This algorithm has implemented parallel operation throughout the entire training and prediction process of naive Bayes classifier mainly by adopting the programming model of resilient distributed datasets(RDD).For comparison,a PNBA based on Hadoop is also implemented.The test results show that in the same computing environment and for the same text sets,the Spark PNBA is obviously superior to the Hadoop PNBA in terms of key indicators such as speedup ratio and scalability.Therefore,Spark-based parallel algorithms can better meet the requirement of large-scale Chinese text data mining.
基金Projects(41362015,51164012) supported by the National Natural Science Foundation of ChinaProject(2012AA061901) supported by the National High-tech Research and Development Program of China
文摘Landslide hazard mapping is a fundamental tool for disaster management activities in Loess terrains. Aiming at major issues with these landslide hazard assessment methods based on Naive Bayesian classification technique, which is difficult in quantifying those uncertain triggering factors, the main purpose of this work is to evaluate the predictive power of landslide spatial models based on uncertain Naive Bayesian classification method in Baota district of Yan'an city in Shaanxi province, China. Firstly, thematic maps representing various factors that are related to landslide activity were generated. Secondly, by using field data and GIS techniques, a landslide hazard map was performed. To improve the accuracy of the resulting landslide hazard map, the strategies were designed, which quantified the uncertain triggering factor to design landslide spatial models based on uncertain Naive Bayesian classification method named NBU algorithm. The accuracies of the area under relative operating characteristics curves(AUC) in NBU and Naive Bayesian algorithm are 87.29% and 82.47% respectively. Thus, NBU algorithm can be used efficiently for landslide hazard analysis and might be widely used for the prediction of various spatial events based on uncertain classification technique.