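Abstract: To address the over-segmentation problem common to superpixel segmentation algorithms, a new image segmentation algorithm, MS-BRM (Mean Shift based Bayesian Region Merging), is proposed by combining the Mean Shift algorithm with a nonparametric Bayesian clustering model. First, the Mean Shift algorithm segments the image into superpixels. Then, based on the nonparametric Bayesian clustering model and incorporating the spatial information of the superpixels, a region merging strategy is proposed to merge the superpixels and obtain the final segmentation. Experimental results show that MS-BRM alleviates superpixel over-segmentation, and that its segmentations preserve image boundary information and agree better with human visual judgment.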
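The merge step can be illustrated with a much simpler stand-in. The sketch below is not the MS-BRM criterion: it replaces the nonparametric Bayesian model with a fixed intensity threshold, and assumes the superpixels are already summarized as a mean intensity per region plus a list of adjacent region pairs. The names `merge_regions`, `means`, `adjacency` and `threshold` are hypothetical.

```python
def merge_regions(means, adjacency, threshold):
    """Merge adjacent regions whose mean intensities differ by less than threshold.

    means: dict mapping region id -> mean intensity of that superpixel
    adjacency: iterable of (a, b) pairs of spatially adjacent region ids
    Returns a dict mapping each region id -> label of its merged group.
    Assumption: a fixed threshold stands in for the Bayesian merge decision.
    """
    parent = {r: r for r in means}

    def find(r):
        # Follow parent links to the group representative, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in adjacency:
        ra, rb = find(a), find(b)
        # Only spatially adjacent regions are ever considered for merging.
        # The comparison uses the original superpixel means, not updated group means.
        if ra != rb and abs(means[a] - means[b]) < threshold:
            parent[rb] = ra
    return {r: find(r) for r in means}


labels = merge_regions({0: 10, 1: 12, 2: 50, 3: 52}, [(0, 1), (1, 2), (2, 3)], threshold=5)
```

Restricting candidate merges to adjacent pairs is what brings spatial information into the decision; a probabilistic criterion would replace the threshold test.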
Funding: Supported by the Natural Science Fund of Hebei Province (F2009000653), a Project of the Science and Technology Bureau of Hebei Province (072135126), and a Project of the Education Department of Hebei Province (Z2009122).
Abstract: Based on the characteristics of crop growth and environmental data and on the basic principle of the Bayesian algorithm, this study analyzes and forecasts crop product quality. A test on a randomly selected sample group yields high forecasting accuracy, which shows that the algorithm is effective.
Funding: The National Natural Science Foundation of China (No. 60072006).
Abstract: Based on the current state of research on component retrieval, the component description model based on facet classification is improved by adding semantic features. A component retrieval process model is then put forward by combining the domain ontology with the relative concept match algorithm. A component reasoning engine and a component classification engine are described in detail, and a component classification algorithm is given that uses the Naive Bayes algorithm based on the domain ontology. The experimental results show that the semantics-based method clearly improves both the recall ratio and the precision ratio, demonstrating the feasibility and effectiveness of the proposed method.
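For readers unfamiliar with the classifier named above, here is a minimal sketch of a generic multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It does not use a domain ontology and is not the paper's component classification algorithm; the function names and the toy component descriptions are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns the counts the classifier needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior (up to a shared constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for t in tokens:
            # Add-one smoothing keeps unseen tokens from zeroing the product.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical component descriptions, already tokenized.
model = train_nb([
    (["sort", "list"], "algorithm"),
    (["sort", "array"], "algorithm"),
    (["button", "window"], "ui"),
    (["window", "dialog"], "ui"),
])
```

Calling `predict_nb(model, ["sort"])` selects the class whose training descriptions contain that token most often relative to their length.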
Funding: Supported by the National High Technology Research and Development Program of China (2006AA040309) and the National Basic Research Program of China (2007CB714000).
Abstract: A multiple-model soft sensing method based on Affinity Propagation (AP), Gaussian process (GP) and the Bayesian committee machine (BCM) is presented. The AP clustering algorithm is used to cluster training samples according to their operating points. The sub-models are then estimated by Gaussian Process Regression (GPR). Finally, to obtain a global probabilistic prediction, the Bayesian committee machine is used to combine the outputs of the sub-estimators. The proposed method has been applied to predict the light naphtha end point in hydrocracker fractionators. Practical applications indicate that it is useful for online prediction in quality monitoring of chemical processes.
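The combination step can be sketched for a single test point. The code below assumes the standard Bayesian committee machine rule with a zero-mean GP prior: the combined precision is the sum of the sub-model precisions minus (M - 1) times the prior precision, and the combined mean is the precision-weighted sum of the sub-model means. The function name and arguments are hypothetical, and the abstract does not state that the paper uses exactly this form.

```python
def bcm_combine(means, variances, prior_variance):
    """Combine M Gaussian sub-model predictions at one test point.

    means, variances: predictive mean and variance from each sub-model
    prior_variance: prior variance of the target at the test point
    Assumes a zero prior mean, as in the usual BCM derivation.
    """
    m = len(means)
    # The prior is counted once per sub-model, so remove the (m - 1) extra copies.
    precision = sum(1.0 / v for v in variances) - (m - 1) / prior_variance
    if precision <= 0:
        raise ValueError("combined precision must be positive")
    variance = 1.0 / precision
    mean = variance * sum(mu / v for mu, v in zip(means, variances))
    return mean, variance


mean, variance = bcm_combine([1.0, 3.0], [0.5, 0.5], prior_variance=1.0)
```

Sub-models that are more certain (smaller variance) receive more weight, which is why clustering by operating point lets the locally relevant model dominate.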
Funding: Project (KC18071) supported by the Application Foundation Research Program of Xuzhou, China; Projects (2017YFC0804401, 2017YFC0804409) supported by the National Key R&D Program of China.
Abstract: The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the time needed to classify these data. To solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the programming model of resilient distributed datasets (RDD). For comparison, a PNBA based on Hadoop is also implemented. The test results show that, in the same computing environment and for the same text sets, the Spark PNBA is clearly superior to the Hadoop PNBA in terms of key indicators such as speedup ratio and scalability. Therefore, Spark-based parallel algorithms can better meet the requirements of large-scale Chinese text data mining.
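Naive Bayes training parallelizes well because it reduces to counting. The sketch below imitates that map-then-reduce pattern in plain Python: each partition is counted independently, and the partial tables are then added together. It makes no Spark calls and is not the paper's PNBA; the partition layout and the function names are assumptions.

```python
from collections import Counter
from functools import reduce


def count_partition(partition):
    """Map step: count labels and (label, token) pairs within one partition."""
    pair_counts, label_counts = Counter(), Counter()
    for tokens, label in partition:
        label_counts[label] += 1
        for t in tokens:
            pair_counts[(label, t)] += 1
    return pair_counts, label_counts


def merge_counts(left, right):
    """Reduce step: add two partial count tables. Addition is associative,
    so partial results can be merged in any order."""
    return left[0] + right[0], left[1] + right[1]


def train_counts(partitions):
    """Count every partition independently, then merge the partial results."""
    return reduce(merge_counts, map(count_partition, partitions), (Counter(), Counter()))


# Two hypothetical partitions of tokenized, labeled documents.
pair_counts, label_counts = train_counts([
    [(["a", "b"], "x")],
    [(["a"], "x"), (["c"], "y")],
])
```

In a distributed setting, the `map` call would run on separate workers and only the small count tables would be exchanged between them.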
Abstract: This paper proposes an effective method for automatically constructing a domain ontology. The main idea is to use Formal Concept Analysis to establish the domain ontology automatically. The ontology then serves as the basis for a Naive Bayes classifier in order to verify the effectiveness of the domain ontology for document classification. A total of 1752 documents divided into 10 categories are used to assess the ontology, of which 1252 are training documents and 500 are testing documents. The F1-measure is the assessment criterion, and three results are obtained. The average recall of the Naive Bayes classifier is 0.94, so its recall performance based on the automatically constructed ontology is excellent. The average precision of the Naive Bayes classifier is 0.81, so its precision performance based on the automatically constructed ontology is good. The average F1-measure over the 10 categories is 0.86, so the Naive Bayes classifier is effective in terms of F1-measure when based on the automatically constructed ontology. Thus, the automatically constructed domain ontology can indeed serve as the document categories for effective document classification.
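The F1-measure is the harmonic mean of precision and recall. The short sketch below computes it and also shows a macro average over categories. The per-category values are not given in the abstract, so the macro example uses made-up numbers; the function names are likewise hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class):
    """Average of per-category F1 scores. per_class: list of (precision, recall)."""
    return sum(f1_score(p, r) for p, r in per_class) / len(per_class)


# F1 computed from the averaged precision and recall reported in the abstract.
f1_from_averages = f1_score(0.81, 0.94)

# Made-up per-category (precision, recall) pairs, for illustration only.
example_macro = macro_f1([(1.0, 1.0), (0.5, 0.5)])
```

Applied to the averaged precision (0.81) and recall (0.94), the formula gives about 0.87. The reported 0.86 is an average of per-category F1 scores, which in general need not equal the F1 of the averaged precision and recall.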
Funding: Sponsored by the National Natural Science Foundation of China under Grant No. 61100008 and the Natural Science Foundation of Heilongjiang Province of China under Grant No. LC2016024.
Abstract: In this paper, we discuss building an information dissemination model based on individual behavior. We analyze the individual behavior related to information dissemination and the factors that affect the sharing behavior of individuals, and we define and quantify these factors. We treat these factors as characteristic attributes and use a Bayesian classifier to classify individuals. Considering the forwarding delay characteristics of information dissemination, we present a random time generation method that simulates the delay of information dissemination. Given time and other constraints, a user might not look at all the information that his/her friends published. Therefore, this paper proposes an algorithm to predict information visibility, i.e., to estimate the probability that an individual will see the information. Based on the classification of individual behavior, and combined with our random time generation and information visibility prediction methods, we propose an information dissemination model based on individual behavior. The model can be used to predict the scale and speed of information propagation. We use data sets from Sina Weibo to validate and analyze the individual behavior prediction methods and the information dissemination model. A previously proposed information dissemination model provides the foundation for a subsequent study on the evolution of the network and social network analysis. Predicting the scale and speed of information dissemination can also be used for public opinion monitoring.
Abstract: An important problem in wireless communication networks (WCNs) is that they have limited resources, which exposes them to serious security threats. One approach to finding and detecting attacks is the intrusion detection system (IDS). In this paper, the fuzzy lion Bayes system (FLBS) is proposed as an intrusion detection mechanism. Initially, the data set is grouped into a number of clusters by the fuzzy clustering algorithm. The Naive Bayes classifier is integrated with the lion optimization algorithm to create the new lion naive Bayes (LNB) model, which optimally generates the probability measures. The LNB model is then applied to each data group to generate the aggregated data. Next, the LNB model is applied to the aggregated data, and the abnormal nodes are identified based on the posterior probability function. The performance of the proposed FLBS is evaluated using the KDD Cup 99 data, and it is compared with existing methods on the evaluation metrics accuracy and false acceptance rate (FAR). The experimental results show that the proposed system achieves the highest performance, which demonstrates its effectiveness for intrusion detection.
Funding: Supported by the High Technology Research and Development Program of China (863 Program, No. 2006AA100301).
Abstract: The performance of six statistical approaches that can be used to select the best model for describing the growth of individual fish was analyzed using simulated and real length-at-age data. The six approaches are the coefficient of determination (R²), the adjusted coefficient of determination (adj.-R²), root mean squared error (RMSE), Akaike's information criterion (AIC), the bias-corrected AIC (AICc) and the Bayesian information criterion (BIC). The simulated data were generated by five growth models with different numbers of parameters. Four sets of real data were taken from the literature. The parameters in each of the five growth models were estimated using the maximum likelihood method under the assumption of an additive error structure for the data. The model best supported by the data was identified using each of the six approaches. The results show that R² and RMSE have the same properties and perform worst. The sample size affects the performance of adj.-R², AIC, AICc and BIC. Adj.-R² does better in small samples than in large samples. AIC is not suitable for small samples and tends to select a more complex model as the sample size becomes large. AICc and BIC perform best in small and large samples, respectively. Use of AICc or BIC, chosen according to the size of the length-at-age data set, is recommended for selecting a fish growth model.
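The three information criteria can be written compactly for a least-squares fit. The sketch below assumes additive, normally distributed errors, so that the log-likelihood term reduces to n·ln(RSS/n) with constants dropped. Here `k` is the number of estimated parameters; whether it includes the error variance is a convention the abstract does not specify. The function name and arguments are hypothetical.

```python
import math


def information_criteria(rss, n, k):
    """Return (AIC, AICc, BIC) for a least-squares fit.

    rss: residual sum of squares of the fitted growth model
    n: number of length-at-age observations
    k: number of estimated parameters (convention is an assumption)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    fit_term = n * math.log(rss / n)
    aic = fit_term + 2 * k
    # The correction grows as n approaches k, penalizing complex models in small samples.
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    # The BIC penalty grows with ln(n), so it favors simpler models in large samples.
    bic = fit_term + k * math.log(n)
    return aic, aicc, bic


aic, aicc, bic = information_criteria(rss=10.0, n=10, k=2)
```

For each criterion, the candidate growth model with the lowest value is the one best supported by the data.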
Funding: Science Special Fund for "Special Training" of Ethnical Minority Professional and Technical Intelligent in Xinjiang, sponsored by the Science and Technology Department of Xinjiang Uygur Autonomous Region (grant number: 200723104); National Natural Science Foundation of China (grant number: 30960097).
Abstract: Liver hydatid disease is a common parasitic disease in farming and pastoral areas that seriously affects people's health. Based on the CT imaging features of this disease, an iterative approach is proposed that segments the liver and extracts hydatid lesions simultaneously. Each iteration of the algorithm consists of two main steps: 1) starting from user-defined pixel seeds in the liver and the hydatid lesion, Gaussian probability model fitting and smoothed Bayesian classification are applied to obtain an initial segmentation of liver and lesion; 2) a parametric active contour model using a prior shape force field is adopted to refine the initial segmentation. The validity of the proposed algorithm is evaluated subjectively and objectively in experiments on liver and hydatid lesion segmentation in CT slices from different patients. Compared with ground-truth manual segmentations, the experimental results show that the method is effective in segmenting the liver and hydatid lesions.
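Step 1 can be illustrated on single pixel intensities. The sketch below fits one Gaussian per class to the seed intensities and assigns a pixel to the class with the highest posterior. It omits the smoothing and the active contour refinement, and the class names, seed values and priors are made up for illustration, so it should be read as a rough outline only.

```python
import math


def fit_gaussian(samples):
    """Maximum likelihood mean and variance of the seed intensities."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, 1e-6)  # floor avoids a zero variance


def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def classify(pixel, models, priors):
    """Bayes decision rule: pick the class maximizing log prior + log likelihood."""
    return max(models, key=lambda c: math.log(priors[c]) + log_gaussian(pixel, *models[c]))


# Hypothetical seed intensities for the two classes.
models = {
    "liver": fit_gaussian([100, 105, 110]),
    "lesion": fit_gaussian([40, 45, 50]),
}
priors = {"liver": 0.5, "lesion": 0.5}
```

Applying `classify` to every pixel gives an initial label map of the kind that a contour model would then refine.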