Faced with ever-larger data sets, the affinity propagation (AP) clustering algorithm incurs heavy storage and computation costs because it must build a full similarity matrix. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is applied, using the density values of the data points to obtain the initial cluster points. Then the similarity distances between the initial cluster points are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is run on the cluster representative points until a suitable cluster partition is reached. Experimental results show that the algorithm greatly reduces both the computation and the storage required for the similarity matrix, and that it outperforms the original algorithm in clustering quality and processing speed.
This paper proposes a clustering technique that minimizes the need for subjective human intervention and is based on elements of rough set theory (RST). The proposed algorithm takes a unified approach to clustering and uses both local and global data properties to obtain clustering solutions. It handles single-type and mixed-attribute data sets with ease. Results from three data sets of single and mixed attribute types illustrate the technique and establish its efficiency.
This paper presents a fully automatic method for segmenting liver CT scans using fuzzy c-means clustering and a level set. First, the contrast of the original image is enhanced to make boundaries clearer; second, spatial fuzzy c-means clustering combined with anatomical prior knowledge is employed to extract the liver region automatically; third, a distance-regularized level set is used for refinement; finally, morphological operations are applied as post-processing. The experimental results show that the method achieves high accuracy (0.9986) and specificity (0.9989). Compared with the standard level set method, it deals more effectively with the over-segmentation problem.
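The fuzzy c-means step in the abstract above assigns each pixel a graded membership in every cluster rather than a hard label. A minimal sketch of the standard (non-spatial) membership update on scalar intensities follows; the spatial weighting and the anatomical prior used in the paper are not described in the abstract and are not modeled here. The function name `fcm_memberships` and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of scalar x in each cluster center.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i = |x - center_i|.
    This is the textbook update, not the spatial variant from the paper.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [abs(x - c) for c in centers]
    # If x coincides with one or more centers, share full membership among them.
    zero_hits = [d == 0.0 for d in dists]
    if any(zero_hits):
        share = 1.0 / sum(zero_hits)
        return [share if hit else 0.0 for hit in zero_hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical intensity 1.0 with two cluster centers at 0.0 and 4.0.
print(fcm_memberships(1.0, [0.0, 4.0]))
```

The memberships for one pixel always sum to 1, so a pixel near a boundary keeps partial membership in both clusters instead of being forced into one.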
The demand for individualized teaching from e-learning websites is increasing rapidly because of the large differences among Web learners. A method for clustering Web learners based on rough sets is proposed. The basic idea is to reduce the learning attributes prior to clustering, so that the clustering of Web learners is carried out in a relatively low-dimensional space. Using this method, e-learning websites can arrange corresponding teaching content for different clusters of learners so that learners' individual requirements are better satisfied. Key words: rough set; attribute reduction; k-means clustering; individualized teaching. CLC number: TP 391.6. Foundation item: Supported by the National "863" Program of China (2002AA111010, 2003AA001032). Biography: LIU Shuai-dong (1979-), male, Master candidate; research direction: knowledge discovery and individualized learning techniques.
To address the problem of judging which of several fuzzy partition matrices in the fuzzy partition space is optimal, a choice that depends on which cluster samples are drawn from the overall sample space, two algorithms are proposed on the basis of the data analysis method of rough set theory: an information system discretization algorithm (algorithm 1) and a sample representativeness judging algorithm (algorithm 2). Following the farthest-distance principle, algorithm 1 transforms continuous data into a discrete form that can be processed by rough set theory. Taking approximation precision as the criterion, algorithm 2 selects a sample space that is well representative. Hence the clustering sample set used for inducing and computing the optimal partition matrix can be obtained. Several theorems are proposed to provide a rigorous theoretical foundation for the algorithm model. An application example based on the new algorithm model is given, and its result verifies the feasibility of the model.
In this paper, rough sets are applied to point cluster and river network selection. To meet the requirements of rough sets, we first structure and quantify the spatial information of objects using the convex hull, triangulated irregular network (TIN), Voronoi diagram, etc.; second, we manually assign decision attributes to the information table according to the condition attributes. In this way, spatial information and attribute information are integrated to evaluate the importance of points and rivers by rough set theory. Finally, we select the point cluster and the river network progressively. The experimental results show that our method is valid and effective. Compared with previous work, it has the advantage of adaptively considering spatial and attribute information at the same time without any a priori knowledge.
Partition-based clustering with weighted features is developed in the framework of shadowed sets. The objects in the core and boundary regions generated by shadowed set-based clustering have different impacts on the prototype of each cluster. By integrating feature weights, a formula for weight calculation is introduced into the clustering algorithm. The choice of weight exponent is crucial for good results, and the weights are updated iteratively with each partition of clusters. A convergence result for the weighted algorithms is given, and feasible cluster validity indices from data mining applications are used. Experimental results on both synthetic and real-life numerical data with different feature weights demonstrate that the weighted algorithm outperforms the unweighted algorithms.
Because of limitations and hesitation in one's knowledge, the membership degree of an element to a given set often has several different possible values, a situation that conventional fuzzy sets cannot handle. Hesitant fuzzy sets are a powerful tool for this case. This paper investigates a clustering technique for hesitant fuzzy sets based on the K-means clustering algorithm, which takes the results of hierarchical clustering as the initial clusters. Finally, two examples demonstrate the validity of the algorithm.
Intuitionistic fuzzy sets (IFSs) are a useful means of describing and dealing with vague and uncertain data. An intuitionistic fuzzy C-means algorithm for clustering IFSs is developed. In each stage of the intuitionistic fuzzy C-means method the seeds are modified, and for each IFS a membership degree to each of the clusters is estimated. At the end of the algorithm, all the given IFSs are clustered according to the estimated membership degrees. Furthermore, the algorithm is extended to cluster interval-valued intuitionistic fuzzy sets (IVIFSs). Finally, the developed algorithms are illustrated through experiments on both real-world and simulated data sets.
This paper applies Gaussian interval type-2 fuzzy set theory to historical traffic volume data to obtain a high-precision 24-hour prediction of traffic volume. A K-means clustering method is used to obtain the 5-minute traffic volume variation as input data for the Gaussian interval type-2 fuzzy sets, which can reflect the distribution of historical traffic volume in one statistical period. Moreover, the cluster containing the most data points obtained by K-means clustering is used to compute the key parameters of the type-2 fuzzy sets, namely the mean and standard deviation of the Gaussian membership function. Using the range of the data as the input of the Gaussian interval type-2 fuzzy sets yields a forecast range for traffic volume, which describes the possible range of the traffic volume in addition to a highly accurate point prediction. The simulation results show that the average relative error is reduced to 8% with the combined K-means and Gaussian interval type-2 fuzzy set forecasting method. The fluctuation range defined by the upper and lower forecast traffic volumes completely envelops the actual traffic volume and reproduces the fluctuation range of traffic flow.
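To make the idea of an interval-valued membership concrete, here is a minimal sketch of one common form of Gaussian interval type-2 membership function: a fixed mean with an uncertain standard deviation, which yields a lower and an upper membership grade for each input. The abstract does not say which form the paper uses, so this form, the function names, the sample data, and the 20% spread applied to the standard deviation are all assumptions made for illustration.

```python
from math import exp
from statistics import mean, pstdev


def gaussian(x, center, sigma):
    """Ordinary (type-1) Gaussian membership grade of x."""
    return exp(-0.5 * ((x - center) / sigma) ** 2)


def interval_type2_membership(x, center, sigma_lower, sigma_upper):
    """Return (lower, upper) membership grades for a Gaussian with fixed
    mean and uncertain standard deviation in [sigma_lower, sigma_upper]."""
    if not 0.0 < sigma_lower <= sigma_upper:
        raise ValueError("require 0 < sigma_lower <= sigma_upper")
    return gaussian(x, center, sigma_lower), gaussian(x, center, sigma_upper)


# Hypothetical 5-minute volumes from the largest K-means cluster.
cluster = [100.0, 110.0, 90.0, 105.0, 95.0]
center = mean(cluster)
sigma = pstdev(cluster)
# Assumed +/-20% uncertainty on the standard deviation (not from the paper).
lower, upper = interval_type2_membership(108.0, center, 0.8 * sigma, 1.2 * sigma)
print(lower, upper)
```

The gap between the two grades is what lets the forecast be reported as a range rather than a single value.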
An intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a nonmembership degree. The generalized form of the IFS is the interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful for describing vagueness and uncertainty. However, little attention seems to have been paid to the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs, based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, the normalized Hamming distance, the weighted Hamming distance, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. Subsequently, the algorithm is extended to cluster IVIFSs. Finally, the algorithm and its extended form are applied to the classification of building materials and of enterprises, respectively.
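The distance measures listed above can be illustrated with a short sketch. It uses one widely cited definition of the normalized Hamming and normalized Euclidean distances between IFSs, which includes the hesitancy degree pi = 1 - mu - nu alongside the membership and nonmembership degrees; whether the paper uses exactly this form is an assumption. Each IFS is represented here as a list of `(mu, nu)` pairs over the same universe, a representation chosen only for illustration.

```python
from math import sqrt


def _differences(a, b):
    """Yield (d_mu, d_nu, d_pi) for each pair of elements of IFSs a and b."""
    if len(a) != len(b) or not a:
        raise ValueError("IFSs must be non-empty and of equal length")
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        pi_a = 1.0 - mu_a - nu_a  # hesitancy degree of a
        pi_b = 1.0 - mu_b - nu_b  # hesitancy degree of b
        yield mu_a - mu_b, nu_a - nu_b, pi_a - pi_b


def normalized_hamming(a, b):
    """Normalized Hamming distance between two IFSs, in [0, 1]."""
    total = sum(abs(dm) + abs(dn) + abs(dp) for dm, dn, dp in _differences(a, b))
    return total / (2.0 * len(a))


def normalized_euclidean(a, b):
    """Normalized Euclidean distance between two IFSs, in [0, 1]."""
    total = sum(dm * dm + dn * dn + dp * dp for dm, dn, dp in _differences(a, b))
    return sqrt(total / (2.0 * len(a)))


# Two hypothetical IFSs over a two-element universe.
A = [(0.5, 0.3), (0.2, 0.6)]
B = [(0.6, 0.3), (0.2, 0.5)]
print(normalized_hamming(A, B), normalized_euclidean(A, B))
```

A hierarchical procedure would repeatedly merge the pair of clusters with the smallest such distance; the weighted variants replace the uniform 1/n factor with per-element weights.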
Because of the widespread use of the Internet, customer information is vulnerable to attacks on computer systems, which creates an urgent need for intrusion detection technology. Network intrusion detection has recently become one of the most important technologies in network security. The accuracy of network intrusion detection is already high. However, existing methods, even the most popular SOM neural network method, are very inefficient. In this paper, an efficient and fast network intrusion detection method is proposed. First, the fundamentals of the two underlying methods are introduced. Then, a self-organizing feature map neural network based on K-means clustering (KSOM) is presented to improve the efficiency of network intrusion detection. Finally, the NSL-KDD data set is used as the network intrusion data set to demonstrate that the KSOM method requires significantly fewer clustering iterations than the SOM method without substantially affecting the clustering results, and that its accuracy is much higher than that of the K-means method. The experimental results show that the method improves the accuracy of network intrusion detection and significantly reduces the number of clustering iterations.
For multiple attribute decision-making problems in which attribute values are interval grey numbers and some of them are null, this paper proposes a decision model that integrates grey system theory and rough sets under incomplete information. An incidence degree coefficient formula for grey intervals is put forward, and a method and principle for filling in null values are presented using information entropy theory and analysis techniques. A grey interval incidence clustering method is also established. Because grey system theory and rough set theory complement each other, a decision table with preference information is obtained from the result of grey incidence clustering. An algorithm for inducing decision rules based on rough set theory and the dominance relation is presented. To some extent, this algorithm can deal with decision-making problems in which the attribute values are interval grey numbers and some of them are null. Compared with the classical cluster decision-making model, the algorithm has the advantage of flexibility and compatibility with new information.
Clustering techniques classify raw data in a reasonable manner to create disjoint clusters. Many clustering algorithms based on specific parameters have been proposed to handle high volumes of data. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm combined with a threshold-based clustering technique. This algorithm addresses the shortcomings of the k-means clustering algorithm by overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository), alongside the k-means and threshold-based clustering algorithms. The proposed method produces better-separated, more compact clusters and thus achieves higher validity indices. The method also eliminates the limitations of the threshold-based clustering algorithm, and its validity measures and respective indices are compared with those of the k-means and threshold-based clustering algorithms.
In this paper, a blind multiband spectrum sensing (BMSS) method that requires no knowledge of the noise power, primary signal, or wireless channel is proposed based on K-means clustering (KMC). In this approach, the KMC algorithm is used to identify the occupied subband set (OSS) and the idle subband set (ISS), and the location and number of the occupied channels are then obtained from the elements of the OSS. Compared with the classical BMSS methods based on information theoretic criteria (ITC), the new method performs better, especially at low signal-to-noise ratio (SNR) and with small sample sizes, and its detection performance is more robust under noise uncertainty or unequal noise variance. The new method also performs more stably than the ITC-based methods when the number of occupied subbands increases or the primary signals suffer multipath fading. Simulation results verify the effectiveness of the proposed method.
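The core idea above, splitting subbands into an occupied set and an idle set with K-means, can be sketched as a one-dimensional two-cluster problem. The abstract does not state which feature the paper clusters on, so this sketch assumes a single average-energy value per subband; the function names and the sample energies are hypothetical.

```python
def two_means_1d(values, max_iters=100):
    """Plain K-means with k=2 on scalars; returns (low_center, high_center)."""
    low, high = min(values), max(values)
    for _ in range(max_iters):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical: nothing to separate
            break
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):  # centers stopped moving
            break
        low, high = new_low, new_high
    return low, high


def split_subbands(energies):
    """Return (occupied, idle) index lists: OSS = high-energy cluster."""
    low, high = two_means_1d(energies)
    occupied = [i for i, e in enumerate(energies) if abs(e - high) < abs(e - low)]
    idle = [i for i in range(len(energies)) if i not in occupied]
    return occupied, idle


# Hypothetical per-subband average energies (arbitrary units).
energies = [1.0, 1.1, 5.0, 0.9, 5.2, 1.05]
occupied, idle = split_subbands(energies)
print(occupied, idle)
```

No noise-power threshold appears anywhere in the sketch, which is the sense in which such a method is blind: the decision boundary comes from the data itself. Note that this simplified version cannot recognize the case where every subband is idle or every subband is occupied.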
Funding: Supported by the National Natural Science Foundation of China (60675039), the National High Technology Research and Development Program of China (863 Program) (2006AA04Z217), and the Hundred Talents Program of the Chinese Academy of Sciences.
Funding: This research has been partially supported by the National Natural Science Foundation of China (51175169) and the National Science and Technology Support Program (2012BAF02B01).
Funding: Supported by the National Natural Science Foundation of China (61139002).
Funding: Supported by the National Natural Science Foundation of China (61273209).
Funding: Supported by the National Natural Science Foundation of China for Distinguished Young Scholars (70625005).
Funding: Supported by the National Key Research and Development Program of China (2018YFB1201500).
Funding: Supported by the National Natural Science Foundation of China (70571087) and the National Science Fund for Distinguished Young Scholars of China (70625005).
Funding: Supported by the NSF of Henan Province (082300410040) and the NSF of Zhumadian City (087006).
Funding: Projects (61362018, 61861019) supported by the National Natural Science Foundation of China; Project (1402041B) supported by the Jiangsu Province Postdoctoral Scientific Research Project, China; Project (16A174) supported by the Scientific Research Fund of Hunan Provincial Education Department, China; Project ([2016]283) supported by the Research Study and Innovative Experiment Project of College Students, China.