Funding: National Youth Natural Science Foundation of China (No. 61806006); Innovation Program for Graduate of Jiangsu Province (No. KYLX160-781); Jiangsu University Superior Discipline Construction Project.
Abstract: K-multiple-means (KMM) retains the simplicity and efficiency of the K-means algorithm while, by setting multiple subclasses, improving its performance on non-convex data sets. To address the problem that KMM cannot be applied to multi-view data sets on the Internet, this paper proposes a multi-view K-multiple-means (MKMM) clustering method. The new algorithm introduces a view weight parameter, retains the design of multiple subclasses, treats the number of clusters as a constraint, and obtains the clusters by solving an optimization problem. It is compared with several popular multi-view clustering algorithms, and analysis of the experimental results demonstrates its effectiveness.
Funding: National Natural Science Foundation of China (No. 51467008).
Abstract: A sparse representation modeling method using feature extraction techniques is proposed for photovoltaic power prediction. First, all factors affecting the photovoltaic power output are taken as the input data of the model. Next, dictionary learning with the K-means singular value decomposition (K-SVD) algorithm and the orthogonal matching pursuit (OMP) algorithm is used to obtain the corresponding sparse encoding based on all the input data, i.e. the initial dictionary. Then, to build the global prediction model, the sparse coding vectors are used as the input of a kernel extreme learning machine (KELM). Finally, to verify the effectiveness of the combined K-SVD-OMP and KELM method, it is applied to an instance of photovoltaic power prediction. Compared with KELM, SVM and ELM under the same conditions, the experimental results show that the combined sparse representation methods achieve better prediction results, with the combined K-SVD-OMP and KELM method showing better prediction results and modeling accuracy.
Abstract: Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. To eliminate the linguistic criteria that result from experience-based judgments, and to account for uncertainties in the class boundaries defined by the SMR system, the classification results were corrected using two clustering algorithms, K-means and fuzzy c-means (FCM), for ratings obtained via continuous and discrete functions. With the clustering algorithms applied, no experience-based judgment was made in advance on the number of classes to extract; only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results show that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system with ratings calculated via continuous and discrete functions.
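As a rough illustration of the fuzzy c-means step named in the abstract above, the sketch below clusters one-dimensional ratings with the standard FCM membership and center updates. The function names, the example ratings, the initial centers and the fuzziness value of 2 are all assumptions made for illustration; none of them come from the paper.

```python
def membership_row(x, centers, exponent):
    """Fuzzy memberships of one value x in every cluster (they sum to 1)."""
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x coincides with a center: give it full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def fuzzy_c_means_1d(values, initial_centers, fuzziness=2.0, iterations=50):
    """Standard fuzzy c-means on scalar values (hypothetical helper)."""
    centers = list(initial_centers)
    exponent = 2.0 / (fuzziness - 1.0)
    for _ in range(iterations):
        rows = [membership_row(x, centers, exponent) for x in values]
        for i in range(len(centers)):
            # Each center is the mean of the values weighted by u^fuzziness.
            weights = [row[i] ** fuzziness for row in rows]
            total = sum(weights)
            if total > 0:
                centers[i] = sum(w * x for w, x in zip(weights, values)) / total
    memberships = [membership_row(x, centers, exponent) for x in values]
    return centers, memberships


# Made-up ratings on a 0-100 scale, split into two fuzzy classes.
ratings = [18, 20, 22, 78, 80, 82]
centers, memberships = fuzzy_c_means_1d(ratings, [0.0, 100.0])
```

Unlike K-means, every rating keeps a graded membership in each class, which is one way to express uncertainty near a class boundary.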
Abstract: Clustering techniques have recently been used to discover typical user profiles automatically. Designing an effective similarity measure between session vectors, which are usually high-dimensional and sparse, is in general a challenging problem. Two approaches for mining typical user profiles, based on matrix dimensionality reduction, are presented. In these approaches, non-negative matrix factorization is applied to reduce the dimensionality of the session-URL matrix, and the projected user-session vectors are clustered into typical user-session profiles using the spherical k-means algorithm. The results show that the two algorithms succeed in mining many typical user profiles from the user sessions.
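The spherical k-means step mentioned above can be sketched as follows: vectors are scaled to unit length, each one is assigned to the centroid with the highest cosine similarity, and centroids are re-normalized after every update. This is a minimal sketch that assumes the session vectors have already been reduced (the NMF step is not shown); the function names, the `init_indices` parameter and the toy vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, init_indices, iterations=20):
    """Cluster by cosine similarity; one cluster per entry in init_indices."""
    data = [normalize(v) for v in vectors]
    centroids = [data[i] for i in init_indices]
    k = len(centroids)
    labels = [0] * len(data)
    for _ in range(iterations):
        # On unit vectors the dot product equals the cosine similarity.
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in data]
        for c in range(k):
            members = [v for v, lab in zip(data, labels) if lab == c]
            if members:
                summed = [sum(col) for col in zip(*members)]
                centroids[c] = normalize(summed)
    return labels, centroids


# Toy reduced session vectors: two point along axis 0, two along axis 1.
sessions = [[1.0, 0.1], [2.0, 0.1], [0.1, 1.0], [0.2, 3.0]]
labels, centroids = spherical_kmeans(sessions, init_indices=[0, 2])
```

Because only direction matters, sessions of different lengths but similar URL proportions fall into the same profile.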
Funding: Partially supported by the Major Project of National Science and Technology of China under Grants No. 2016ZX03002010003 and No. 2015ZX03001033-002.
Abstract: In 5G ultra-dense networks (UDN), resource allocation is an efficient way to manage inter-small-cell interference. This paper proposes a two-stage resource allocation scheme that handles interference and resource allocation in a realistic three-tier heterogeneous network architecture. In stage I, a two-level sub-channel allocation algorithm and a power control method based on the logarithmic function allocate resources to the Macrocell and Picocells, guaranteeing the minimum system capacity while taking the power limitation and interference coordination into account. In stage II, an interference management approach based on K-means clustering divides the Femtocells into clusters, and a prior sub-channel allocation algorithm is then applied to the Femtocells in the different clusters to mitigate interference and improve system performance. Simulation results show that the proposed scheme improves system throughput and spectrum efficiency while maintaining system energy efficiency.
Abstract: Sample entropy reflects both the change in the level of new information in a signal sequence and the amount of that new information. Taking sample entropy as the feature for speech classification, the paper first extracts the sample entropy of the mixed signal, then uses the mean and variance to calculate the sample entropy of each signal, and finally applies K-means clustering for recognition. The simulation results show that the recognition rate can be increased to 89.2% using sample entropy.
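For readers unfamiliar with the feature used above, the following is a minimal sketch of the usual sample entropy definition, SampEn(m, r) = -ln(A/B), where B counts pairs of length-m templates within tolerance r (Chebyshev distance, self-matches excluded) and A counts the same for length m + 1. The parameter defaults and the toy signal are assumptions; the abstract does not state which settings the paper used.

```python
import math


def sample_entropy(signal, m=2, r=0.2):
    """Sample entropy of a 1-D sequence; r is an absolute tolerance."""
    n = len(signal)

    def count_matches(length):
        # Use the same n - m start positions for both template lengths.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance: largest coordinate-wise difference.
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no template pairs match
    return -math.log(a / b)


# A perfectly periodic signal is fully predictable, an irregular one is not.
regular = sample_entropy([1, 2] * 10, m=2, r=0.1)
irregular = sample_entropy([0, 0, 0, 1, 0, 0, 1, 1], m=2, r=0.5)
```

Lower values indicate a more regular sequence; the per-signal values could then be passed to a clustering step as features.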
Abstract: One of the most important problems in clustering is defining the number of classes. It is not easy to find an appropriate method for measuring whether a cluster configuration is acceptable. In this paper we propose a possible, non-automatic solution that considers different clustering criteria and compares their results. In this way, robust structures in the analyzed dataset can often be identified, and an optimal cluster configuration that presents a meaningful association may be defined. We also focus on the variables used in cluster analysis, since variables that contain little clustering information can lead to misleading and non-robust results. Three algorithms are therefore employed in this study: K-means partitioning methods, Partitioning Around Medoids (PAM) and the Heuristic Identification of Noisy Variables (HINoV). The results are compared with those of robust methods.
Abstract: Data analysis and automatic processing are often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. In intelligent data analysis applications, the regularities found are mostly represented as IF-THEN rules, which are used to solve tasks such as prediction, classification and pattern recognition. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Rule extraction in this context is based on the K-means and fuzzy C-means clustering methods. With the help of the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set, and the effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Funding: Projects (41001260, 61173122, 61573380) supported by the National Natural Science Foundation of China; Project (11JJ5044) supported by the Hunan Provincial Natural Science Foundation of China.
Abstract: It is illegal to spread and transmit pornographic images over the Internet, whether real or artificial. Traditional methods are designed to identify real pornographic images and are less effective at dealing with artificial ones. Criminals have therefore turned to releasing artificial pornographic images in specific settings, e.g. social networks. To identify artificial pornographic images efficiently, this work proposes a novel approach based on bag-of-visual-words. In the bag-of-words (BoW) framework, speeded-up robust features (SURF) are first adopted for feature extraction; a visual vocabulary is then constructed through K-means clustering and images are represented by an improved BoW encoding method; finally, the visual words are fed into a learning machine for training and classification. Unlike the traditional BoW method, the proposed method sets a weight on each visual word according to the number of features that each cluster contains. Moreover, a non-binary encoding method and a cross-matching strategy are used to improve the discriminative power of the visual words. Experimental results indicate that the proposed method outperforms the traditional method.
Funding: Supported by the Natural Science Foundation under Grant Nos. 71273139 and 60804047, and the Social Science Foundation of Chinese Ministry of Education under Grant No. 12YJC630271.
Abstract: Customers are of great importance to E-commerce under intense competition. It is known that twenty percent of customers produce eighty percent of profits, so finding these customers is critical. Customer lifetime value (CLV) is used to evaluate customers in terms of recency, frequency and monetary (RFM) variables. A novel model is proposed to analyze customer purchase data and RFM variables based on ordered weighted averaging (OWA) and the K-means clustering algorithm. OWA is employed to determine the weights of the RFM variables in evaluating customer lifetime value or loyalty, and the K-means algorithm is used to cluster customers according to their RFM values. Churn customers can be found by comparing the RFM values of every cluster with the average RFM. A questionnaire is conducted to investigate the reasons for customer dissatisfaction, and these reasons are ranked to help E-commerce companies improve their services. The experimental results demonstrate that the model is effective and reasonable.
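The clustering and comparison steps described above might look roughly like the sketch below, which runs plain K-means on (recency, frequency, monetary) tuples and then compares each cluster's mean RFM with the overall mean. The OWA weighting step is omitted, and the flagging rule (above-average days since last purchase together with below-average frequency and spend), the function names and the customer data are all assumptions made for illustration, not the paper's stated procedure.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Plain K-means with squared Euclidean distance and fixed initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def mean_rfm(rows):
    """Column-wise mean of (recency, frequency, monetary) rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def likely_churn_clusters(points, labels, k):
    """Flag clusters whose mean RFM looks worse than the overall mean (assumed rule)."""
    overall = mean_rfm(points)
    flagged = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        recency, frequency, monetary = mean_rfm(members)
        # Recency is days since the last purchase, so larger is worse.
        if recency > overall[0] and frequency < overall[1] and monetary < overall[2]:
            flagged.append(c)
    return flagged


# Made-up customers: (days since last purchase, number of orders, total spend).
customers = [(5, 20, 900), (7, 18, 950), (90, 2, 60), (100, 1, 40)]
labels, _ = kmeans(customers, [customers[0], customers[2]])
churn = likely_churn_clusters(customers, labels, k=2)
```

In practice the three variables would normally be rescaled (and, following the abstract, weighted) before clustering so that the monetary column does not dominate the distance.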