Funding: The National Natural Science Foundation of China (No. 50674086); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508); the Youth Scientific Research Foundation of China University of Mining and Technology (No. 2006A047).
Abstract: To address two drawbacks of the k-means algorithm, namely that the number of clusters must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the silhouette coefficient is introduced, and the optimal number of clusters Kopt for a data set with unknown class information is determined by computing the silhouette coefficient of the objects in each cluster under different values of K. Then the distribution of the data set is obtained through hierarchical clustering, and the initial cluster centers are determined from it. Finally, the clustering is completed by traditional k-means. Theoretical analysis shows that the improved algorithm has reasonable computational complexity. Experimental results on the IRIS test data set show that the algorithm distinguishes the clusters reasonably, identifies outliers efficiently, and produces lower entropy.
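The three steps in this abstract (choose K by silhouette coefficient, seed the centers from a hierarchical clustering, finish with k-means) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the centroid-linkage merging rule, the mean silhouette as the selection criterion, and the function names `agglomerative_centers`, `kmeans`, `mean_silhouette`, and `choose_k` are all assumptions made for illustration.

```python
from math import dist  # Python 3.8+


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def agglomerative_centers(points, k):
    """Assumed seeding rule: merge the two clusters with the closest
    centroids until k clusters remain, then return their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: dist(centroid(clusters[ab[0]]),
                                              centroid(clusters[ab[1]])))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]


def kmeans(points, centers, iters=100):
    """Plain k-means (Lloyd iterations) from the given initial centers."""
    centers = list(centers)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = centroid(members)
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all objects."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton or single cluster: s = 0 by convention
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_values):
    """Return (K, mean silhouette, labels) for the best-scoring K."""
    best = None
    for k in k_values:
        labels = kmeans(points, agglomerative_centers(points, k))
        score = mean_silhouette(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best
```

Calling `choose_k(points, range(2, 6))` on a list of coordinate tuples returns the candidate K with the highest mean silhouette together with its labels. The pairwise searches here are quadratic or worse in the number of points, so the sketch only suits small data sets.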
Funding: National Youth Natural Science Foundation of China (No. 61806006); Innovation Program for Graduate of Jiangsu Province (No. KYLX160-781); Project Supported by Jiangsu University Superior Discipline Construction Project.
Abstract: K-multiple-means (KMM) retains the simplicity and efficiency of the K-means algorithm by setting multiple subclasses, and improves its performance on non-convex data sets. To address the problem that KMM cannot be applied to multi-view data sets from the Internet, this paper proposes a multi-view K-multiple-means (MKMM) clustering method. The new algorithm introduces a view weight parameter, retains the multiple-subclass design, treats the number of clusters as a constraint, and obtains the clusters by solving an optimization problem. The new algorithm is compared with several popular multi-view clustering algorithms, and analysis of the experimental results demonstrates its effectiveness.
Funding: Scientific Research Program Funded by Shaanxi Provincial Education Department, China (No. 2013JK0749).
Abstract: In order to accurately identify the characteristics associated with consumer behavior in online apparel shopping, a typical B/C clothing enterprise in China was chosen. A target experimental database containing 2000 data records was built from the web service logs of the sample enterprise. Using the clustering algorithm of the Clementine data mining software, a K-means model was set up and 8 consumer clusters were obtained. The implicit information in consumers' characteristics and clothing preferences was also uncovered. Finally, 31 valuable association rules among casual wear, formal wear, and tie-in products were found using web analysis and the Apriori algorithm. These findings help to better understand the nature of online apparel consumption behavior and to advance personalization and intelligent recommendation strategies.
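To make the association-rule step concrete, here is a minimal sketch of Apriori-style mining in plain Python. It illustrates the general technique only, not the Clementine workflow used in the study; the function names and the support and confidence thresholds are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search. Returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    while level:
        size = len(level[0]) + 1
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, size - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule lhs -> rhs
    whose confidence = support(lhs | rhs) / support(lhs) meets the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), support, confidence))
    return rules
```

Each transaction is any iterable of item names, for example the product categories in one order. The lookup `frequent[lhs]` is safe because every subset of a frequent itemset is itself frequent and was stored at an earlier level.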
Funding: Supported by the Hi-Tech Research and Development Program (863) of China (Nos. 2007AA01Z311 and 2007AA04Z1A5); the National Basic Research Program (973) of China (No. 2009CB320804); the National Research Foundation for the Doctoral Program of Higher Education of China (No. 20060335114); the Science and Technology Program of Zhejiang Province, China (No. 2007C21006).
Abstract: We propose a novel texture clustering method. A classical type of (approximately) shift-invariant discrete wavelet transform (DWT), the dual-tree DWT, is used to decompose texture images. Multiple signatures are generated from the resulting high-frequency bands. A locality-preserving approach is then applied to project the data from a high-dimensional space to a low-dimensional space. In combination with a histogram signature, the shift-invariant DWT represents image texture information efficiently, and the local geometrical structure of the data set is well preserved during clustering. Experimental results show that the proposed method remarkably outperforms traditional ones.
Funding: Supported by the Natural Science Foundation under Grant Nos. 71273139 and 60804047; the Social Science Foundation of Chinese Ministry of Education under Grant No. 12YJC630271.
Abstract: Customers are of great importance to e-commerce under intense competition. It is known that twenty percent of customers produce eighty percent of profits, so finding these customers is critical. Customer lifetime value (CLV) is used to evaluate customers in terms of recency, frequency, and monetary (RFM) variables. A novel model is proposed to analyze customers' purchase data and RFM variables based on ordered weighted averaging (OWA) and the K-means clustering algorithm. OWA is employed to determine the weights of the RFM variables in evaluating customer lifetime value or loyalty. The K-means algorithm is used to cluster customers according to their RFM values. Customers likely to churn can be identified by comparing the RFM values of each cluster with the average RFM values. A questionnaire is conducted to investigate the reasons for customer dissatisfaction, and these reasons are ranked to help e-commerce businesses improve their services. The experimental results demonstrate that the model is effective and reasonable.
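Two steps from this abstract can be sketched in a few lines of Python: the standard OWA operator and the comparison of cluster RFM means with the overall average. The abstract does not say how the OWA weights are derived or how the comparison is thresholded, so the weights, the 1 to 5 scoring where higher is better, and the "below average on all three variables" rule are assumptions, and the function names are hypothetical.

```python
def owa(values, weights):
    """Ordered weighted averaging: the j-th weight is applied to the
    j-th largest value, not to a fixed variable."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def below_average_clusters(rfm, labels):
    """Flag clusters whose mean R, F and M scores are all below the
    overall means (an assumed rule for spotting churn candidates).

    rfm    -- list of (recency, frequency, monetary) scores, higher is better
    labels -- cluster label of each customer, e.g. from k-means
    """
    overall = [sum(col) / len(rfm) for col in zip(*rfm)]
    flagged = []
    for c in sorted(set(labels)):
        rows = [r for r, l in zip(rfm, labels) if l == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        if all(m < o for m, o in zip(means, overall)):
            flagged.append(c)
    return flagged
```

For example, `owa([3, 5, 1], [0.5, 0.3, 0.2])` weights the largest score 5 by 0.5, the middle score 3 by 0.3, and the smallest score 1 by 0.2.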