
User Portraits Based on Optimized K-means Clustering Algorithm (cited by 2)
Abstract: In recent years, user portraits have been widely applied as an effective big data tool in Internet industries such as e-commerce and social networks. In traditional enterprises, however, the data usually have few dimensions and are scattered across multiple information systems, so general-purpose methods struggle to produce accurate results. To address this problem, the article proposes a user portrait method based on an optimized K-means clustering algorithm. The method uses the K-means++ initial cluster center optimization to improve clustering accuracy and the Mini Batch K-means small-batch optimization to improve convergence speed, combining the two strongly complementary techniques to improve the algorithm's analysis and processing capability. Experiments on enterprise data and public data sets show that, compared with classic K-means, the method improves speed by about 150 times and accuracy by about 20%.
Author: WANG Chenguang (王晨光)
Source: Technology Innovation and Application (《科技创新与应用》), 2022, No. 18, pp. 18-21 (4 pages)
Keywords: optimized K-means algorithm; user portrait; clustering analysis; finite dimension; high dispersion
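
The abstract describes combining K-means++ seeding with Mini Batch K-means updates. The article's own implementation is not reproduced on this page, so the following is only a minimal sketch of how the two techniques can be combined, written in plain Python. The function names (`kmeans_pp_init`, `mini_batch_kmeans`) and the parameters `batch_size`, `n_iter`, and `seed` are assumptions made for illustration, not the author's code or settings.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding (illustrative): each new center is drawn with
    probability proportional to its squared distance from the nearest
    center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with an existing center; fall back
            # to a uniform draw.
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def mini_batch_kmeans(points, k, batch_size=32, n_iter=100, seed=0):
    """Mini-batch K-means with K-means++ seeding (illustrative sketch).

    Each iteration samples a small batch, assigns every batch point to
    its nearest center, then moves that center toward the point with a
    per-center learning rate of 1 / (points seen by that center).
    """
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    counts = [0] * k
    for _ in range(n_iter):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch before updating any center.
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers


if __name__ == "__main__":
    # Hypothetical toy data: two small groups of 2-D "user feature" vectors.
    data = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
            (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
    for center in mini_batch_kmeans(data, k=2, batch_size=4, n_iter=50):
        print(center)
```

The seeding step spreads the initial centers apart, which is the part aimed at accuracy. The batch update touches only `batch_size` points per iteration instead of the full data set, which is the part aimed at speed. The speed and accuracy figures reported in the abstract come from the article's experiments and are not something this sketch reproduces.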
