Analysis and Mining of E-commerce Page Data Based on Improved K-means (基于改进K-means的电商页面数据分析与挖掘)
Cited by: 4
Abstract: Data mining uses the computing power of computers to replace part of manual analysis. In traditional data analysis, people analyze, reason about, and interpret data themselves, but the amount of computation the human brain can handle is limited. Computers can now take over much of this work: they handle routine operations such as adding, deleting, modifying, and querying records, which require no independent thinking, and they can sometimes also perform tasks that require the ability to learn, such as high-quality analysis and mining of web page data. To further explore web data analysis and mining, this paper proposes an optimized method for computing the distance between samples and uses it to improve how the K-means algorithm computes its cluster centers. Specifically, the paper collects nearly 12,000 publicly available records from the e-commerce site Dangdang (当当网) using the keyword "手机" (mobile phone) and applies text mining to them. The text undergoes thorough preprocessing, including cleaning, Chinese word segmentation, and keyword weight calculation. Finally, K-means with optimized cluster centers is used to uncover hidden information in seemingly unrelated data and to provide market guidance for e-commerce users.
Authors: YE Hao (叶昊); MIAO Yiheng (缪宜恒); ZHANG Hongjun (张宏俊)
Affiliations: School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003; School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003; China Communications Services Co., Ltd., Beijing 100005; School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003
Source: Software (《软件》), 2023, No. 6, pp. 35-43 (9 pages)
Funding: Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_1019).
Keywords: e-commerce page; data mining; data preprocessing; Chinese text clustering
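
The pipeline summarized in the abstract (keyword weighting followed by K-means clustering) can be illustrated with a minimal sketch. This is not the paper's improved algorithm, whose distance and cluster-center formulas are not given in this record; it is plain baseline K-means. Several things here are assumptions made only for illustration: the function names, the use of TF-IDF with smoothed IDF as the keyword weight, cosine distance as the sample distance, and the four made-up product titles, which are assumed to be already cleaned and segmented into words (a real run would first pass the raw titles through a Chinese word segmenter).

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Convert token lists into sparse TF-IDF vectors (dict: term -> weight).

    Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of sparse vectors: the baseline cluster center."""
    total = Counter()
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}


def kmeans(vectors, k, iterations=20, seed=0):
    """Baseline K-means: assign to nearest center, then recompute centers."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # initial centers: k random samples
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: cosine_distance(vec, centers[j]))
            for vec in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean_vector(members)
    return labels


# Hypothetical, already-segmented product titles (not from the paper's data).
docs = [
    ["手机", "壳", "透明", "防摔"],
    ["手机", "壳", "硅胶", "防摔"],
    ["智能", "手机", "5G", "拍照"],
    ["智能", "手机", "5G", "大屏"],
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

On this toy input the two phone-case titles should land in one cluster and the two smartphone titles in the other; the numeric cluster ids themselves are arbitrary. An improved variant along the lines the abstract describes would replace `cosine_distance` and `mean_vector` with the paper's own definitions.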