Application of data mining algorithm based on Hadoop in wine information data analysis system (cited by: 6)
Abstract: Wineries at the eastern foot of Helan Mountain in Ningxia face several problems when selling their own wines: inaccurate analysis of wine information data, narrow sales channels, slow updates of sales information, and low sales volume. To address these problems, this paper proposes data mining algorithms based on the Hadoop distributed framework. The algorithms collect and analyze the very large volume of data held in a wine information data analysis system and, based on the analysis results, recommend wines suited to each user's taste. Using a winery at the eastern foot of Helan Mountain as the experimental base, the authors independently developed the wine information data analysis system and mined its data with the K-means clustering algorithm and the C4.5 classification algorithm. Both algorithms were implemented in distributed form on MapReduce and then modified to address their respective shortcomings. Experiments show that the Hadoop-based data mining algorithms improve both stability and accuracy, and that they can handle the massive data of the wine information data analysis system well.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2017, Issue A01, pp. 72-74, 79 (4 pages)
Funding: Ningxia University Graduate Innovation Project (GIP201625)
Keywords: Hadoop; data mining algorithm; C4.5 algorithm; K-means algorithm; wine; data analysis
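The abstract describes a MapReduce-based K-means but gives no implementation details. As a rough illustration only, the following is a minimal single-process sketch of how one K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points in each cluster). It is not the authors' implementation and does not include their improvements. The function names, the two-feature sample points, and the `max_iter` parameter are all hypothetical.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit (cluster index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum the points of each cluster and return the mean as the new centroid."""
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        if idx not in sums:
            sums[idx] = list(point)
        else:
            sums[idx] = [s + p for s, p in zip(sums[idx], point)]
        counts[idx] += n
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        new_centroids = reduce_phase(map_phase(points, centroids), centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    # Hypothetical two-feature records, e.g. (sweetness, acidity) scores.
    sample = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
    print(kmeans(sample, [(0.0, 0.0), (10.0, 10.0)]))
```

On a real Hadoop cluster the map and reduce functions would run as separate tasks over partitions of the data, with the current centroids distributed to every mapper at the start of each iteration. The sketch keeps everything in one process to show only the division of work.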
