Abstract
To address the problem of how to store, process, analyze, and use massive volumes of electronic data in the big data era, and the problem of large numbers of servers being left idle as traditional data centers migrate to cloud data centers, this article studies in depth the key technologies HDFS, MapReduce, and Mahout in the Hadoop family and, on that basis, proposes a research scheme for Hadoop cluster applications on a cloud platform. The scheme covers the topology of the Hadoop cluster, the deployment process for the development and runtime environment, and the implementation of the Bayesian classification algorithm in Mahout on the Hadoop cluster. The experiment, which serves as a research basis for integrating data center resources to deploy Hadoop clusters at scale, demonstrates the usability of the Hadoop cluster and its good adaptability to data analysis.
Source
Henan Science and Technology (《河南科技》)
2017, No. 21, pp. 25-28 (4 pages)