Abstract
With the arrival of the information age and the development of computer technology, data in all walks of life are growing exponentially, and the rapid development of big data constantly affects people's lives. Developing and processing big data has become a major challenge of the current information age. This paper therefore uses the Hadoop ecosystem to build a fully distributed cluster, using the Hadoop Distributed File System (HDFS) to store data and the MapReduce framework to process data analysis tasks in a distributed manner. It analyzes the equipment, environment, installation, and configuration required to set up a Hadoop system, providing a basic environment for big data practice and technical support for further in-depth theoretical research.
Authors
YANG Zhixue
WANG Jingjing
YANG Zhixue; WANG Jingjing (School of Information Engineering, Changji University, Changji, Xinjiang 831100, China)
Source
Information & Computer (《信息与电脑》)
2022, No. 20, pp. 130-133 (4 pages)