With the rapid development of the Internet, many enterprises have launched their own network platforms. When users browse, search, and click products on these platforms, most platforms keep records of these network behaviors; such records are often heterogeneous and are called log data. Effectively analyzing and managing this heterogeneous log data allows enterprises to grasp the behavioral characteristics of their platform users in time, deliver targeted recommendations to users, increase product sales, and accelerate their development. Firstly, we design the system following the big data process of collection, storage, analysis, and visualization. Then we adopt HDFS storage technology, YARN resource management technology, and Nginx load balancing technology to build a Hadoop cluster to process the log data, and adopt MapReduce processing technology and the Hive data warehouse technology to analyze the log data and obtain results. Finally, the obtained results are displayed visually, and a log data analysis system is successfully constructed. Practice has shown that the system effectively realizes the collection, analysis, and visualization of log data, and can accurately support product recommendation by enterprises. The system is stable and effective.
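The MapReduce analysis step described above can be sketched in miniature. The log field layout and event names below are hypothetical, since the abstract does not specify the platform's log schema:

```python
from collections import defaultdict

def map_phase(log_lines):
    """Map: emit (product_id, 1) for each click record in the raw log."""
    for line in log_lines:
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1] == "click":
            yield fields[0], 1

def reduce_phase(pairs):
    """Reduce: sum the emitted counts per product key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

logs = [
    "p001\tclick", "p002\tview", "p001\tclick", "p003\tclick",
]
click_counts = reduce_phase(map_phase(logs))  # {"p001": 2, "p003": 1}
```

In a real Hadoop job the map and reduce functions would run as distributed tasks over HDFS blocks; the local generator-and-dict version only illustrates the data flow.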
Data and the Internet are growing rapidly, which causes problems in the management of big data. For these kinds of problems, many software frameworks are used to increase the performance of distributed systems and to provide large-scale data storage. One of the most beneficial software frameworks for utilizing data in distributed systems is Hadoop. This paper introduces the Apache Hadoop architecture, the components of Hadoop, and their significance in managing vast volumes of data in a distributed system. The Hadoop Distributed File System enables the storage of enormous chunks of data over a distributed network. The Hadoop framework maintains the fsImage and edits files, which support the availability and integrity of data. The paper also includes cases of Hadoop implementation, such as weather monitoring and bioinformatics processing.
In order to address the problems of a single encryption algorithm, such as low encryption efficiency and unreliable metadata, for static data storage on big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. Firstly, in order to disperse the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms, and use the ZooKeeper distributed coordination mechanism to coordinate the nodes and achieve dual-channel storage. Then, we improve the ECC encryption algorithm for encrypting ordinary data, and adopt a homomorphic encryption algorithm to encrypt data that needs to be computed on. To accelerate encryption, we adopt a dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithms with the storage model. Experimental results show that the proposed solution solves the single point of failure of metadata, performs well in terms of metadata reliability, and can realize server fault tolerance. The improved encryption algorithm integrates the dual-channel storage mode, and the encryption storage efficiency improves by 27.6% on average.
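A minimal sketch of the dual-thread encryption mode, assuming the payload can be split into two independently encrypted halves. The XOR cipher here is only a placeholder for the paper's improved ECC and homomorphic schemes, which the abstract does not specify:

```python
import threading

def toy_encrypt(block: bytes, key: int) -> bytes:
    """Placeholder cipher (single-byte XOR) standing in for the real scheme."""
    return bytes(b ^ key for b in block)

def encrypt_dual_thread(data: bytes, key: int) -> bytes:
    """Split the payload in half and encrypt both halves concurrently."""
    mid = len(data) // 2
    halves = [data[:mid], data[mid:]]
    results = [b"", b""]

    def worker(i: int) -> None:
        results[i] = toy_encrypt(halves[i], key)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0] + results[1]

ciphertext = encrypt_dual_thread(b"hadoop secure storage", 0x5A)
```

Because XOR is its own inverse, applying the same function again recovers the plaintext; with a real cipher the decryption path would differ, but the two-thread splitting structure stays the same.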
Hadoop technology is accompanied by some security issues. In its early days, developers paid attention mostly to the development of basic functionality, and the design of security components was not of prime interest. Because of that, the technology remained vulnerable to malicious activities of unauthorized users whose purpose is to endanger system functionality or to compromise private user data. Researchers and developers are continuously trying to solve these issues by upgrading Hadoop's security mechanisms and preventing undesirable malicious activities. In this paper, the most common HDFS security problems and a review of unauthorized access issues are presented. First, the Hadoop mechanism and its main components are described as an introduction to the main research problem. Then, the HDFS architecture is given, and all of its components and functionalities are introduced. Further, all possible types of users are listed, with an accent on unauthorized users, who are of great importance for the paper. One part of the research is dedicated to the consideration of Hadoop security levels, environment, and user assessments. The review also includes an explanation of log monitoring and audit features, and a detailed consideration of authorization and authentication issues. Possible consequences of unauthorized access to a system are covered, and a few recommendations for solving problems of unauthorized access are offered. Honeypot nodes, security mechanisms for collecting valuable information about malicious parties, are presented in the last part of the paper. Finally, the idea of developing a new type of intrusion detector, based on an artificial neural network, is presented.
The detector will be an integral part of a new kind of virtual honeypot mechanism and represents the initial basis for the authors' future scientific work.
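The proposed ANN-based detector is not specified in detail; a single-neuron sketch with hypothetical features and hand-picked weights illustrates the idea of scoring access behavior as benign or suspicious:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def anomaly_score(features, weights, bias):
    """One-neuron scorer over access features (e.g. failed-auth count, request rate)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights: many failed authentications and a burst of requests
# both push the score toward 1 (suspicious).
weights, bias = [1.5, 0.8], -3.0
benign = anomaly_score([0.0, 1.0], weights, bias)   # no failures, normal rate
suspect = anomaly_score([5.0, 4.0], weights, bias)  # many failures, burst rate
```

In the paper's setting, the weights would be learned from honeypot-collected traffic rather than set by hand, and the input would be a richer feature vector extracted from HDFS audit logs.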
In the market for agricultural products, prices are affected by production cost, market supply, and other factors. To obtain market information on agricultural products, price fluctuations can be analyzed and predicted. A distributed big data software platform based on Hadoop, Hive, and Spark is proposed to analyze and forecast agricultural price data. Firstly, the Hadoop, Hive, and Spark big data frameworks were built, and the crawled agricultural product data was stored in MySQL. Secondly, the crawled data was exported from MySQL to a text file, uploaded to HDFS, and mapped into a Spark SQL database. The data was cleaned with Spark SQL, and an improved Holt-Winters (triple exponential smoothing) model was used to predict future agricultural product prices; the cleaned data and predictions were imported back into MySQL. The technologies of SpringMVC, Ajax, and ECharts were used to visualize the data.
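The triple-exponential-smoothing core of the (unimproved) additive Holt-Winters model can be sketched as follows; the smoothing constants and the synthetic price series are illustrative, not from the paper:

```python
def holt_winters_additive(series, m, alpha=0.5, beta=0.3, gamma=0.2, horizon=4):
    """Additive Holt-Winters (triple exponential smoothing); m is the season length."""
    # Initial level: mean of the first season; initial trend: average per-step
    # change between the first and second seasons.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    seasonals = [series[i] - level for i in range(m)]
    for t in range(m, len(series)):
        last_level = level
        s = seasonals[t % m]
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (series[t] - level) + (1 - gamma) * s
    return [level + (h + 1) * trend + seasonals[(len(series) + h) % m]
            for h in range(horizon)]

# Synthetic price series: upward trend of 0.5 per step plus a period-4 seasonal bump.
season = [2.0, -1.0, 0.5, -1.5]
prices = [10 + 0.5 * t + season[t % 4] for t in range(24)]
forecast = holt_winters_additive(prices, m=4)
```

The forecast for the next step lands near the true continuation of the series (about 24 here), and the seasonal peak at step 24 makes the first forecast exceed the second.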
The Hadoop framework emerged at the right moment, when traditional tools were powerless to handle big data. The Hadoop Distributed File System (HDFS), which serves as a highly fault-tolerant distributed file system in Hadoop, can effectively improve the throughput of data access. It is well suited to applications handling large datasets. However, Hadoop has the disadvantage that when processing large numbers of small files, memory usage on the NameNode becomes so high that it limits the whole system. In this paper, we propose an approach to optimize the performance of HDFS with small files. The basic idea is to merge small files into a large one whose size is suitable for a block. Furthermore, indexes are built to meet the requirement of fast access to all files in HDFS. Preliminary experimental results show that our approach achieves better performance.
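The merge-and-index idea can be sketched as follows. This is an in-memory stand-in, not HDFS API code: the blob plays the role of the merged block-sized file, and the index maps each original file name to its (offset, length) within it:

```python
import io

def merge_small_files(files):
    """Concatenate small files into one blob; index maps name -> (offset, length)."""
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index

def read_file(blob, index, name):
    """Random access to one original file via the index, without scanning the blob."""
    offset, length = index[name]
    return blob[offset:offset + length]

small = {"a.log": b"alpha", "b.log": b"bravo-bravo", "c.log": b"charlie"}
blob, index = merge_small_files(small)
```

The NameNode then tracks one merged file instead of many small ones, which is the source of the memory saving; only the lightweight index is needed to resolve individual files.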
Existing recommendation systems for big data have two shortcomings: poor scalability of data storage and poor extensibility of the recommendation algorithm. After researching and analyzing the IBCF (item-based collaborative filtering) algorithm and the working principles of the Hadoop and HBase platforms, a scheme for optimizing the design of a personalized recommendation system based on Hadoop and HBase is proposed. The experimental results show that the HBase database can effectively solve the problem of mass data storage, and that processing the recommendation problem in parallel with the MapReduce programming model of the Hadoop platform can significantly improve the efficiency of the algorithm, thereby further improving the performance of the personalized recommendation system.
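A minimal item-based collaborative filtering (IBCF) scoring sketch, without the MapReduce parallelization or HBase storage; the item-to-user rating matrix is hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two item rating vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def recommend(ratings, user, k=1):
    """Score each unrated item by its similarity to the user's rated items."""
    items = list(ratings)
    users = sorted({u for cols in ratings.values() for u in cols})
    vec = {i: [ratings[i].get(u, 0.0) for u in users] for i in items}
    rated = [i for i in items if user in ratings[i]]
    scores = {}
    for i in items:
        if user in ratings[i]:
            continue  # only recommend items the user has not rated yet
        scores[i] = sum(cosine(vec[i], vec[j]) * ratings[j][user] for j in rated)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical item -> {user: rating} matrix.
ratings = {
    "item1": {"u1": 5, "u2": 4},
    "item2": {"u1": 4, "u2": 5, "u3": 4},
    "item3": {"u3": 5},
}
top = recommend(ratings, "u1", k=1)  # ["item3"]
```

In the proposed system, the similarity computation (the all-pairs cosine step) is the part distributed across MapReduce tasks, while the rating matrix lives in HBase instead of an in-memory dict.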
Funding: supported by the Huaihua University Science Foundation under Grant HHUY2019-24.