Funding: Supported by the Important National Science & Technology Specific Projects (2012ZX03002008), the National Natural Science Foundation of China (61072061), and the Fundamental Research Funds for the Central Universities (2012RC0121).
Abstract: Offline network traffic analysis is essential for an in-depth understanding of network conditions and characteristics, such as user behavior and abnormal traffic. As the amount of information on the Internet grows rapidly, traditional stand-alone analysis tools face serious challenges in storage capacity and computing efficiency, and these are precisely the strengths of a Hadoop cluster. In this paper, we designed an offline traffic analysis system based on Hadoop (OTASH) and proposed a MapReduce-based algorithm for TopN user statistics. We also studied the computing performance and failure tolerance of OTASH. From the experiments we conclude that OTASH is suitable for handling large amounts of flow data and can continue computing when a single node fails.
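The abstract names a MapReduce-based TopN user statistics algorithm but does not describe it. The following is a minimal in-process Python sketch of one common way to compute TopN over flow records. It is not the OTASH implementation. It assumes a hypothetical comma-separated record layout (`src_ip,dst_ip,bytes`), treats the source IP as the user and total bytes as the ranking metric, and simulates the map, shuffle, and reduce phases with plain functions. `map_phase`, `reduce_phase`, and `top_n_users` are illustrative names.

```python
# Sketch only: simulates a MapReduce TopN job in one process; the record
# layout and the choice of "user = source IP, metric = bytes" are assumptions.
import heapq
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: parse hypothetical 'src_ip,dst_ip,bytes' lines into (user, bytes)."""
    for line in lines:
        src_ip, _dst_ip, nbytes = line.split(",")
        yield src_ip, int(nbytes)


def reduce_phase(pairs: Iterable[tuple[str, int]], n: int) -> list[tuple[int, str]]:
    """Reducer: sum bytes per user, then keep only this reducer's local top-n."""
    totals: dict[str, int] = defaultdict(int)
    for user, nbytes in pairs:
        totals[user] += nbytes
    return heapq.nlargest(n, ((total, user) for user, total in totals.items()))


def top_n_users(splits: list[list[str]], n: int, num_reducers: int = 2) -> list[tuple[str, int]]:
    # Shuffle: hash-partition every mapper's output so that all records of a
    # given user land in the same reducer.
    partitions: list[list[tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for split in splits:
        for user, nbytes in map_phase(split):
            partitions[hash(user) % num_reducers].append((user, nbytes))
    # Each reducer emits a local top-n; one final merge picks the global top-n.
    local_tops = [reduce_phase(p, n) for p in partitions]
    merged = heapq.nlargest(n, (item for top in local_tops for item in top))
    return [(user, total) for total, user in merged]


splits = [
    ["10.0.0.1,8.8.8.8,500", "10.0.0.2,8.8.8.8,300"],
    ["10.0.0.1,1.1.1.1,200", "10.0.0.3,8.8.8.8,900", "10.0.0.2,1.1.1.1,100"],
]
print(top_n_users(splits, 2))  # [('10.0.0.3', 900), ('10.0.0.1', 700)]
```

Hash partitioning sends every record of a user to the same reducer, so each reducer's per-user totals are final. The global TopN must therefore lie within the union of the local TopN lists, and the final merge only has to examine at most `num_reducers * n` candidates rather than every user.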
Funding: Supported by the Important National Science & Technology Specific Projects (2012ZX03002008), the 111 Project of China (B08004), and the Fundamental Research Funds for the Central Universities (2012RC0121).
Abstract: We present an approach to optimizing the MapReduce architecture that makes heterogeneous cloud environments more stable and efficient. Unlike previous methods, our approach introduces machine learning into the MapReduce framework and dynamically improves the MapReduce algorithm based on the statistical results of machine learning. The approach has three main components: learning machine performance, a reduce-task assignment algorithm based on the learning results, and an optimized speculative execution mechanism. It also has two important features. First, the MapReduce framework obtains the performance values of the nodes in the cluster through a machine learning module, which recalibrates these values daily to keep its assessment of cluster performance accurate. Second, the optimized task assignment algorithm lets us maximize the performance of heterogeneous clusters. In our evaluation, cluster performance improved by 19% in the current heterogeneous cloud environment, and cluster stability was greatly enhanced.
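The abstract does not specify the learning model, the assignment rule, or the speculative-execution criterion. As a hedged illustration only, the Python sketch below makes three assumptions. First, each node carries a single learned performance score (`perf`), which is recalibrated with an exponential moving average. Second, reduce tasks are assigned greedily by estimated finish time. Third, a task is flagged for a backup attempt when its progress falls below a fraction of the rate its node is expected to achieve. The class and function names, the smoothing weight `alpha`, and the `slack` threshold are hypothetical choices, not the paper's.

```python
# Sketch only: illustrates performance-aware reduce assignment under assumed
# rules; the paper's actual learning model and thresholds are not given.
import heapq
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    perf: float  # learned relative speed, e.g. reduce tasks per hour (hypothetical)


def calibrate(node: Node, observed_rate: float, alpha: float = 0.3) -> None:
    """Periodic (e.g. daily) recalibration: blend the latest observed rate into
    the learned value with an exponential moving average."""
    node.perf = alpha * observed_rate + (1 - alpha) * node.perf


def assign_reduce_tasks(nodes: list[Node], num_tasks: int) -> dict[str, int]:
    """Greedy assignment: each reduce task goes to the node whose estimated
    finish time, (tasks already assigned + 1) / perf, would be smallest."""
    counts = {node.name: 0 for node in nodes}
    heap = [(1 / node.perf, node.name, node.perf) for node in nodes]
    heapq.heapify(heap)
    for _ in range(num_tasks):
        _finish, name, perf = heapq.heappop(heap)
        counts[name] += 1
        heapq.heappush(heap, ((counts[name] + 1) / perf, name, perf))
    return counts


def needs_backup(progress_rate: float, expected_rate: float, slack: float = 0.5) -> bool:
    """Speculative-execution check: launch a backup attempt only when a task
    progresses well below the rate its node's learned perf predicts."""
    return progress_rate < slack * expected_rate


nodes = [Node("fast", 3.0), Node("slow", 1.0)]
print(assign_reduce_tasks(nodes, 4))  # {'fast': 3, 'slow': 1}
```

In this example, one node has been learned to be three times faster than the other, and it receives three of the four reduce tasks. Both nodes then finish at an estimated time of 1.0. A performance-blind even split would give each node two tasks and leave the job waiting until time 2.0 on the slow node.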