Funding: Project (61103046) supported in part by the National Natural Science Foundation of China; Project (B201312) supported by the DHU Distinguished Young Professor Program, China; Project (LY14F020007) supported by the Zhejiang Provincial Natural Science Funds of China; Project (2014A610072) supported by the Natural Science Foundation of Ningbo City, China.
Abstract: DNS (Domain Name System) query log analysis has been a popular research topic in recent years. CLOPE, a representative transactional clustering algorithm, can be readily used for DNS query log mining. However, the algorithm is inefficient when processing large-scale data. This paper proposes MR-CLOPE, an extension and improvement of CLOPE based on MapReduce. Unlike previous parallel clustering methods, it uses a two-stage MapReduce implementation framework in which each stage is implemented by one kind of MapReduce task. In the first stage, the DNS query logs are divided into multiple splits and the CLOPE algorithm is executed on each split. The second stage typically iterates many times to merge the small clusters into larger, satisfactory ones. In these two stages, a novel partition process is designed to randomly spread out the original sub-clusters, which are then moved and merged in the map phase of the second stage according to the defined merge criteria. In this way, the advantages of the original CLOPE algorithm are kept and its disadvantages are addressed, yielding better clustering performance. Experimental results show that, compared with CLOPE, MR-CLOPE is both faster and produces better clustering quality on DNS query logs.
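As a rough illustration of the first stage only, the sketch below runs a single CLOPE allocation pass on each split independently. It uses the profit criterion from the original CLOPE algorithm (cluster size S, width W, transaction count N, repulsion r). The names `Cluster`, `delta_add`, `clope_first_pass`, and `stage_one`, the default `r=2.0`, and the representation of a transaction as a non-empty set of items are all assumptions made for this example and are not taken from the paper. The splits are processed sequentially in plain Python rather than as MapReduce tasks, and the second-stage merge is not sketched because the abstract does not state the merge criteria.

```python
from collections import Counter


class Cluster:
    """Running statistics that CLOPE keeps for one cluster (hypothetical helper)."""

    def __init__(self):
        self.n = 0            # N: number of transactions in the cluster
        self.occ = Counter()  # item -> number of occurrences

    @property
    def size(self):
        # S: total number of item occurrences
        return sum(self.occ.values())

    @property
    def width(self):
        # W: number of distinct items
        return len(self.occ)

    def add(self, txn):
        self.n += 1
        self.occ.update(txn)


def delta_add(cluster, txn, r):
    """Change in S * N / W**r if txn (a non-empty set) joins the cluster."""
    s_new = cluster.size + len(txn)
    w_new = cluster.width + sum(1 for item in txn if item not in cluster.occ)
    new = s_new * (cluster.n + 1) / (w_new ** r)
    old = cluster.size * cluster.n / (cluster.width ** r) if cluster.n else 0.0
    return new - old


def clope_first_pass(transactions, r=2.0):
    """Assign each transaction to the existing or new cluster with the largest gain.

    This is only the initial allocation pass; full CLOPE keeps re-scanning
    and moving transactions until no move increases the profit.
    """
    clusters, labels = [], []
    for txn in transactions:
        candidates = clusters + [Cluster()]  # last candidate is a fresh cluster
        gains = [delta_add(c, txn, r) for c in candidates]
        best = max(range(len(candidates)), key=gains.__getitem__)
        if best == len(clusters):
            clusters.append(candidates[best])
        clusters[best].add(txn)
        labels.append(best)
    return labels, clusters


def stage_one(splits, r=2.0):
    """Run the pass on every split independently (sequential stand-in for map tasks)."""
    return [clope_first_pass(split, r) for split in splits]
```

Each split yields its own set of small sub-clusters, which is the input the second stage would then merge. A larger `r` penalizes wide clusters more strongly and so tends to produce more, tighter clusters.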
Funding: Supported in part by the National Key Technology R&D Program of the Ministry of Science and Technology (2015BAH09F02, 2015BAH47F03), the National Natural Science Foundation of China (60903008, 61073062), and the Fundamental Research Funds for the Central Universities (N130417002, N130404011).
Abstract: Virtualization has become increasingly important in cloud computing as a way to support efficient, flexible resource provisioning. However, performance interference among virtual machines (VMs) has become a challenge that may reduce the effectiveness of resource provisioning. In a virtual cluster running MapReduce applications, this interference can also affect the performance of the Map and Reduce tasks and thus degrade the performance of the MapReduce job. Accordingly, this paper presents a MapReduce scheduling framework to mitigate the degradation caused by performance interference. The framework includes a performance interference prediction module and an interference-aware scheduling algorithm. To verify its effectiveness, a set of experiments was conducted on a 24-node virtual MapReduce cluster. The experiments show that the proposed framework achieves a performance improvement in the virtualized environment compared with other MapReduce schedulers.