Journal Articles
4 articles found
1. Balance Resource Allocation for Spark Jobs Based on Prediction of the Optimal Resource (Cited by 5)
Authors: Zhiyao Hu, Dongsheng Li, Deke Guo. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2020, Issue 4, pp. 487-497 (11 pages).
Apache Spark provides a well-known MapReduce computing framework, aiming to process big data analytics quickly in a data-parallel manner. With this platform, large input data are divided into data partitions, and each data partition is processed concurrently by multiple computation tasks. The outputs of these computation tasks are transferred among multiple computers via the network. However, such a distributed computing framework suffers from system overheads, inevitably caused by communication and disk I/O operations. System overheads take up a large proportion of the Job Completion Time (JCT). We observed that excessive computational resources incur considerable system overheads, prolonging the JCT. The over-allocation of individual jobs not only prolongs their own JCTs but also likely makes other jobs suffer from under-allocation; thus, the average JCT is suboptimal, too. To address this problem, we propose a prediction model to estimate the changing JCT of a single Spark job. With the support of the prediction method, we designed a heuristic algorithm to balance the resource allocation of multiple Spark jobs, aiming to minimize the average JCT in multiple-job cases. We implemented the prediction model and resource allocation method in ReB, a Resource Balancer based on Apache Spark. Experimental results showed that ReB significantly outperformed the traditional max-min fairness and shortest-job-optimal methods; the average JCT was decreased by around 10%-30% compared to the existing solutions.
Keywords: Spark jobs; resource over-allocation; performance prediction
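The abstract above describes predicting a job's JCT as a function of allocated resources and then balancing allocations across jobs. The sketch below illustrates that idea with a toy cost model: compute time shrinks as cores are added while system overhead grows, and a greedy loop hands each spare core to the job whose predicted JCT improves the most. The predict_jct model and the greedy strategy are illustrative assumptions, not ReB's actual prediction model or algorithm.

```python
# Hypothetical sketch: balance cores across jobs to minimize the average
# predicted JCT. The cost model below is an assumption for illustration.

def predict_jct(job, cores):
    """Toy JCT model: compute time shrinks with more cores, but system
    overheads (communication, disk I/O) grow with parallelism."""
    compute = job["work"] / cores
    overhead = job["overhead_per_core"] * cores
    return compute + overhead

def balance_allocation(jobs, total_cores):
    """Greedily hand out cores one at a time to whichever job's
    predicted JCT improves the most from one extra core."""
    alloc = {j["name"]: 1 for j in jobs}           # every job needs >= 1 core
    for _ in range(total_cores - len(jobs)):
        def gain(job):
            c = alloc[job["name"]]
            return predict_jct(job, c) - predict_jct(job, c + 1)
        best = max(jobs, key=gain)
        if gain(best) <= 0:                        # extra cores only add overhead
            break
        alloc[best["name"]] += 1
    return alloc

jobs = [
    {"name": "etl",   "work": 600.0, "overhead_per_core": 0.8},
    {"name": "train", "work": 900.0, "overhead_per_core": 0.5},
]
print(balance_allocation(jobs, total_cores=32))
```

Under this toy model each job has a sweet-spot allocation near sqrt(work / overhead_per_core), and the loop stops early once an extra core would only add overhead, mirroring the paper's observation that over-allocation prolongs the JCT.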
2. Improved Heuristic Job Scheduling Method to Enhance Throughput for Big Data Analytics (Cited by 1)
Authors: Zhiyao Hu, Dongsheng Li. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 2, pp. 344-357 (14 pages).
Data-parallel computing platforms, such as Hadoop and Spark, are deployed in computing clusters for big data analytics. There is a general tendency for multiple users to share the same computing cluster, so scheduling multiple jobs becomes a serious challenge. For a long time, the Shortest-Job-First (SJF) method has been considered the optimal solution for minimizing the average job completion time. However, the SJF method leads to low system throughput when a small number of short jobs consume a large amount of resources, which in turn prolongs the average job completion time. We propose an improved heuristic job scheduling method, called the Densest-Job-Set-First (DJSF) method. The DJSF method schedules jobs by maximizing the number of completed jobs per unit time, aiming to decrease the average Job Completion Time (JCT) and improve the system throughput. We perform extensive simulations based on Google cluster data. Compared with the SJF method, the DJSF method decreases the average JCT by 23.19% and enhances the system throughput by 42.19%. Compared with Tetris, the job packing method improves the job completion efficiency by 55.4%, so that the computing platforms complete more jobs in a short time span.
Keywords: big data; job scheduling; job throughput; job completion time; job completion efficiency
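To make the Densest-Job-Set-First idea concrete, here is a minimal sketch under simplified assumptions: each job has a duration and a resource demand, a batch of jobs runs concurrently if its total demand fits the cluster capacity, and a batch's "density" is the number of jobs it completes divided by its makespan. The brute-force batch enumeration and the job parameters are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical sketch of a Densest-Job-Set-First (DJSF) style scheduler.
# A batch of k concurrent jobs finishes after its longest member, so its
# "density" is k divided by that longest duration.
from itertools import combinations

def batch_density(batch):
    return len(batch) / max(j["duration"] for j in batch)

def densest_batch(jobs, capacity, max_size=3):
    """Enumerate small candidate sets that fit the capacity and return
    the one that completes the most jobs per unit time."""
    best = None
    for k in range(1, max_size + 1):
        for combo in combinations(jobs, k):
            if sum(j["demand"] for j in combo) <= capacity:
                if best is None or batch_density(combo) > batch_density(best):
                    best = combo
    return list(best)

def djsf_schedule(jobs, capacity):
    """Repeatedly run the densest feasible batch until no jobs remain."""
    schedule, remaining = [], list(jobs)
    while remaining:
        batch = densest_batch(remaining, capacity)
        schedule.append([j["name"] for j in batch])
        for j in batch:
            remaining.remove(j)
    return schedule

jobs = [
    {"name": "short-greedy", "duration": 2, "demand": 90},  # short but resource-hungry
    {"name": "a", "duration": 3, "demand": 20},
    {"name": "b", "duration": 4, "demand": 20},
    {"name": "c", "duration": 4, "demand": 20},
]
print(djsf_schedule(jobs, capacity=100))   # [['a', 'b', 'c'], ['short-greedy']]
```

In this example SJF would run the short but resource-hungry job first and leave the rest of the cluster nearly idle; the density criterion instead packs the three compatible jobs into one batch, completing three jobs in four time units before serving the hungry job.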
3. Uncertain multicast under dynamic behaviors
Authors: Yudong Qin, Deke Guo, Zhiyao Hu, Bangbang Ren. Frontiers of Computer Science (SCIE, EI, CSCD), 2020, Issue 1, pp. 130-145 (16 pages).
Multicast transfer can efficiently save bandwidth and reduce the load on the source node compared with a series of independent unicast transfers. Nowadays, many applications employ the content replica strategy to improve robustness and efficiency; hence, each file and its replicas are usually distributed among multiple sources. In such scenarios, the traditional deterministic multicast develops into the uncertain multicast, which has more flexibility in source selection. In this paper, we focus on building and maintaining a minimal cost forest (MCF) for any uncertain multicast whose group members (source nodes and destination nodes) may join or leave after the MCF is constructed. We formulate this dynamic minimal cost forest (DMCF) problem as a mixed integer programming model. We then design three dedicated methods to approximate the optimal solution. Among them, the a-MCF method efficiently constructs an MCF for any given uncertain multicast, without considering dynamic behaviors of multicast group members. The d-MCF method slightly updates the existing MCF via local modifications whenever a dynamic behavior occurs; it achieves a balance between the minimal cost and minimal modifications to the existing forest. The last method, r-MCF, supplements the d-MCF method, since many rounds of local modifications may make the resultant forest drift far from the optimal forest. Accordingly, the r-MCF method monitors the accumulated degradation and, when necessary, triggers a rearrangement process to reconstruct a new MCF. Comprehensive evaluation results demonstrate that our methods can effectively tackle the proposed DMCF problem.
Keywords: uncertain multicast; dynamic behaviors; routing forest; approximation algorithm; Steiner tree
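The following sketch only illustrates the uncertain-multicast setting, in which every destination may fetch the content from any replica source: each destination is attached to its nearest source via multi-source Dijkstra, and the chosen paths are unioned into a forest. This nearest-source heuristic is an assumption for illustration; it is not the paper's a-MCF, d-MCF, or r-MCF construction.

```python
# Hypothetical sketch of serving an uncertain multicast from multiple
# replica sources. Requires networkx (pip install networkx).
import networkx as nx

def uncertain_multicast_forest(graph, sources, destinations):
    """Attach every destination to its cheapest replica source and
    union the chosen shortest paths into one forest."""
    forest = set()
    for dst in destinations:
        # Multi-source Dijkstra picks the closest of all sources at once.
        _, path = nx.multi_source_dijkstra(graph, sources, target=dst)
        # Store edges undirected so shared path segments are counted once.
        forest.update(tuple(sorted(edge)) for edge in zip(path, path[1:]))
    cost = sum(graph[u][v]["weight"] for u, v in forest)
    return forest, cost

G = nx.Graph()
G.add_weighted_edges_from([
    ("s1", "a", 1), ("a", "d1", 1), ("s2", "b", 2),
    ("b", "d2", 1), ("a", "d2", 5), ("s1", "d3", 4), ("s2", "d3", 1),
])
forest, cost = uncertain_multicast_forest(G, {"s1", "s2"}, ["d1", "d2", "d3"])
print(sorted(forest), cost)   # five edges, total cost 6
```

Because overlapping path edges are deduplicated in the forest, destinations that share a route also share its cost, which is exactly where multicast saves bandwidth over independent unicast transfers.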
4. Comparing Set Reconciliation Methods Based on Bloom Filters and Their Variants
Authors: Zhiyao Hu, Xiaoqiang Teng, Deke Guo, Bangbang Ren, Pin Lv, Zhong Liu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2016, Issue 2, pp. 157-167 (11 pages).
Set reconciliation between two nodes is widely used in network applications. The basic idea is that each member of a node pair has an object set and seeks to deliver its unique objects to the other member. The Standard Bloom Filter (SBF) and its variants, such as the Invertible Bloom Filter (IBF), are effective approaches to solving the set reconciliation problem. The SBF-based method requires each node to represent its objects using an SBF, which is exchanged with the other node. A receiving node queries the received SBF against its local objects to identify the unique objects. Finally, each node exchanges its unique objects with the other node in the node pair. For the IBF-based method, each node represents its objects using an IBF, which is then exchanged. A receiving node subtracts the received IBF from its local IBF so as to decode the different objects between the two sets. Intuitively, it would seem that the IBF-based method, with only one round of communication, entails less communication overhead than the SBF-based method, which incurs two rounds of communication. Our research results, however, indicate that neither method has an absolute advantage over the other. In this paper, we aim to provide an in-depth understanding of the two methods by evaluating and comparing their communication overhead. We find that the best method depends on the parameter settings. We demonstrate that the SBF-based method outperforms the IBF-based method in most cases, but when the number of different objects in the two sets is below a certain threshold, the IBF-based method outperforms the SBF-based method.
Keywords: set reconciliation; Bloom filter; communication overheads
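The SBF-based round described in the abstract is easy to sketch: each node summarizes its set in a Bloom filter, the filters are exchanged, and any local object not found in the peer's filter is guaranteed to be unique and is shipped over. The filter sizing and hashing scheme below are toy choices for illustration, not the paper's parameter settings.

```python
# Minimal sketch of the SBF-based reconciliation round: build a filter,
# swap filters, ship every local object missing from the peer's filter.
import hashlib

class BloomFilter:
    def __init__(self, bits=1024, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests (toy scheme).
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def unique_objects(local_set, peer_filter):
    """Objects absent from the peer's filter are certainly unique;
    false positives may hide a few truly unique objects."""
    return {x for x in local_set if x not in peer_filter}

a, b = {"x", "y", "z"}, {"y", "z", "w"}
fa, fb = BloomFilter(), BloomFilter()
for item in a: fa.add(item)
for item in b: fb.add(item)
print(unique_objects(a, fb))   # likely {'x'}
print(unique_objects(b, fa))   # likely {'w'}
```

Because Bloom filters admit false positives but no false negatives, this round never ships a shared object but can miss a few truly unique ones that falsely appear in the peer's filter; that miss rate versus filter size is the kind of communication trade-off the paper evaluates.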