The second-order random walk has recently been shown to effectively improve the accuracy in graph analysis tasks.Existing work mainly focuses on centralized second-order random walk(SOW)algorithms.SOW algorithms rely ...The second-order random walk has recently been shown to effectively improve the accuracy in graph analysis tasks.Existing work mainly focuses on centralized second-order random walk(SOW)algorithms.SOW algorithms rely on edge-to-edge transition probabilities to generate next random steps.However,it is prohibitively costly to store all the probabilities for large-scale graphs,and restricting the number of probabilities to consider can negatively impact the accuracy of graph analysis tasks.In this paper,we propose and study an alternative approach,SOOP(second-order random walks with on-demand probability computation),that avoids the space overhead by computing the edge-to-edge transition probabilities on demand during the random walk.However,the same probabilities may be computed multiple times when the same edge appears multiple times in SOW,incurring extra cost for redundant computation and communication.We propose two optimization techniques that reduce the complexity of computing edge-to-edge transition probabilities to generate next random steps,and reduce the cost of communicating out-neighbors for the probability computation,respectively.Our experiments on real-world and synthetic graphs show that SOOP achieves orders of magnitude better performance than baseline precompute solutions,and it can efficiently computes SOW algorithms on billion-scale graphs.展开更多
It is challenging to model the performance of distributed graph computation.Explicit formulation cannot easily capture the diversified factors and complex interactions in the system.Statistical learning methods requir...It is challenging to model the performance of distributed graph computation.Explicit formulation cannot easily capture the diversified factors and complex interactions in the system.Statistical learning methods require a large number of training samples to generate an accurate prediction model.However,it is time-consuming to run the required graph computation tests to obtain the training samples.In this paper,we propose TransGPerf,a transfer learning based solution that can exploit prior knowledge from a source scenario and utilize a manageable amount of training data for modeling the performance of a target graph computation scenario.Experimental results show that our proposed method is capable of generating accurate models for a wide range of graph computation tasks on PowerGraph and GraphX.It outperforms transfer learning methods proposed for other applications in the literature.展开更多
文摘The second-order random walk has recently been shown to effectively improve the accuracy in graph analysis tasks.Existing work mainly focuses on centralized second-order random walk(SOW)algorithms.SOW algorithms rely on edge-to-edge transition probabilities to generate next random steps.However,it is prohibitively costly to store all the probabilities for large-scale graphs,and restricting the number of probabilities to consider can negatively impact the accuracy of graph analysis tasks.In this paper,we propose and study an alternative approach,SOOP(second-order random walks with on-demand probability computation),that avoids the space overhead by computing the edge-to-edge transition probabilities on demand during the random walk.However,the same probabilities may be computed multiple times when the same edge appears multiple times in SOW,incurring extra cost for redundant computation and communication.We propose two optimization techniques that reduce the complexity of computing edge-to-edge transition probabilities to generate next random steps,and reduce the cost of communicating out-neighbors for the probability computation,respectively.Our experiments on real-world and synthetic graphs show that SOOP achieves orders of magnitude better performance than baseline precompute solutions,and it can efficiently computes SOW algorithms on billion-scale graphs.
文摘It is challenging to model the performance of distributed graph computation.Explicit formulation cannot easily capture the diversified factors and complex interactions in the system.Statistical learning methods require a large number of training samples to generate an accurate prediction model.However,it is time-consuming to run the required graph computation tests to obtain the training samples.In this paper,we propose TransGPerf,a transfer learning based solution that can exploit prior knowledge from a source scenario and utilize a manageable amount of training data for modeling the performance of a target graph computation scenario.Experimental results show that our proposed method is capable of generating accurate models for a wide range of graph computation tasks on PowerGraph and GraphX.It outperforms transfer learning methods proposed for other applications in the literature.