Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing.This paper optimizes the depth-branch-resorting algorithm(DBR),and proposes a branch-alternation-re...Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing.This paper optimizes the depth-branch-resorting algorithm(DBR),and proposes a branch-alternation-resorting algorithm(BAR).In order to make the algorithm run in parallel and improve the efficiency of algorithm operation,the BAR algorithm is mapped onto the reconfigurable array processor(APR-16)to achieve vertex reordering,effectively improving the locality of graph data.This paper validates the BAR algorithm on the GraphBIG framework,by utilizing the reordered dataset with BAR on breadth-first search(BFS),single source shortest paht(SSSP)and betweenness centrality(BC)algorithms for traversal.The results show that compared with DBR and Corder algorithms,BAR can reduce execution time by up to 33.00%,and 51.00%seperatively.In terms of data movement,the BAR algorithm has a maximum reduction of 39.00%compared with the DBR algorithm and 29.66%compared with Corder algorithm.In terms of computational complexity,the BAR algorithm has a maximum reduction of 32.56%compared with DBR algorithm and53.05%compared with Corder algorithm.展开更多
Graph processing has been widely used in many scenarios,from scientific computing to artificial intelligence.Graph processing exhibits irregular computational parallelism and random memory accesses,unlike traditional ...Graph processing has been widely used in many scenarios,from scientific computing to artificial intelligence.Graph processing exhibits irregular computational parallelism and random memory accesses,unlike traditional workloads.Therefore,running graph processing workloads on conventional architectures(e.g.,CPUs and GPUs)often shows a significantly low compute-memory ratio with few performance benefits,which can be,in many cases,even slower than a specialized single-thread graph algorithm.While domain-specific hardware designs are essential for graph processing,it is still challenging to transform the hardware capability to performance boost without coupled software codesigns.This article presents a graph processing ecosystem from hardware to software.We start by introducing a series of hardware accelerators as the foundation of this ecosystem.Subsequently,the codesigned parallel graph systems and their distributed techniques are presented to support graph applications.Finally,we introduce our efforts on novel graph applications and hardware architectures.Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.展开更多
Graphs that are used to model real-world entities with vertices and relationships among entities with edges,have proven to be a powerful tool for describing real-world problems in applications.In most real-world scena...Graphs that are used to model real-world entities with vertices and relationships among entities with edges,have proven to be a powerful tool for describing real-world problems in applications.In most real-world scenarios,entities and their relationships are subject to constant changes.Graphs that record such changes are called dynamic graphs.In recent years,the widespread application scenarios of dynamic graphs have stimulated extensive research on dynamic graph processing systems that continuously ingest graph updates and produce up-to-date graph analytics results.As the scale of dynamic graphs becomes larger,higher performance requirements are demanded to dynamic graph processing systems.With the massive parallel processing power and high memory bandwidth,GPUs become mainstream vehicles to accelerate dynamic graph processing tasks.GPU-based dynamic graph processing systems mainly address two challenges:maintaining the graph data when updates occur(i.e.,graph updating)and producing analytics results in time(i.e.,graph computing).In this paper,we survey GPU-based dynamic graph processing systems and review their methods on addressing both graph updating and graph computing.To comprehensively discuss existing dynamic graph processing systems on GPUs,we first introduce the terminologies of dynamic graph processing and then develop a taxonomy to describe the methods employed for graph updating and graph computing.In addition,we discuss the challenges and future research directions of dynamic graph processing on GPUs.展开更多
In order to analyze and process the large graphs with high cost efficiency,researchers have developed a number of out-of-core graph processing systems in recent years based on just one commodity computer.On the other ...In order to analyze and process the large graphs with high cost efficiency,researchers have developed a number of out-of-core graph processing systems in recent years based on just one commodity computer.On the other hand,with the rapidly growing need of analyzing graphs in the real-world,graph processing systems have to efficiently handle massive concurrent graph processing(CGP)jobs.Unfortunately,due to the inherent design for single graph processing job,existing out-of-core graph processing systems usually incur unnecessary data accesses and severe competition of I/O bandwidth when handling the CGP jobs.In this paper,we propose GraphCP,a disk I/O optimized out-of-core graph processing system that efficiently supports the processing of CGP jobs.GraphCP proposes a benefit-aware sharing execution model to share the I/O access and processing of graph data among the CGP jobs and adaptively schedule the graph data loading based on the states of vertices,which efficiently overcomes above challenges faced by existing out-of-core graph processing systems.Moreover,GraphCP adopts a dependency-based future-vertex updating model so as to reduce disk I/Os in the future iterations.In addition,GraphCP organizes the graph data with a Source-Sorted Sub-Block graph representation for better processing capacity and I/O access locality.Extensive evaluation results show that GraphCP is 20.5×and 8.9×faster than two out-of-core graph processing systems GridGraph and GraphZ,and 3.5×and 1.7×faster than two state-of-art concurrent graph processing systems Seraph and GraphSO.展开更多
The networks are fundamental to our modern world and they appear throughout science and society.Access to a massive amount of data presents a unique opportunity to the researcher’s community.As networks grow in size ...The networks are fundamental to our modern world and they appear throughout science and society.Access to a massive amount of data presents a unique opportunity to the researcher’s community.As networks grow in size the complexity increases and our ability to analyze them using the current state of the art is at severe risk of failing to keep pace.Therefore,this paper initiates a discussion on graph signal processing for large-scale data analysis.We first provide a comprehensive overview of core ideas in Graph signal processing(GSP)and their connection to conventional digital signal processing(DSP).We then summarize recent developments in developing basic GSP tools,including methods for graph filtering or graph learning,graph signal,graph Fourier transform(GFT),spectrum,graph frequency,etc.Graph filtering is a basic task that allows for isolating the contribution of individual frequencies and therefore enables the removal of noise.We then consider a graph filter as a model that helps to extend the application of GSP methods to large datasets.To show the suitability and the effeteness,we first created a noisy graph signal and then applied it to the filter.After several rounds of simulation results.We see that the filtered signal appears to be smoother and is closer to the original noise-free distance-based signal.By using this example application,we thoroughly demonstrated that graph filtration is efficient for big data analytics.展开更多
Although many graph processing systems have been proposed, graphs in the real-world are often dynamic. It is important to keep the results of graph computation up-todate. Incremental computation is demonstrated to be ...Although many graph processing systems have been proposed, graphs in the real-world are often dynamic. It is important to keep the results of graph computation up-todate. Incremental computation is demonstrated to be an efficient solution to update calculated results. Recently, many incremental graph processing systems have been proposed to handle dynamic graphs in an asynchronous way and are able to achieve better performance than those processed in a synchronous way. However, these solutions still suffer from sub-optimal convergence speed due to their slow propagation of important vertex state (important to convergence speed) and poor locality. In order to solve these problems, we propose a novel graph processing framework. It introduces a dynamic partition method to gather the important vertices for high locality, and then uses a priority-based scheduling algorithm to assign them with a higher priority for an effective processing order. By such means, it is able to reduce the number of updates and increase the locality, thereby reducing the convergence time. Experimental results show that our method reduces the number of updates by 30%, and reduces the total execution time by 35%, compared with state-of-the-art systems.展开更多
Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world graphs.Programmability and pipeline parallelism of FPGAs make it potential to process different s...Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world graphs.Programmability and pipeline parallelism of FPGAs make it potential to process different stages of graph iterations.Nevertheless,considering the limited on-chip resources and streamline pipeline computation,the efficiency of hybrid model on FPGAs often suffers due to well-known random access feature of graph processing.In this paper,we present a hybrid graph processing system on FPGAs,which can achieve the best of both worlds.Our approach on FPGAs is unique and novel as follow.First,we propose to use edge block(consisting of edges with the same destination vertex set),which allows to sequentially access edges at block granularity for locality while still preserving the precision.Due to the independence of blocks in the sense that all edges in an inactive block are associated with inactive vertices,this also enables to skip invalid blocks for reducing redundant computation.Second,we consider a large number of vertices and their associated edge-blocks to maintain a predictable execution history.We also present to switch models in advance with few stalls using their state statistics.Our evaluation on a wide variety of graph algorithms for many real-world graphs shows that our approach achieves up to 3.69x speedup over state-of-the-art FPGA-based graph processing systems.展开更多
With the rapid growth of real-world graphs,the size of which can easily exceed the on-chip(board)storage capacity of an accelerator,processing large-scale graphs on a single Field Programmable Gate Array(FPGA)becomes ...With the rapid growth of real-world graphs,the size of which can easily exceed the on-chip(board)storage capacity of an accelerator,processing large-scale graphs on a single Field Programmable Gate Array(FPGA)becomes difficult.The multi-FPGA acceleration is of great necessity and importance.Many cloud providers(e.g.,Amazon,Microsoft,and Baidu)now expose FPGAs to users in their data centers,providing opportunities to accelerate large-scale graph processing.In this paper,we present a communication library,called FDGLib,which can easily scale out any existing single FPGA-based graph accelerator to a distributed version in a data center,with minimal hardware engineering efforts.FDGLib provides six APIs that can be easily used and integrated into any FPGA-based graph accelerator with only a few lines of code modifications.Considering the torus-based FPGA interconnection in data centers,FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes.We interface FDGLib into AccuGraph,a state-of-the-art graph accelerator.Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than a state-of-the-art distributed FPGA-based graph accelerator ForeGraph and a distributed CPU-based graph system Gemini,with better scalability.展开更多
Graph is a significant data structure that describes the relationship between entries.Many application domains in the real world are heavily dependent on graph data.However,graph applications are vastly different from...Graph is a significant data structure that describes the relationship between entries.Many application domains in the real world are heavily dependent on graph data.However,graph applications are vastly different from traditional applications.It is inefficient to use general-purpose platforms for graph applications,thus contributing to the research of specific graph processing platforms.In this survey,we systematically categorize the graph workloads and applications,and provide a detailed review of existing graph processing platforms by dividing them into generalpurpose and specialized systems.We thoroughly analyze the implementation technologies including programming models,partitioning strategies,communication models,execution models,and fault tolerance strategies.Finally,we analyze recent advances and present four open problems for future research.展开更多
The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer s...The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.展开更多
Identifying influential nodes in complex networks is of both theoretical and practical importance. Existing methods identify influential nodes based on their positions in the network and assume that the nodes are homo...Identifying influential nodes in complex networks is of both theoretical and practical importance. Existing methods identify influential nodes based on their positions in the network and assume that the nodes are homogeneous. However, node heterogeneity (i.e., different attributes such as interest, energy, age, and so on ) ubiquitously exists and needs to be taken into consideration. In this paper, we conduct an investigation into node attributes and propose a graph signal pro- cessing based centrality (GSPC) method to identify influential nodes considering both the node attributes and the network topology. We first evaluate our GSPC method using two real-world datasets. The results show that our GSPC method effectively identifies influential nodes, which correspond well with the underlying ground truth. This is compatible to the previous eigenvector centrality and principal component centrality methods under circumstances where the nodes are homogeneous. In addition, spreading analysis shows that the GSPC method has a positive effect on the spreading dynamics.展开更多
In the graph signal processing(GSP)framework,distributed algorithms are highly desirable in processing signals defined on large-scale networks.However,in most existing distributed algorithms,all nodes homogeneously pe...In the graph signal processing(GSP)framework,distributed algorithms are highly desirable in processing signals defined on large-scale networks.However,in most existing distributed algorithms,all nodes homogeneously perform the local computation,which calls for heavy computational and communication costs.Moreover,in many real-world networks,such as those with straggling nodes,the homogeneous manner may result in serious delay or even failure.To this end,we propose active network decomposition algorithms to select non-straggling nodes(normal nodes)that perform the main computation and communication across the network.To accommodate the decomposition in different kinds of networks,two different approaches are developed,one is centralized decomposition that leverages the adjacency of the network and the other is distributed decomposition that employs the indicator message transmission between neighboring nodes,which constitutes the main contribution of this paper.By incorporating the active decomposition scheme,a distributed Newton method is employed to solve the least squares problem in GSP,where the Hessian inverse is approximately evaluated by patching a series of inverses of local Hessian matrices each of which is governed by one normal node.The proposed algorithm inherits the fast convergence of the second-order algorithms while maintains low computational and communication cost.Numerical examples demonstrate the effectiveness of the proposed algorithm.展开更多
Lexicalized reordering models are very important components of phrasebased translation systems.By examining the reordering relationships between adjacent phrases,conventional methods learn these models from the word a...Lexicalized reordering models are very important components of phrasebased translation systems.By examining the reordering relationships between adjacent phrases,conventional methods learn these models from the word aligned bilingual corpus,while ignoring the effect of the number of adjacent bilingual phrases.In this paper,we propose a method to take the number of adjacent phrases into account for better estimation of reordering models.Instead of just checking whether there is one phrase adjacent to a given phrase,our method firstly uses a compact structure named reordering graph to represent all phrase segmentations of a parallel sentence,then the effect of the adjacent phrase number can be quantified in a forward-backward fashion,and finally incorporated into the estimation of reordering models.Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.展开更多
With the popularity of social network, the de- mand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintainin...With the popularity of social network, the de- mand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintaining and processing of dynamic graph is significantly high. In this pa- per, we design iGraph, an incremental graph processing sys- tem for dynamic graph with its continuous updates. The con- tribufions of iGraph include: 1) a hash-based graph partition strategy to enable fine-grained graph updates; 2) a vertex- based graph computing model to support incremental data processing; 3) detection and rebalance methods of hotspot to address the workload imbalance problem during incre- mental processing. Through the general-purpose API, iGraph can be used to implement various graph processing algo- rithms such as PageRank. We have implemented iGraph on Apache Spark, and experimental results show that for real life datasets, iGraph outperforms the original GraphX in respect of graph update and graph computation.展开更多
In this paper, the continuous-time independent edge-Markovian random graph process model is constructed. The authors also define the interval isolated nodes of the random graph process, study the distribution sequence...In this paper, the continuous-time independent edge-Markovian random graph process model is constructed. The authors also define the interval isolated nodes of the random graph process, study the distribution sequence of the number of isolated nodes and the probability of having no isolated nodes when the initial distribution of the random graph process is stationary distribution, derive the lower limit of the probability in which two arbitrary nodes are connected and the random graph is also connected, and prove that the random graph is almost everywhere connected when the number of nodes is sufficiently large.展开更多
Design extraction and reduction have been extensively used in modern VLSI design process. The extracted and reduced design can be efficiently processed by various applications, such as formal verification, simulation,...Design extraction and reduction have been extensively used in modern VLSI design process. The extracted and reduced design can be efficiently processed by various applications, such as formal verification, simulation, automatic test pattern generation (ATPG), etc. This paper presents a new circuit extraction method using program slicing technique, and develops an elegant theoretical basis based on program slicing for circuit extraction from Verilog description. The technique can obtain a chaining slice for given signals of interest. Compared with related researches, the main advantages of the method include that it is fine grain, it has no hardware description language (HDL) coding style limitation; it is precise and is capable of dealing with various Verilog constructions. The technique has been integrated with a commercial simulation environment and incorporated into a design process. The results of practical designs show the significant benefits of the approach.展开更多
基金the National Key R&D Program of China(No.2022ZD0119001)the National Natural Science Foundation of China(No.61834005)+3 种基金the Shaanxi Province Key R&D Plan(No.2022GY-027)the Key Scientific Research Project of Shaanxi Department of Education(No.22JY060)the Education Research Project of XUPT(No.JGA202108)the Graduate Student Innovation Fund of Xi'an University of Posts and Telecommunications(No.CXJJZL2022011)。
文摘Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing.This paper optimizes the depth-branch-resorting algorithm(DBR),and proposes a branch-alternation-resorting algorithm(BAR).In order to make the algorithm run in parallel and improve the efficiency of algorithm operation,the BAR algorithm is mapped onto the reconfigurable array processor(APR-16)to achieve vertex reordering,effectively improving the locality of graph data.This paper validates the BAR algorithm on the GraphBIG framework,by utilizing the reordered dataset with BAR on breadth-first search(BFS),single source shortest paht(SSSP)and betweenness centrality(BC)algorithms for traversal.The results show that compared with DBR and Corder algorithms,BAR can reduce execution time by up to 33.00%,and 51.00%seperatively.In terms of data movement,the BAR algorithm has a maximum reduction of 39.00%compared with the DBR algorithm and 29.66%compared with Corder algorithm.In terms of computational complexity,the BAR algorithm has a maximum reduction of 32.56%compared with DBR algorithm and53.05%compared with Corder algorithm.
基金supported by the National Key Research and Development Program of China under Grant No.2023YFB4502300.
文摘Graph processing has been widely used in many scenarios,from scientific computing to artificial intelligence.Graph processing exhibits irregular computational parallelism and random memory accesses,unlike traditional workloads.Therefore,running graph processing workloads on conventional architectures(e.g.,CPUs and GPUs)often shows a significantly low compute-memory ratio with few performance benefits,which can be,in many cases,even slower than a specialized single-thread graph algorithm.While domain-specific hardware designs are essential for graph processing,it is still challenging to transform the hardware capability to performance boost without coupled software codesigns.This article presents a graph processing ecosystem from hardware to software.We start by introducing a series of hardware accelerators as the foundation of this ecosystem.Subsequently,the codesigned parallel graph systems and their distributed techniques are presented to support graph applications.Finally,we introduce our efforts on novel graph applications and hardware architectures.Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
基金National Natural Science Foundation of China(Grant Nos.61972444,61825202,62072195,and 61832006)Zhejiang Lab(2022P10AC02).
文摘Graphs that are used to model real-world entities with vertices and relationships among entities with edges,have proven to be a powerful tool for describing real-world problems in applications.In most real-world scenarios,entities and their relationships are subject to constant changes.Graphs that record such changes are called dynamic graphs.In recent years,the widespread application scenarios of dynamic graphs have stimulated extensive research on dynamic graph processing systems that continuously ingest graph updates and produce up-to-date graph analytics results.As the scale of dynamic graphs becomes larger,higher performance requirements are demanded to dynamic graph processing systems.With the massive parallel processing power and high memory bandwidth,GPUs become mainstream vehicles to accelerate dynamic graph processing tasks.GPU-based dynamic graph processing systems mainly address two challenges:maintaining the graph data when updates occur(i.e.,graph updating)and producing analytics results in time(i.e.,graph computing).In this paper,we survey GPU-based dynamic graph processing systems and review their methods on addressing both graph updating and graph computing.To comprehensively discuss existing dynamic graph processing systems on GPUs,we first introduce the terminologies of dynamic graph processing and then develop a taxonomy to describe the methods employed for graph updating and graph computing.In addition,we discuss the challenges and future research directions of dynamic graph processing on GPUs.
基金supported by the National Natural Science Foundation of China(Grant Nos.61832020,61821003 and U1705261)National Defense Preliminary Research Project(No.31511010202)+3 种基金the Fundamental Research Funds for the Central Universities,the Open Project Program of Wuhan National Laboratory for Optoelectronics(No.2022WNLOKF017)the Natural Science Foundation of Fujian Province(No.2020J01493)Zhejiang provincial“Ten Thousand Talents Program”(No.2021R52007)Center-initiated Research Project of Zhejiang Lab(No.2021DA0AM01).
文摘In order to analyze and process the large graphs with high cost efficiency,researchers have developed a number of out-of-core graph processing systems in recent years based on just one commodity computer.On the other hand,with the rapidly growing need of analyzing graphs in the real-world,graph processing systems have to efficiently handle massive concurrent graph processing(CGP)jobs.Unfortunately,due to the inherent design for single graph processing job,existing out-of-core graph processing systems usually incur unnecessary data accesses and severe competition of I/O bandwidth when handling the CGP jobs.In this paper,we propose GraphCP,a disk I/O optimized out-of-core graph processing system that efficiently supports the processing of CGP jobs.GraphCP proposes a benefit-aware sharing execution model to share the I/O access and processing of graph data among the CGP jobs and adaptively schedule the graph data loading based on the states of vertices,which efficiently overcomes above challenges faced by existing out-of-core graph processing systems.Moreover,GraphCP adopts a dependency-based future-vertex updating model so as to reduce disk I/Os in the future iterations.In addition,GraphCP organizes the graph data with a Source-Sorted Sub-Block graph representation for better processing capacity and I/O access locality.Extensive evaluation results show that GraphCP is 20.5×and 8.9×faster than two out-of-core graph processing systems GridGraph and GraphZ,and 3.5×and 1.7×faster than two state-of-art concurrent graph processing systems Seraph and GraphSO.
基金supported in part by Basic Science Research Program through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(NRF-2019R1A2C1006159)and(NRF-2021R1A6A1A03039493)by the 2021 Yeungnam University Research Grant.
文摘The networks are fundamental to our modern world and they appear throughout science and society.Access to a massive amount of data presents a unique opportunity to the researcher’s community.As networks grow in size the complexity increases and our ability to analyze them using the current state of the art is at severe risk of failing to keep pace.Therefore,this paper initiates a discussion on graph signal processing for large-scale data analysis.We first provide a comprehensive overview of core ideas in Graph signal processing(GSP)and their connection to conventional digital signal processing(DSP).We then summarize recent developments in developing basic GSP tools,including methods for graph filtering or graph learning,graph signal,graph Fourier transform(GFT),spectrum,graph frequency,etc.Graph filtering is a basic task that allows for isolating the contribution of individual frequencies and therefore enables the removal of noise.We then consider a graph filter as a model that helps to extend the application of GSP methods to large datasets.To show the suitability and the effeteness,we first created a noisy graph signal and then applied it to the filter.After several rounds of simulation results.We see that the filtered signal appears to be smoother and is closer to the original noise-free distance-based signal.By using this example application,we thoroughly demonstrated that graph filtration is efficient for big data analytics.
基金the National Natural Science Foundation of China (Grant No. 61702202)China Postdoctoral Science Foundation Funded Project (2017M610477 and 2017T100555).
文摘Although many graph processing systems have been proposed, graphs in the real-world are often dynamic. It is important to keep the results of graph computation up-todate. Incremental computation is demonstrated to be an efficient solution to update calculated results. Recently, many incremental graph processing systems have been proposed to handle dynamic graphs in an asynchronous way and are able to achieve better performance than those processed in a synchronous way. However, these solutions still suffer from sub-optimal convergence speed due to their slow propagation of important vertex state (important to convergence speed) and poor locality. In order to solve these problems, we propose a novel graph processing framework. It introduces a dynamic partition method to gather the important vertices for high locality, and then uses a priority-based scheduling algorithm to assign them with a higher priority for an effective processing order. By such means, it is able to reduce the number of updates and increase the locality, thereby reducing the convergence time. Experimental results show that our method reduces the number of updates by 30%, and reduces the total execution time by 35%, compared with state-of-the-art systems.
基金This work was supported by the National Key Research and Development Program of China(2018YFB1003502)the National Natural Science Foundation of China(Grant Nos.61825202,61832006,and 61702201).
文摘Hybrid pull-push computational model can provide compelling results over either of single one for processing real-world graphs.Programmability and pipeline parallelism of FPGAs make it potential to process different stages of graph iterations.Nevertheless,considering the limited on-chip resources and streamline pipeline computation,the efficiency of hybrid model on FPGAs often suffers due to well-known random access feature of graph processing.In this paper,we present a hybrid graph processing system on FPGAs,which can achieve the best of both worlds.Our approach on FPGAs is unique and novel as follow.First,we propose to use edge block(consisting of edges with the same destination vertex set),which allows to sequentially access edges at block granularity for locality while still preserving the precision.Due to the independence of blocks in the sense that all edges in an inactive block are associated with inactive vertices,this also enables to skip invalid blocks for reducing redundant computation.Second,we consider a large number of vertices and their associated edge-blocks to maintain a predictable execution history.We also present to switch models in advance with few stalls using their state statistics.Our evaluation on a wide variety of graph algorithms for many real-world graphs shows that our approach achieves up to 3.69x speedup over state-of-the-art FPGA-based graph processing systems.
基金supported by the National Key Research and Development Program of China under Grant No.2018YFB1003502the National Natural Science Foundation of China under Grant Nos.62072195,61825202,61832006,and 61628204.
文摘With the rapid growth of real-world graphs,the size of which can easily exceed the on-chip(board)storage capacity of an accelerator,processing large-scale graphs on a single Field Programmable Gate Array(FPGA)becomes difficult.The multi-FPGA acceleration is of great necessity and importance.Many cloud providers(e.g.,Amazon,Microsoft,and Baidu)now expose FPGAs to users in their data centers,providing opportunities to accelerate large-scale graph processing.In this paper,we present a communication library,called FDGLib,which can easily scale out any existing single FPGA-based graph accelerator to a distributed version in a data center,with minimal hardware engineering efforts.FDGLib provides six APIs that can be easily used and integrated into any FPGA-based graph accelerator with only a few lines of code modifications.Considering the torus-based FPGA interconnection in data centers,FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes.We interface FDGLib into AccuGraph,a state-of-the-art graph accelerator.Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than a state-of-the-art distributed FPGA-based graph accelerator ForeGraph and a distributed CPU-based graph system Gemini,with better scalability.
基金Project supported by the National Key Program of China(No.2018YFB2101100)the National Natural Science Foundation of China(Nos.61932001 and 61872376)the Major State Research Development Program of China(No.2016YFB0201305).
文摘Graph is a significant data structure that describes the relationship between entries.Many application domains in the real world are heavily dependent on graph data.However,graph applications are vastly different from traditional applications.It is inefficient to use general-purpose platforms for graph applications,thus contributing to the research of specific graph processing platforms.In this survey,we systematically categorize the graph workloads and applications,and provide a detailed review of existing graph processing platforms by dividing them into generalpurpose and specialized systems.We thoroughly analyze the implementation technologies including programming models,partitioning strategies,communication models,execution models,and fault tolerance strategies.Finally,we analyze recent advances and present four open problems for future research.
文摘The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.
基金supported by the National Natural Science Foundation of China(Grant No.61231010)the Fundamental Research Funds for the Central Universities,China(Grant No.HUST No.2012QN076)
文摘Identifying influential nodes in complex networks is of both theoretical and practical importance. Existing methods identify influential nodes based on their positions in the network and assume that the nodes are homogeneous. However, node heterogeneity (i.e., different attributes such as interest, energy, age, and so on ) ubiquitously exists and needs to be taken into consideration. In this paper, we conduct an investigation into node attributes and propose a graph signal pro- cessing based centrality (GSPC) method to identify influential nodes considering both the node attributes and the network topology. We first evaluate our GSPC method using two real-world datasets. The results show that our GSPC method effectively identifies influential nodes, which correspond well with the underlying ground truth. This is compatible to the previous eigenvector centrality and principal component centrality methods under circumstances where the nodes are homogeneous. In addition, spreading analysis shows that the GSPC method has a positive effect on the spreading dynamics.
基金supported by National Natural Science Foundation of China(Grant No.61761011)Natural Science Foundation of Guangxi(Grant No.2020GXNSFBA297078).
文摘In the graph signal processing(GSP)framework,distributed algorithms are highly desirable in processing signals defined on large-scale networks.However,in most existing distributed algorithms,all nodes homogeneously perform the local computation,which calls for heavy computational and communication costs.Moreover,in many real-world networks,such as those with straggling nodes,the homogeneous manner may result in serious delay or even failure.To this end,we propose active network decomposition algorithms to select non-straggling nodes(normal nodes)that perform the main computation and communication across the network.To accommodate the decomposition in different kinds of networks,two different approaches are developed,one is centralized decomposition that leverages the adjacency of the network and the other is distributed decomposition that employs the indicator message transmission between neighboring nodes,which constitutes the main contribution of this paper.By incorporating the active decomposition scheme,a distributed Newton method is employed to solve the least squares problem in GSP,where the Hessian inverse is approximately evaluated by patching a series of inverses of local Hessian matrices each of which is governed by one normal node.The proposed algorithm inherits the fast convergence of the second-order algorithms while maintains low computational and communication cost.Numerical examples demonstrate the effectiveness of the proposed algorithm.
基金supported by the National Natural Science Foundation of China(No.61303082) the Research Fund for the Doctoral Program of Higher Education of China(No.20120121120046)
文摘Lexicalized reordering models are very important components of phrasebased translation systems.By examining the reordering relationships between adjacent phrases,conventional methods learn these models from the word aligned bilingual corpus,while ignoring the effect of the number of adjacent bilingual phrases.In this paper,we propose a method to take the number of adjacent phrases into account for better estimation of reordering models.Instead of just checking whether there is one phrase adjacent to a given phrase,our method firstly uses a compact structure named reordering graph to represent all phrase segmentations of a parallel sentence,then the effect of the adjacent phrase number can be quantified in a forward-backward fashion,and finally incorporated into the estimation of reordering models.Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.
文摘With the popularity of social network, the de- mand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintaining and processing of dynamic graph is significantly high. In this pa- per, we design iGraph, an incremental graph processing sys- tem for dynamic graph with its continuous updates. The con- tribufions of iGraph include: 1) a hash-based graph partition strategy to enable fine-grained graph updates; 2) a vertex- based graph computing model to support incremental data processing; 3) detection and rebalance methods of hotspot to address the workload imbalance problem during incre- mental processing. Through the general-purpose API, iGraph can be used to implement various graph processing algo- rithms such as PageRank. We have implemented iGraph on Apache Spark, and experimental results show that for real life datasets, iGraph outperforms the original GraphX in respect of graph update and graph computation.
基金supported by the National Natural Science Foundation of China(Nos.60872060,11101265)the Shanghai Natural Science Foundation of China(No.12ZR1421000)the Shanghai Education Commission Innovation Project Fund(Nos.12ZZ193,14YZ152,15ZZ099)
文摘In this paper, the continuous-time independent edge-Markovian random graph process model is constructed. The authors also define the interval isolated nodes of the random graph process, study the distribution sequence of the number of isolated nodes and the probability of having no isolated nodes when the initial distribution of the random graph process is stationary distribution, derive the lower limit of the probability in which two arbitrary nodes are connected and the random graph is also connected, and prove that the random graph is almost everywhere connected when the number of nodes is sufficiently large.
文摘Design extraction and reduction have been extensively used in modern VLSI design process. The extracted and reduced design can be efficiently processed by various applications, such as formal verification, simulation, automatic test pattern generation (ATPG), etc. This paper presents a new circuit extraction method using program slicing technique, and develops an elegant theoretical basis based on program slicing for circuit extraction from Verilog description. The technique can obtain a chaining slice for given signals of interest. Compared with related researches, the main advantages of the method include that it is fine grain, it has no hardware description language (HDL) coding style limitation; it is precise and is capable of dealing with various Verilog constructions. The technique has been integrated with a commercial simulation environment and incorporated into a design process. The results of practical designs show the significant benefits of the approach.