Funding: Program of the National Natural Science Foundation of China (No. 60496312); Program of the Beijing Natural Science Foundation (No. 4042021).
Abstract: To make full use of advanced technologies for future mobile communications systems, such as Space Time Code (STC), Joint Transmission (JT) and Multiple Input Multiple Output (MIMO), and to meet the requirements of high-bit-rate multimedia services, new network topologies should be studied. A generalized distributed multicell architecture can take full advantage of multi-antenna technologies and solve the problem of frequent handover caused by higher carrier frequencies. Group handover, the handover policy based on this architecture, can eliminate the cell edge effect. Furthermore, by applying the concept of group handover to 3G mobile communication systems, the Fast Cell Group Selection (FCGS) scheme can effectively improve the data rate for cell edge users.
Funding: the Natural Science Foundation of Shandong Province (No. ZR2019MD034); the Education Reform Project of Shandong Province (No. M2020266).
Abstract: Computing resources are one of the key factors restricting the extraction of marine targets using deep learning. To increase computing speed and shorten computing time, a parallel distributed architecture is adopted to extract marine targets. The advantages of two distributed architectures, Parameter Server and Ring-allreduce, are combined to design a parallel distributed architecture suitable for deep learning: the Optimal Interleaved Distributed Architecture (OIDA). Three marine target extraction methods, OTD_StErf, OTD_Loglogistic and OTD_Sgmloglog, are used to test OIDA, and a total of 18 experiments in 3 categories are carried out. The results show that the OIDA architecture can meet the timeliness requirements of marine target extraction. The average speed of parallel target extraction with a single-machine 8-core CPU is 5.75 times faster than that of a single-machine single-core CPU, and the average speed with a 5-machine 40-core CPU is 20.75 times faster.
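The two speedup figures above can be converted into parallel efficiency (speedup divided by core count) as a quick sanity check. The sketch below is plain arithmetic on the numbers quoted in the abstract; the function name `parallel_efficiency` and the dictionary labels are illustrative assumptions, not anything taken from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core; 1.0 would correspond to ideal linear scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: (speedup over a single core, total cores).
reported = {
    "1 machine, 8 cores": (5.75, 8),
    "5 machines, 40 cores": (20.75, 40),
}

efficiencies = {
    name: parallel_efficiency(speedup, cores)
    for name, (speedup, cores) in reported.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: efficiency = {eff:.3f}")
```

On these figures, efficiency drops from about 0.72 on one machine to about 0.52 across five machines. That pattern would be consistent with added inter-machine communication cost, although the abstract itself does not break the overhead down.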
Funding: funding support for this project from the National Science and Technology Major Project of the Ministry of Science and Technology of China (Grant No. 2011ZX05010-002-005).
Abstract: 1 Introduction. Reservoir architecture analysis of distributary channels in the Daqing oilfield has drawn consistent interest among development geologists and petroleum engineers over the last decade (Lv et al., 1999; Zhou et al., 2008; Zhang et
Funding: supported by the Defence Science & Technology Group (DSTG) (9729). The Commonwealth of Australia supported this research through a Defence Science Partnerships agreement with the Australian Defence Science and Technology Group.
Abstract: A common assumption of coverage path planning research is a static environment. Such environments require only a single visit to each area to achieve coverage. However, some real-world environments are characterised by the presence of unexpected, dynamic obstacles. They require areas to be revisited periodically to maintain an accurate coverage map, as well as reactive obstacle avoidance. This paper proposes a novel swarm-based control algorithm for multi-robot exploration and repeated coverage in environments with unknown, dynamic obstacles. The algorithm combines two elements: frontier-led swarming for driving exploration by a group of robots, and pheromone-based stigmergy for controlling repeated coverage while avoiding obstacles. We tested the performance of our approach on heterogeneous and homogeneous groups of mobile robots in different environments. We measure both repeated coverage performance and obstacle avoidance ability. Through a series of comparison experiments, we demonstrate that our proposed strategy has superior performance to recently presented multi-robot repeated coverage methodologies.
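To make the idea of pheromone-based repeated coverage concrete, here is a minimal single-robot sketch on a grid. It is a generic stigmergy illustration, not the paper's algorithm: the evaporation rate, the 4-neighbour move rule, and the names `evaporate`, `deposit` and `next_cell` are all assumptions, and frontier-led swarming and obstacle avoidance are left out.

```python
from typing import List, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]


def evaporate(grid: Grid, rate: float) -> Grid:
    """Decay every pheromone value so that old visits fade over time."""
    return [[value * (1.0 - rate) for value in row] for row in grid]


def deposit(grid: Grid, cell: Cell, amount: float = 1.0) -> None:
    """Mark a cell as freshly visited."""
    row, col = cell
    grid[row][col] += amount


def next_cell(grid: Grid, cell: Cell) -> Cell:
    """Move to the in-bounds 4-neighbour with the least pheromone."""
    row, col = cell
    neighbours = [
        (row + dr, col + dc)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
        if 0 <= row + dr < len(grid) and 0 <= col + dc < len(grid[0])
    ]
    return min(neighbours, key=lambda rc: grid[rc[0]][rc[1]])


# Hypothetical 3x3 arena with the robot starting in the centre.
grid: Grid = [[0.0] * 3 for _ in range(3)]
position: Cell = (1, 1)
visited: List[Cell] = []

for _ in range(6):
    deposit(grid, position)
    visited.append(position)
    grid = evaporate(grid, rate=0.1)
    position = next_cell(grid, position)

print(visited)
```

Because pheromone decays, a cell that has not been visited for a while becomes attractive again, which is what drives periodic revisits rather than one-off coverage.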
Funding: funded by the Higher Education Authority (HEA); co-funded under the European Regional Development Fund (ERDF).
Abstract: Information-Centric Networking (ICN), an alternative architecture to the current Internet infrastructure, focuses on the distribution and retrieval of content by employing caches in a network to reduce network traffic. The employment of caches may be accomplished using graph-based and content-based criteria such as the position of a node in a network and content popularity. The contribution of this paper lies in the characterization of content popularity for on-path in-network caching. To this end, four dynamic approaches for identifying content popularity are evaluated via simulations. Content popularity may be determined per chunk or per object, calculated as the number of requests for a content item against either the sum of requests or the maximum number of requests. Based on the results, chunk-based approaches provide 23% more accurate content popularity calculations than object-based approaches. In addition, approaches that are based on the comparison of a content item against the maximum number of requests have been shown to be more accurate than the alternatives.
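The four combinations described above (per chunk or per object, normalised by the sum or by the maximum of requests) can be sketched as follows. This is a minimal illustration under assumptions: the request log, the `(object_id, chunk_index)` key format, and the function names are hypothetical, and the paper's actual dynamic update rules are not reproduced.

```python
from collections import Counter
from typing import Dict, Hashable


def popularity_by_sum(counts: Counter) -> Dict[Hashable, float]:
    """Requests for an item divided by the total number of requests."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def popularity_by_max(counts: Counter) -> Dict[Hashable, float]:
    """Requests for an item divided by the count of the most requested item."""
    peak = max(counts.values(), default=0)
    return {key: n / peak for key, n in counts.items()} if peak else {}


# Hypothetical request log seen at one cache: (object_id, chunk_index).
requests = [("video", 0), ("video", 0), ("video", 1), ("page", 0)]

chunk_counts = Counter(requests)                      # per-chunk granularity
object_counts = Counter(obj for obj, _ in requests)   # per-object granularity

print(popularity_by_sum(object_counts))
print(popularity_by_max(chunk_counts))
```

Per-object counting hides the fact that the two `video` chunks are requested at different rates, while per-chunk counting keeps that distinction.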
Funding: Projects (41604117, 41204054) supported by the National Natural Science Foundation of China; Projects (20110490149, 2015M580700) supported by the Research Fund for the Doctoral Program of Higher Education, China; Project (2015zzts064) supported by the Fundamental Research Funds for the Central Universities, China; Project (16B147) supported by the Scientific Research Fund of Hunan Provincial Education Department, China.
Abstract: The study of induced polarization (IP) information extraction from magnetotelluric (MT) sounding data is of great practical significance to the exploitation of deep mineral, oil and gas resources. The linear inversion method, which has been given priority in previous research on IP information extraction, has three main problems: 1) dependency on the initial model, 2) a tendency to fall into local minima, and 3) serious non-uniqueness of solutions. Taking the nonlinearity and nonconvexity of IP information extraction into consideration, a two-stage CO-PSO minimum structure inversion method using the Compute Unified Device Architecture (CUDA) is proposed. On the one hand, a novel Cauchy oscillation particle swarm optimization (CO-PSO) algorithm is applied to extract nonlinear IP information from MT sounding data, and it is implemented as a parallel algorithm within the CUDA computing architecture; on the other hand, the impact of the polarizability on the observation data is strengthened by introducing a second-stage inversion process, and a regularization parameter is applied in the fitness function of the PSO algorithm to address the problem of multiple solutions in inversion. The inversion simulation results for polarization layers in different strata of various geoelectric models show that smooth models of resistivity and IP parameters can be obtained by the proposed algorithm, and that the results are relatively stable and accurate. Experiments with added noise indicate that the method is robust to Gaussian white noise. Compared with the traditional PSO and GA algorithms, the proposed algorithm is more efficient and yields better inversion results.
Funding: supported by the National Natural Science Foundation of China (72001212, 61773120); the Hunan Postgraduate Research Innovation Project (CX20210031); the Foundation for the Author of National Excellent Doctoral Dissertation of China (2014-92); the Innovation Team of Guangdong Provincial Department of Education (2018KCXTD031).
Abstract: How to make use of limited onboard resources for complex and heavy space tasks has attracted much attention. With the continuous improvement in satellite payload capacity and the increasing complexity of observation requirements, the importance of research on satellite autonomous task scheduling has gradually increased. This article first gives the problem description and mathematical model for satellite autonomous task scheduling and then follows the steps of "satellite autonomous task scheduling, centralized autonomous collaborative task scheduling architecture, distributed autonomous collaborative task scheduling architecture, solution algorithm". Finally, in view of the complex and changeable environment, this article proposes future directions for satellite autonomous task scheduling.
Abstract: As a huge number of users are involved, spectrum allocation and scheduling in Cognitive Radio Networks (CRNs) is difficult. Collisions increase when spectrum is not allocated, and this results in a high drop rate and degraded network performance. To solve these problems and allocate appropriate spectrum, a novel method termed Quality of Service (QoS) Improvement Proper Scheduling (QIPS) is introduced. The major contribution of the work is the design of a new cross-layer QoS Aware Scheduling based on Loss-based Proportional Fairness with Multihop (QoSAS-LBPFM). In a Medium Access Control (MAC) multi-channel network environment, mobile nodes perform concurrent broadcasts across several channels. Taking advantage of the introduced cross-layer design, the real-time channel conditions offered by the Cognitive Radio (CR) function allow adaptive sub-channel choice for every broadcast. To optimize network resources, the LBPFM adaptively plans the radio resources allocated to diverse services without lessening the quality of service. Simulation results show that QoSAS-LBPFM provides enhanced QoS-guaranteed performance compared with other existing QIPS algorithms.
Funding: supported by the National Natural Science Foundation of China (No. 60675043); the Natural Science Foundation of Zhejiang Province of China (No. Y1090426, No. Y1090956); the Technical Project of Zhejiang Province of China (No. 2009C33045).
Abstract: This paper is concerned with the problem of odor source localization using a multi-robot system. A learning particle swarm optimization algorithm, which can coordinate a multi-robot system to locate the odor source, is proposed. First, in order to develop the proposed algorithm, a source probability map for a robot is built and updated using concentration magnitude information, wind information, and swarm information. Based on the source probability map, the new position of the robot can be generated. Second, a distributed coordination architecture, by which the proposed algorithm can run on the multi-robot system, is designed. Specifically, the proposed algorithm is used at the group level to generate a new position for the robot. A consensus algorithm is then adopted at the robot level to control the robot as it moves from the current position to the new position. Finally, the effectiveness of the proposed algorithm is illustrated for the odor source localization problem.
Funding: supported by the National Natural Science Foundation of China (No. 61272508, 61472033, 61202432).
Abstract: With the rapid development of cloud data centers and the growth in cloud service customers, the demand for high-quality cloud services has grown rapidly. To face this reality, this paper focuses on service optimization issues in the cloud computing environment. First, a service-oriented architecture is proposed, and programmable network facilities are utilized in it to optimize specific cloud services. Then, cloud services are categorized into two subcategories: static services and dynamic services. Furthermore, the concepts of cloud service quality and cloud resource idle rate are defined, and both are taken as parameters in the service optimization algorithm to improve cloud service quality and optimize system workload simultaneously. Numerical simulations are conducted to verify the effectiveness of the proposed algorithm in balancing the workload of all servers.
Funding: supported by the National Basic Research Program of China (2009CB320400); the Sino-Finland ICT Collaborations Programme Project on 'Future Wireless Access Technologies' (2010DFB10410); the National Key Technology R&D Program of China (2010ZX03003-001-01).
Abstract: This article proposes an optimized in-band control channel scheme with a channel selection scheduling algorithm and a network-coding-based transmission paradigm for the distributed cognitive radio network (CRN). As is well known, the control channel plays an important role in establishing wireless transmission. To improve spectrum efficiency in a CRN, it is preferable to deploy the control channel without dedicated spectrum allocation, i.e., in-band. In this study, a time slot division and dynamic channel selection scheduling algorithm is proposed to realize the in-band control channel with improved spectrum efficiency in the distributed CRN. Furthermore, to adapt to the dynamic behavior of the primary users, network coding is employed to reduce the overhead of control information transmission so that the control information can be transmitted efficiently and reliably. The performance of the proposed in-band control channel scheme is verified by extensive simulation results.
Funding: supported by the National Basic Research 973 Program of China under Grant Nos. 2005CB321800 and 2011CB302600; the National Natural Science Foundation of China under Grant Nos. 90612009, 60725206 and 60625203.
Abstract: There is an increasing need to build scalable distributed systems over the Internet infrastructure. However, the development of distributed scalable applications suffers from the lack of a widely accepted virtual computing environment. Users have to spend great effort on the management and sharing of the involved resources over the Internet, whose characteristics are intrinsic growth, autonomy and diversity. To deal with this challenge, the Internet-based Virtual Computing Environment (iVCE) is proposed and developed to serve as a platform for distributed scalable applications over the open infrastructure; its kernel mechanisms are on-demand aggregation and autonomic collaboration of resources. In this paper, we present a programming language for iVCE named Owlet. Owlet conforms to the conceptual model of iVCE, and exposes the iVCE to application developers. As an interaction language based on a peer-to-peer content-based publish/subscribe scheme, Owlet abstracts the Internet as an environment for roles to interact, and uses roles to build a relatively stable view of resources for on-demand resource aggregation. It provides language constructs to use 1) distributed event-driven rules to describe interaction protocols among different roles, 2) conversations to correlate events and rules into a common context, and 3) resource pooling to provide fault tolerance and load balancing among networked nodes. We have implemented an Owlet compiler and its runtime environment according to the architecture of iVCE, and built several Owlet applications, including a peer-to-peer file sharing application. Experimental results show that, with iVCE, the separation of resource aggregation logic and business logic significantly eases the process of building scalable distributed applications.
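For readers unfamiliar with content-based publish/subscribe, the scheme can be sketched in a few lines. The example below is not Owlet syntax and does not model iVCE roles, conversations or peer-to-peer routing; it is a single-process Python illustration in which the class `ContentBroker`, the event fields, and the file-sharing scenario are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]
Handler = Callable[[Event], None]


class ContentBroker:
    """Deliver each event to every subscriber whose predicate matches its content."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Predicate, Handler]] = []

    def subscribe(self, predicate: Predicate, handler: Handler) -> None:
        self._subscriptions.append((predicate, handler))

    def publish(self, event: Event) -> int:
        """Return the number of handlers the event was delivered to."""
        delivered = 0
        for predicate, handler in self._subscriptions:
            if predicate(event):
                handler(event)
                delivered += 1
        return delivered


broker = ContentBroker()
received: List[Event] = []

# Subscribers select events by content (fields), not by a named topic.
broker.subscribe(
    lambda e: e.get("type") == "file_offer" and e.get("size_mb", 0) < 100,
    received.append,
)

small = broker.publish({"type": "file_offer", "name": "a.txt", "size_mb": 1})
large = broker.publish({"type": "file_offer", "name": "b.iso", "size_mb": 700})
print(small, large, received)
```

The point of the content-based variant is that the publisher never names its receivers, which is one way to keep interaction logic separate from the question of which resources currently fill a role.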
Funding: supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003502; the National Natural Science Foundation of China under Grant Nos. 62072195, 61825202, 61832006, and 61628204.
Abstract: With the rapid growth of real-world graphs, whose size can easily exceed the on-chip (on-board) storage capacity of an accelerator, processing large-scale graphs on a single Field Programmable Gate Array (FPGA) becomes difficult. Multi-FPGA acceleration is therefore necessary and important. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can easily scale out any existing single-FPGA graph accelerator to a distributed version in a data center with minimal hardware engineering effort. FDGLib provides six APIs that can be integrated into any FPGA-based graph accelerator with only a few lines of code modifications. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We integrate FDGLib into AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than ForeGraph, a state-of-the-art distributed FPGA-based graph accelerator, and Gemini, a distributed CPU-based graph system, respectively, with better scalability.
Funding: This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 60573102, 90412010) and the National Grand Fundamental Research 973 Program of China (Grant Nos. 2003CB317000, 2005CB321800).
Abstract: This paper addresses the question of why grid technology has not spread as fast as the Web technology of the 1990s. In the past 10 years, considerable effort has been put into grid computing. Much progress has been made and, more importantly, fundamental challenges and essential issues of this field are emerging. This paper focuses on the area of grid system software research, and argues that the usability of grid system software must be enhanced. It identifies four usability issues, drawing on international grid research experience. It also presents advances by the Vega Grid team in addressing these challenges.