Funding: jointly supported by the National Basic Research Program of China (973 Program) [grant number 6131270305], the Ministry of Water Resources' special research grant for non-profit public service [grant number 201301062-02], the National Natural Science Foundation of China [grant number 61572058], and the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDA05110304].
Abstract: This paper preliminarily evaluates the speedup, scalability, and prediction skill of the high-performance Advanced Regional Eta-coordinate Model (H-AREM), which is built on several parallel processing methods and domain decomposition strategies. Results show that the parallel version of the model based on a modular parallel framework and a multidimensional domain decomposition strategy performs better overall; for example, it is faster and more scalable than the version based on a message passing interface and a one-dimensional decomposition strategy. In particular, the H-AREM at a resolution of 8 km scales up to 8099 cores. Moreover, in the H-AREM, higher resolutions yield more realistic precipitation predictions without a marked increase in simulation time.
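As a concrete illustration of the multidimensional decomposition idea, the minimal C/MPI sketch below sets up a two-dimensional process grid and assigns each rank a rectangular tile of the model domain. This is a generic sketch under assumed grid sizes (NX, NY), not the H-AREM implementation itself.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Let MPI pick a balanced 2-D process grid, e.g. 4x2 for 8 ranks. */
    int dims[2] = {0, 0};
    MPI_Dims_create(size, 2, dims);

    int periods[2] = {0, 0};  /* non-periodic regional domain */
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    int coords[2];
    MPI_Cart_coords(cart, rank, 2, coords);

    /* Assumed global grid; each rank owns one rectangular tile
       (edge remainders ignored in this sketch). */
    const int NX = 1024, NY = 1024;
    printf("rank %d -> tile (%d,%d) of %dx%d, local size %dx%d\n",
           rank, coords[0], coords[1], dims[0], dims[1],
           NX / dims[0], NY / dims[1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```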
Funding: supported in part by the National 863 Program [grant numbers 2009AA01Z256, 2009AA01A345], the National 973 Program [grant number 2007CB310705], and the NSFC [grant number 60932004], P. R. China.
Abstract: This paper analyzes the physical potential, computing performance benefit, and power consumption of optical interconnects. Compared with electrical interconnections, optical ones show clear advantages based on an analysis of physical factors. At the same time, since recent developments raise the question of whether these optical interconnect technologies, with higher bandwidth but also higher cost, are worth deploying, a computing performance comparison is performed. To meet the increasing demands of large-scale parallel and multi-processor computing tasks, this paper proposes an analytic method to evaluate the parallel computing performance of interconnect systems. Both a bandwidth-limited model and a full-bandwidth model are investigated. Speedup and efficiency are selected to represent the parallel performance of an interconnect system. Using the proposed models, we quantify the performance gap between optically and electrically interconnected systems. A further investigation of the power consumption of commercial products shows that deploying parallel interconnections reduces unit power consumption. Therefore, from the analysis of computing performance and power dissipation, we find that parallel optical interconnects offer a valuable combination of high performance and low energy consumption. For data centers under construction, considerable power could be saved if parallel optical interconnect technologies were used.
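The paper's own models are not reproduced here, but the C sketch below shows the general shape of a bandwidth-limited speedup and efficiency estimate: parallel time is compute time plus a communication term bounded by link bandwidth, and the full-bandwidth case corresponds to dropping that term. All constants (W, flops, msg, bw) are illustrative assumptions.

```c
#include <stdio.h>

/* Toy bandwidth-limited speedup model (an illustrative assumption,
 * not the paper's actual equations): each of p processors performs
 * W/p work at 'flops' FLOP/s and exchanges 'msg' bytes per run over
 * a link of bandwidth 'bw' bytes/s.  The full-bandwidth model is the
 * limit in which the communication term vanishes. */
static double speedup(double W, int p, double flops,
                      double msg, double bw) {
    double t1 = W / flops;                  /* serial time   */
    double tp = W / (p * flops) + msg / bw; /* parallel time */
    return t1 / tp;
}

int main(void) {
    const double W = 1e12;     /* total work, FLOPs             */
    const double flops = 1e10; /* per-processor compute rate    */
    const double msg = 1e9;    /* bytes exchanged per processor */

    for (int p = 2; p <= 64; p *= 2) {
        double se = speedup(W, p, flops, msg, 1.25e9);  /* ~10 Gb/s  */
        double so = speedup(W, p, flops, msg, 1.25e10); /* ~100 Gb/s */
        printf("p=%2d  electrical S=%6.2f (eff %.2f)  optical S=%6.2f (eff %.2f)\n",
               p, se, se / p, so, so / p);
    }
    return 0;
}
```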
Funding: supported by the Science and Technology Project of Zhejiang Province [grant number 2014C01051] and the National High Technology Development 863 Program of China [grant number 2015AA011901].
Abstract: One of the key challenges in large-scale network simulation is the huge computational demand of fine-grained traffic simulation. Apart from using high-performance computing facilities and parallelism techniques, an alternative is to replace the background traffic with simplified abstract models such as fluid flows. This paper proposes a hybrid modeling approach for background traffic that combines an ON/OFF model with TCP activities. The ON/OFF model characterizes the application activities, while ordinary differential equations (ODEs) based on fluid flows describe the TCP congestion avoidance functionality. The apparent merits of this approach are that it (1) accurately captures traffic self-similarity at the source level, (2) properly reflects network dynamics, and (3) efficiently decreases computational complexity. The experimental results show that the approach makes a proper trade-off between accuracy and complexity in background traffic simulation.
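As an illustration of the fluid-flow component, the C sketch below Euler-integrates a common textbook TCP fluid equation, dW/dt = 1/RTT - pW^2/(2*RTT); this is an assumed simplified form, not necessarily the exact ODEs used in the paper.

```c
#include <stdio.h>

/* Euler integration of a simplified TCP fluid model (a common
 * textbook form, assumed here):
 *     dW/dt = 1/RTT - p * W^2 / (2*RTT)
 * i.e. additive increase of one segment per RTT, and a halving of
 * the window at loss events arriving at rate p*W/RTT. */
int main(void) {
    const double rtt = 0.1;   /* round-trip time, s          */
    const double p   = 0.01;  /* packet-loss probability     */
    const double dt  = 0.001; /* Euler step, s               */
    double w = 1.0;           /* congestion window, segments */

    for (int i = 0; i <= 20000; i++) {
        if (i % 4000 == 0)
            printf("t=%5.1f s   W=%6.2f segments\n", i * dt, w);
        w += dt * (1.0 / rtt - p * w * w / (2.0 * rtt));
    }
    /* Converges to the steady state W* = sqrt(2/p) ~ 14.1 here. */
    return 0;
}
```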
Abstract: The rapid growth of interconnected high-performance workstations has produced a new computing paradigm called cluster-of-workstations computing. In these systems, the load balance problem is a serious impediment to achieving good performance. The main concern of this paper is the implementation of a dynamic load balancing algorithm, asynchronous round robin (ARR), for balancing the workload of a parallel tree-computation depth-first-search algorithm on a cluster of heterogeneous workstations (COW). Many algorithms in artificial intelligence and other areas of computer science are based on depth-first search in implicitly defined trees. These algorithms require a load-balancing scheme that can evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree's shape. The ARR algorithm needs interprocessor communication only when necessary, and it runs under MPI (Message Passing Interface), which allows parallel execution on a heterogeneous SUN cluster-of-workstations platform. The program code is written in C and executed under the UNIX operating system (Solaris).
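To illustrate the ARR idea, the C sketch below shows only the round-robin donor-selection rule; the actual request/response traffic (MPI messages in the paper) is omitted, and the initial target value is an assumed convention.

```c
#include <stdio.h>

/* Donor selection in asynchronous round robin (ARR): every processor
 * keeps its own private 'target' counter and, whenever it runs out of
 * tree nodes, requests work from 'target' and advances it cyclically,
 * skipping itself. */
static int next_target(int *target, int self, int nprocs) {
    int donor = *target;
    *target = (*target + 1) % nprocs;  /* advance for the next request */
    if (donor == self) {               /* never ask ourselves          */
        donor = *target;
        *target = (*target + 1) % nprocs;
    }
    return donor;
}

int main(void) {
    const int nprocs = 4, self = 2;
    int target = (self + 1) % nprocs;  /* start just after ourselves */

    printf("processor %d asks, in order:", self);
    for (int i = 0; i < 6; i++)
        printf(" %d", next_target(&target, self, nprocs));
    printf("\n");                      /* -> 3 0 1 3 0 1 */
    return 0;
}
```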
Abstract: With the continuous development of network communication and computer technology, parallel computer network applications are becoming more widespread, and their reliability has attracted growing attention from researchers. This paper introduces a simple computer network, presents reliability design criteria for computer network analysis, and finally uses examples to illustrate computer network hardware and software reliability.
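As an example of the kind of reliability calculation such an analysis rests on, the C sketch below evaluates the standard series and parallel reliability formulas for a small illustrative topology; the component reliabilities are assumed values, not figures from the paper.

```c
#include <stdio.h>

/* Standard series/parallel reliability formulas:
 *   series:   R = prod(R_i)          -- every component must work
 *   parallel: R = 1 - prod(1 - R_i)  -- at least one must work    */
static double series(const double *r, int n) {
    double R = 1.0;
    for (int i = 0; i < n; i++) R *= r[i];
    return R;
}
static double parallel(const double *r, int n) {
    double Q = 1.0;  /* probability that all components fail */
    for (int i = 0; i < n; i++) Q *= 1.0 - r[i];
    return 1.0 - Q;
}

int main(void) {
    double links[2] = {0.95, 0.95}; /* two redundant links (assumed)  */
    double path[2];
    path[0] = parallel(links, 2);   /* redundant pair -> 0.9975       */
    path[1] = 0.99;                 /* router in series with the pair */
    printf("end-to-end reliability: %.4f\n", series(path, 2));
    return 0;
}
```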
Funding: partly supported by the Supercomputer Application Project Trial Funding from Wuxi Jiangnan Institute of Computing Technology [grant number BB2340000016], the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDC01040100], the National Natural Science Foundation of China [grant numbers 21688102, 21803066], the Anhui Initiative in Quantum Information Technologies [grant number AHY090400], the National Key Research and Development Program of China [grant number 2016YFA0200604], the Fundamental Research Funds for the Central Universities [grant number WK2340000091], the Chinese Academy of Sciences Pioneer Hundred Talents Program [grant number KJ2340000031], and the Research Start-Up Grants [grant number KY2340000094] and Academic Leading Talents Training Program [grant number KY2340000103] from the University of Science and Technology of China.
Abstract: High performance computing (HPC) is a powerful tool for accelerating Kohn–Sham density functional theory (KS-DFT) calculations on modern heterogeneous supercomputers. Here, we describe a massively parallel implementation of the discontinuous Galerkin density functional theory (DGDFT) method on the Sunway TaihuLight supercomputer. The DGDFT method uses adaptive local basis (ALB) functions generated on the fly during the self-consistent field (SCF) iteration to solve the KS equations with precision comparable to a plane-wave basis set. In particular, the DGDFT method adopts a two-level parallelization strategy that handles various types of data distribution, task scheduling, and data communication schemes, combined with the master–slave multi-thread heterogeneous parallelism of the SW26010 processor, enabling large-scale HPC KS-DFT calculations on the Sunway TaihuLight supercomputer. We show that the DGDFT method can scale up to 8,519,680 processing cores (131,072 core groups) on the Sunway TaihuLight supercomputer for studying the electronic structures of two-dimensional (2D) metallic graphene systems that contain tens of thousands of carbon atoms.
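As a generic illustration of a two-level layout (not the DGDFT code itself), the C/MPI sketch below splits the global communicator into fixed-size groups: the first level would distribute elements across groups, and the second level parallelizes the work inside each group. GROUP_SIZE is an assumed parameter.

```c
#include <mpi.h>
#include <stdio.h>

/* Generic two-level parallel layout: the global communicator is split
 * into groups of GROUP_SIZE ranks.  Level 1 distributes elements
 * across groups; level 2 parallelizes the work inside each group. */
#define GROUP_SIZE 4

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int group = world_rank / GROUP_SIZE;  /* level-1 index      */
    MPI_Comm elem_comm;                   /* level-2 group comm */
    MPI_Comm_split(MPI_COMM_WORLD, group, world_rank, &elem_comm);

    int local_rank, local_size;
    MPI_Comm_rank(elem_comm, &local_rank);
    MPI_Comm_size(elem_comm, &local_size);

    printf("world rank %d -> group %d, local rank %d/%d\n",
           world_rank, group, local_rank, local_size);

    MPI_Comm_free(&elem_comm);
    MPI_Finalize();
    return 0;
}
```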