In this paper, a typical experiment is carried out based on a high-resolution air-sea coupled model, namely the Coupled Ocean-Atmosphere-Wave-Sediment Transport (COAWST) model, on both a heterogeneous many-core (SW) and a homogeneous multicore (Intel) supercomputing platform. We construct a hindcast of Typhoon Lekima on both the SW and Intel platforms, compare the simulation results between these two platforms, and compare the key elements of the atmospheric and ocean modules to reanalysis data. The comparative experiment in this typhoon case indicates that the domestic many-core computing platform and the general cluster yield almost no differences in the simulated typhoon path and intensity, and the differences in surface pressure (PSFC) in the WRF model and sea surface temperature (SST) in the short-range forecast are very small, whereas a major difference can be identified at high latitudes after the first 10 days. Further heat budget analysis verifies that the SST differences after 10 days are mainly caused by shortwave radiation variations, influenced by typhoons subsequently generated in the system. These typhoons, generated in the hindcast after the first 10 days, follow clearly different trajectories on the two platforms.
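The heat budget analysis mentioned above is not spelled out in the abstract. As a hedged reference point (a standard formulation, not necessarily the exact decomposition used in the paper), a mixed-layer temperature budget of the kind used for this sort of attribution reads

\[ \frac{\partial T_m}{\partial t} = \frac{Q_{sw} + Q_{lw} + Q_{lh} + Q_{sh}}{\rho_0\, c_p\, h} - \mathbf{u}\cdot\nabla T_m - \frac{w_e\,(T_m - T_d)}{h}, \]

where \(T_m\) is the mixed-layer (sea surface) temperature, \(h\) the mixed-layer depth, \(Q_{sw}\), \(Q_{lw}\), \(Q_{lh}\), and \(Q_{sh}\) the shortwave, longwave, latent, and sensible heat fluxes, \(\rho_0 c_p\) the reference density and specific heat of seawater, \(\mathbf{u}\) the mixed-layer velocity, \(w_e\) the entrainment velocity, and \(T_d\) the temperature just below the mixed layer. The shortwave term \(Q_{sw}\) is the contribution the paper identifies as dominating the cross-platform SST differences after day 10.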
With the continuous improvement of supercomputer performance and the integration of artificial intelligence with traditional scientific computing, the scale of applications is gradually increasing from millions to tens of millions of computing cores, which raises great challenges in achieving high scalability and efficiency for parallel applications on super-large-scale systems. Taking the Sunway exascale prototype system as an example, in this paper we first analyze the challenges of high scalability and high efficiency for parallel applications in the exascale era. To overcome these challenges, the optimization technologies used in the parallel supporting environment software on the Sunway exascale prototype system are highlighted, including the parallel operating system, input/output (I/O) optimization technology, ultra-large-scale parallel debugging technology, a 10-million-core parallel algorithm, and a mixed-precision method. The parallel operating system and I/O optimization technology mainly support large-scale system scaling, while the ultra-large-scale parallel debugging technology, the 10-million-core parallel algorithm, and the mixed-precision method mainly enhance the efficiency of large-scale applications. Finally, the contributions to various applications running on the Sunway exascale prototype system are introduced, verifying the effectiveness of the parallel supporting environment design.
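The abstract does not detail the mixed-precision method. As one common illustration of the general idea, and not the Sunway implementation, the NumPy sketch below solves a linear system with a single-precision factorization and double-precision iterative refinement:

    import numpy as np

    def mixed_precision_solve(A, b, iters=5):
        # Illustrative mixed-precision iterative refinement (not the Sunway code):
        # solve in float32, accumulate residuals and corrections in float64.
        A32 = A.astype(np.float32)
        x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
        for _ in range(iters):
            r = b - A @ x                                   # residual in double precision
            d = np.linalg.solve(A32, r.astype(np.float32))  # cheap low-precision correction
            x += d.astype(np.float64)
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500)) + 500.0 * np.eye(500)  # well-conditioned test matrix
    b = rng.standard_normal(500)
    x = mixed_precision_solve(A, b)
    print(np.linalg.norm(A @ x - b))                            # near double-precision residual

Most of the arithmetic runs in the cheaper precision, while the double-precision residual and correction steps recover close to double-precision accuracy.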
The overall efficiency of an extreme-scale supercomputer largely relies on the performance of its network interconnects. Several state-of-the-art supercomputers use networks based on the increasingly popular Dragonfly topology. It is crucial to study the behavior and performance of different parallel applications running on Dragonfly networks in order to make optimal system configuration and design choices, such as job scheduling and routing strategies. However, to study this temporal network behavior, we need a tool to analyze and correlate the numerous sets of multivariate time-series data collected from the Dragonfly network's multi-level hierarchies. This paper presents such a tool: a visual analytics system that uses these data to investigate the temporal behavior and optimize the communication performance of a supercomputer. We couple interactive visualization with time-series analysis methods to help reveal hidden patterns in the network behavior with respect to different parallel applications and system configurations. Our system also provides multiple coordinated views for connecting behaviors observed at different levels of the network hierarchy, which effectively supports visual analysis tasks. We demonstrate the effectiveness of the system with a set of case studies. Our system and findings can help improve not only the communication performance of supercomputing applications, but also the network performance of next-generation supercomputers.
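The paper describes a visual analytics tool rather than a single algorithm. As a hedged illustration of the kind of correlation analysis such a tool applies to multivariate counter data (all counter names below are invented), consider:

    import numpy as np

    # Hypothetical per-router counters from a Dragonfly network, sampled over time;
    # real data would come from the system's monitoring logs.
    rng = np.random.default_rng(1)
    t = np.arange(600)
    counters = {
        "grp0_local_link_traffic":  np.sin(t / 20.0) + 0.1 * rng.standard_normal(t.size),
        "grp0_global_link_traffic": np.sin(t / 20.0 + 0.5) + 0.1 * rng.standard_normal(t.size),
        "grp1_local_link_traffic":  rng.standard_normal(t.size),
    }
    names = list(counters)
    X = np.column_stack([counters[n] for n in names])  # rows = time steps, cols = counters

    corr = np.corrcoef(X, rowvar=False)                # pairwise Pearson correlation matrix
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(f"{names[i]} vs {names[j]}: r = {corr[i, j]:+.2f}")

In the system itself this correlation structure is explored through linked interactive views rather than printed.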
In June 2018, the United States claimed the No. 1 position in supercomputing according to TOP500, which ranks the top 500 most powerful computer systems in the world [1]. The US Department of Energy's Summit machine (Fig. 1) [1] claimed this distinction, which previously had been held by China's Sunway TaihuLight supercomputer.
China's first parallel supercomputer capable of 10^9 operations per second, named Yinhe-II, was manufactured by the National University of Defense Technology. The main features of the supercomputer are: a 4-processor system, a principal clock frequency of 50 MHz, a 64-bit word length, 256 MB of main memory, two independent input/output subsystems, and a peak rate of more than 10^9 operations per second.
China's first supercomputer capable of 100 million calculations per second was the YH-1, which was independently developed by the Institute of Computer Science at the National University of Defense Technology (NUDT) between 1978 and 1983. YH-1 played an important role in China's national defense construction and national economic development, and it made China one of the few countries in the world to have successfully developed a supercomputer. Based on original archival documents, interviews with relevant personnel, and an analysis of the technological parameters of the YH-1 in China and the Cray-1 in the United States, this paper reviews in detail the historical process of the development of the YH-1, analyzing its innovations and summarizing the experience and lessons learned from it. This analysis is significant for current military-civilian integration and for the commercialization of university research findings in China.
As an important branch of information technology, high-performance computing has expanded its fields of application, and its influence keeps growing. High-performance computing has always been a key enabling technology in meteorology. We used field research and literature review methods to study the application of high-performance computing in China's meteorological department and obtained the following results: 1) The China Meteorological Department has gradually built up high-performance computer systems since installing its first system in 1978, and high-performance computing services now support operational numerical weather prediction models. 2) The Chinese meteorological department has consistently adopted relatively advanced high-performance computing technology, and the capability of its operational systems has improved continuously; computing power has become an important symbol of the level of meteorological modernization. 3) High-performance computing technology and meteorological numerical forecasting applications are increasingly integrated and continue to innovate and develop. 4) In the future, high-performance computing resource management will gradually transition from the current local pre-allocation mode to unified local and remote scheduling and shared use. In summary, we conclude that the high-performance computing operations of the meteorological department have a bright future.
We demonstrate the application of the world's fastest supercomputer, Fugaku, located in Japan, to selecting COVID-19 drugs and stopping the spread of the pandemic. Using computer simulation, the supercomputer picked the 30 most effective and promising drugs out of 2128 potential drug candidates. Twelve of them are in clinical trials outside Japan; some are being tested in Japan. Compared with the world's second-fastest supercomputer, Fugaku reduced the computation time from one year to 10 days. Fugaku was also employed to study the behavior of airborne aerosols carrying the COVID-19 virus, and the 3Cs were suggested to stop the spread of the pandemic: avoid closed spaces, crowded places, and close contact. Progress in vaccine development and the proper use and choice of masks are also described in this article. The article will be of great benefit in stopping the spread of, and treating, the COVID-19 pandemic.
Exploring the human brain is perhaps the most challenging and fascinating scientific issue of the 21st century. It will facilitate the development of many aspects of society, including economics, education, health care, national defense, and daily life. Artificial intelligence techniques are becoming useful as alternatives to classical techniques or as components of integrated systems. They are used to solve complicated problems in various fields and are becoming increasingly popular. In particular, the investigation of the human brain will advance artificial intelligence techniques by drawing on the accumulating knowledge of neuroscience, brain-machine interface techniques, spiking neural network algorithms, and neuromorphic supercomputers. Consequently, we provide a comprehensive survey of the research on, and motivations for, brain-inspired artificial intelligence and its engineering over its history. The goals of this work are to provide a brief review of the research associated with brain-inspired artificial intelligence and its related engineering techniques, and to motivate further work by elucidating the challenges in the field where new research is required.
High-permeability reverse osmosis membrane materials have attracted wide attention recently; however, bottlenecks such as aggravated concentration polarization and membrane fouling caused by high permeability limit the application and development of these high-performance membrane materials. This work combines machine learning with supercomputing to propose a new multiscale optimization design method for advanced reverse osmosis membranes, covering the module feed spacer (sub-millimeter scale) and the system (meter scale). Under typical operating conditions of 35,000 ppm feed salinity and 50% recovery, and benchmarked against current state-of-the-art seawater reverse osmosis desalination processes, the proposed optimization scheme reduces the specific energy consumption of freshwater production (1.66 kWh/m^3) by 27.5% and the required membrane area by about 37.2%, while keeping the maximum concentration polarization factor of the system within the engineering-allowable range (<1.20), which effectively mitigates membrane fouling in high-permeability membrane systems. The method provides a theoretical basis, computational tools, and big-data support for the precise design of high-performance membrane materials and has significant application potential. The new multiscale design paradigm proposed here, combining machine learning with supercomputing, breaks through the limitations of traditional single-scale module design based on trial and error; the high-throughput parallel computation scales to more than 93,120 cores and is over 3000 times more efficient than the serial algorithm, which can greatly shorten the design cycle of high-performance membrane modules.
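For reference, the concentration polarization factor reported above (kept below 1.20) can be read against the standard film-theory estimate, given here as a minimal sketch rather than the paper's multiscale model:

\[ \mathrm{CP} = \frac{c_m}{c_b} \approx \exp\!\left(\frac{J_w}{k}\right), \]

where \(c_m\) and \(c_b\) are the salt concentrations at the membrane wall and in the bulk feed, \(J_w\) is the permeate flux, and \(k\) is the feed-side mass-transfer coefficient set by the spacer geometry and crossflow conditions. Higher-permeability membranes raise \(J_w\), so keeping CP within bounds is exactly the spacer-level and system-level design problem addressed in this work.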
Unified programming models can effectively improve program portability across heterogeneous high-performance computers. Existing unified programming models put considerable effort into code portability but are still far from achieving good performance portability. In this paper, we present a preliminary design of a performance-portable unified programming model covering four aspects: programming language, programming abstraction, compilation optimization, and scheduling system. Specifically, domain-specific languages introduce domain knowledge to decouple the optimizations for different applications and architectures. The unified programming abstraction unifies the common features of different architectures to support common optimizations. Multi-level compilation optimization enables comprehensive performance optimization based on multi-level intermediate representations. A resource-aware lightweight runtime scheduling system improves the resource utilization of heterogeneous computers. This is a perspective paper presenting our viewpoints on programming models for emerging heterogeneous systems.
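This is a perspective paper and defines no concrete API. Purely as an invented illustration (every name below is hypothetical) of how a single user-facing abstraction can defer architecture-specific execution to interchangeable back ends, consider:

    from typing import Callable, Dict

    # Hypothetical sketch: one user-facing parallel_for whose backend (and any
    # backend-specific optimization) is chosen at run time; the sequential loops
    # below merely stand in for threaded loops or accelerator kernel launches.
    def _cpu_backend(n: int, body: Callable[[int], None]) -> None:
        for i in range(n):
            body(i)

    def _manycore_backend(n: int, body: Callable[[int], None]) -> None:
        for i in range(n):
            body(i)

    _BACKENDS: Dict[str, Callable[[int, Callable[[int], None]], None]] = {
        "cpu": _cpu_backend,
        "manycore": _manycore_backend,
    }

    def parallel_for(n: int, body: Callable[[int], None], target: str = "cpu") -> None:
        _BACKENDS[target](n, body)

    # Application code is written once against the abstraction.
    out = [0] * 8
    parallel_for(8, lambda i: out.__setitem__(i, i * i), target="manycore")
    print(out)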
Network technology is the basis for large-scale, high-efficiency network computing, such as supercomputing, cloud computing, big data processing, and artificial intelligence computing. The network technologies of network computing systems in different fields not only learn from one another but also involve targeted design and optimization. Considering them comprehensively, three development trends, i.e., integration, differentiation, and optimization, are summarized in this paper for network technologies in different fields. Integration reflects that there are no clear boundaries between network technologies in different fields; differentiation reflects that there are unique solutions in different application fields, or innovative solutions under new application requirements; and optimization reflects that there are specific optimizations for particular scenarios. This paper can help academic researchers consider what should be done in the future and help industry personnel consider how to build efficient, practical network systems.
With various exascale systems in different countries planned over the next three to five years, developing application software for such unprecedented computing capabilities and parallel scaling becomes a major challenge. In this study, we start our discussion with the current 125-Pflops Sunway TaihuLight system in China and its related application challenges and solutions. Based on our current experience with Sunway TaihuLight, we provide a projection into the next decade and discuss potential challenges and possible trends we would probably observe in future high-performance computing software.
On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnect chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.
This paper presents an overview of the TianHe-1A (TH-1A) supercomputer, which was built by the National University of Defense Technology (NUDT) of China. TH-1A adopts a hybrid architecture integrating CPUs and GPUs, and its interconnect is a proprietary high-speed communication network. The theoretical peak performance of TH-1A is 4700 TFlops, and its LINPACK test result is 2566 TFlops. It was ranked No. 1 on the TOP500 list released in November 2010. TH-1A is now deployed in the National Supercomputer Center in Tianjin and provides high-performance computing services. TH-1A has played an important role in many applications, such as oil exploration, weather forecasting, and biomedical research.
In this paper, we present the Tianhe-2 interconnect network and its message passing services. We describe the architecture of the router and network interface chips and highlight a set of hardware and software features that effectively support high-performance communications, including remote direct memory access, collective communication optimization, hardware-enabled reliable end-to-end communication, and user-level message passing services. Measured hardware performance results are also presented.
With the rapid improvement of computation capability in high-performance supercomputer systems, the performance imbalance between the computation subsystem and the storage subsystem has become more and more serious, especially as various big data sets are produced, ranging from tens of gigabytes up to terabytes. To reduce this gap, large-scale storage systems need to be designed and implemented with high performance and scalability. The MilkyWay-2 (TH-2) supercomputer system, with a peak performance of 54.9 Pflops, definitely has this kind of requirement for its storage system. This paper mainly introduces the storage system in the MilkyWay-2 supercomputer, including the hardware architecture and the parallel file system. The storage system exploits a novel hybrid hierarchical storage architecture to enable high scalability of I/O clients, I/O bandwidth, and storage capacity. To fit this architecture, a user-level virtualized file system, named H^2FS, is designed and implemented, which combines local storage and shared storage into a dynamic single namespace to optimize I/O performance in I/O-intensive applications. The evaluation results show that the storage system in the MilkyWay-2 supercomputer can satisfy the critical requirements of large-scale supercomputers, such as performance and scalability.
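H^2FS itself is described only at this level in the abstract. Purely as a hedged conceptual sketch (hypothetical paths and policy, not the H^2FS design) of presenting a fast node-local tier and a shared tier as one namespace, consider:

    import os, shutil

    LOCAL_TIER = "/tmp/h2fs_local"    # hypothetical node-local (e.g., SSD) directory
    SHARED_TIER = "/tmp/h2fs_shared"  # hypothetical shared parallel file system directory
    os.makedirs(LOCAL_TIER, exist_ok=True)
    os.makedirs(SHARED_TIER, exist_ok=True)

    def write(name: str, data: bytes) -> None:
        # Absorb the I/O burst on local storage first.
        with open(os.path.join(LOCAL_TIER, name), "wb") as f:
            f.write(data)

    def flush(name: str) -> None:
        # Stage the file out to shared storage at sync points.
        shutil.copy2(os.path.join(LOCAL_TIER, name), os.path.join(SHARED_TIER, name))

    def read(name: str) -> bytes:
        # Single namespace: try the local tier first, then fall back to shared storage.
        for tier in (LOCAL_TIER, SHARED_TIER):
            path = os.path.join(tier, name)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return f.read()
        raise FileNotFoundError(name)

    write("ckpt.dat", b"checkpoint bytes")
    flush("ckpt.dat")
    print(read("ckpt.dat"))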
High-performance computers provide strategic computing power for the construction of the national economy and defense, and have become one of the symbols of a country's overall strength. Over 30 years, with the support of governments, high-performance computer technology has developed rapidly: computing performance has increased by nearly 3 million times, and the number of processors has grown by over a million times. To solve the critical issues related to parallel efficiency and scalability, researchers have pursued extensive theoretical studies and technical innovations. This paper briefly reviews the course of building high-performance computer systems both at home and abroad, and summarizes the significant breakthroughs in international high-performance computer technology. We also review China's technological progress in parallel computer architecture, parallel operating systems and resource management, parallel compilers and performance optimization, and environments for parallel programming and network computing. Finally, we examine the challenging issues of the "memory wall", system scalability, and the "power wall", and discuss high-productivity computers, which are the trend in building the next generation of high-performance computers.
In this paper we present the programming of the Linpack benchmark on the TianHe-1 system, the first petascale supercomputer system in China and the largest GPU-accelerated heterogeneous system ever attempted at the time. A hybrid programming model consisting of MPI, OpenMP, and streaming computing is described to exploit the task parallelism, thread parallelism, and data parallelism of the Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using a two-level adaptive method and describe the implementation in detail. To overcome the low bandwidth of CPU-GPU communication, we present a software pipelining technique to hide the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability and 3.3 times faster than the result obtained using the vendor's library. On the full configuration of TianHe-1, our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November 2009.
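The two-level adaptive load distribution is described only qualitatively in the abstract. A minimal sketch of the underlying idea (invented numbers, not the TianHe-1 code), in which the CPU/GPU split is recomputed from the throughput measured in the previous iteration, is:

    # Hypothetical sketch: adapt the fraction of each trailing-matrix update given
    # to the GPU from the (GPU, CPU) throughputs measured in the last iteration.
    def gpu_fraction(gpu_gflops: float, cpu_gflops: float) -> float:
        return gpu_gflops / (gpu_gflops + cpu_gflops)

    measured = [(300.0, 60.0), (320.0, 55.0), (310.0, 58.0)]  # invented per-iteration rates
    split = 0.5                                               # initial even split
    for it, (gpu_rate, cpu_rate) in enumerate(measured):
        split = gpu_fraction(gpu_rate, cpu_rate)
        print(f"iteration {it}: assign {split:.1%} of the update to the GPU")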
Dawning Nebulae is a heterogeneous system composed of 9280 multi-core x86 CPUs and 4640 NVIDIA Fermi GPUs. With a Linpack performance of 1.271 petaFLOPS, it was ranked second in the TOP500 list released in June 2010. In this paper, key issues in the system design of Dawning Nebulae are introduced. System tuning methodologies aimed at the petaFLOPS Linpack result are presented, including algorithmic optimization and communication improvement. The design of its file I/O subsystem, including HVFS and the underlying DCFS3, is also described. Performance evaluations show that the Linpack efficiency of each node reaches 69.89%, and the 1024-node aggregate read and write bandwidths exceed 100 GB/s and 70 GB/s, respectively. The success of Dawning Nebulae demonstrates the viability of the CPU/GPU heterogeneous structure for future supercomputer designs.