Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2017YFB1003103, the Natural Science Research Foundation of Jilin Province of China under Grant No. 20190201193JC, and the Fundamental Research Funds for the Central Universities, JLU.
Abstract: A wide variety of intelligence accelerators with promising performance and energy efficiency have been deployed in a broad range of applications such as computer vision and speech recognition. However, low programming productivity hinders the deployment of deep learning accelerators. The low-level library invoked by high-level deep learning frameworks, which supports end-to-end execution of a given model, is designed to reduce the programming burden on intelligence accelerators. Unfortunately, this approach is inflexible: developers must build a network model for every deep learning application, which likely leads to unnecessary repetitive implementation. In this paper, a flexible and efficient programming framework for deep learning accelerators, FlexPDA, is proposed. It provides more optimization opportunities than the low-level library and enables quick porting of applications to intelligence accelerators for fast upgrades. We evaluate FlexPDA using 10 representative operators selected from deep learning algorithms and an end-to-end network. The experimental results validate the effectiveness of FlexPDA, which achieves an end-to-end performance improvement of 1.620x over the low-level library.
Funding: This research was funded by the National Basic Research Program (973 program) (2015CB352202), the National High Technology Research and Development Program (863 program) (2015AA01A203), and the National Natural Science Foundation of China (Grant Nos. 91318301, 61373011, 61321491).
Abstract: Context-aware systems have become an emerging research area in recent years, and context plays an important role in these systems. In most existing work, context is treated as all relevant elements in the environment of an application, and the scope of context is predefined by the developers during development. However, it is difficult to analyze, specify, and organize everything in the environment accurately and completely; and even when it is possible, the developed applications are difficult to extend or modify, as the requirements on the environment may change over time. In this paper, we focus on activity-oriented context-aware (AOCA) applications, where the requirements on the environment are highly dependent on user activities, and propose a programming framework for developing AOCA applications. In particular, we first present a concept model for describing the notions of activity-oriented context. Next, based on the concept model, we describe the details of the programming framework as well as a development tool. Moreover, we provide a platform to support the runtime of AOCA applications, and demonstrate the advantages of our programming framework through experimental evaluations.
Funding: Project (61170049) supported by the National Natural Science Foundation of China; Project (2012AA010903) supported by the National High Technology Research and Development Program of China.
Abstract: Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Funding: The work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61133004, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA01A302, and the NSFC Projects of International Cooperation and Exchanges under Grant No. 61361126011.
Abstract: Nowadays, there are numerous images on the Internet, and with the development of cloud computing and big data applications, many of those images need to be processed for different kinds of applications by using specific image processing algorithms. Meanwhile, many kinds of image processing algorithms and their variations already exist, while new algorithms are still emerging. Consequently, an ongoing problem is how to improve the efficiency of massive image processing and support the integration of existing implementations of image processing algorithms into such systems. This paper proposes a distributed image processing system named SEIP, which is built on Hadoop and employs an extensible in-node architecture to support various kinds of image processing algorithms on distributed platforms with GPU accelerators. The system also uses a pipeline-based framework to accelerate massive image file processing. A demonstration application for image feature extraction is designed. The system is evaluated on a small-scale Hadoop cluster with GPU accelerators, and the experimental results show the usability and efficiency of SEIP.