5 articles found
Towards optimized tensor code generation for deep learning on sunway many-core processor
1
Authors: Mingzhen LI, Changxi LIU, Jianjin LIAO, Xuegui ZHENG, Hailong YANG, Rujun SUN, Jun XU, Lin GAN, Guangwen YANG, Zhongzhi LUAN, Depei QIAN. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, No. 2, pp. 1-15 (15 pages).
The flourish of deep learning frameworks and hardware platforms has been demanding an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. In the meanwhile, the Sunway many-core processor renders itself as a competitive candidate for its attractive computational power in both scientific computing and deep learning workloads. This paper combines the trends in these two directions. Specifically, we propose swTVM that extends the original TVM to support ahead-of-time compilation for architecture requiring cross-compilation such as Sunway. In addition, we leverage the architecture features during the compilation such as core group for massive parallelism, DMA for high bandwidth memory transfer and local device memory for data locality, in order to generate efficient codes for deep learning workloads on Sunway. The experiment results show that the codes generated by swTVM achieve 1.79x improvement of inference latency on average compared to the state-of-the-art deep learning framework on Sunway, across eight representative benchmarks. This work is the first attempt from the compiler perspective to bridge the gap of deep learning and Sunway processor particularly with productivity and efficiency in mind. We believe this work will encourage more people to embrace the power of deep learning and Sunway many-core processor.
Keywords: Sunway processor; deep learning compiler; code generation; performance optimization
SCStore: Managing Scientific Computing Packages for Hybrid System with Containers (cited by: 5)
2
Authors: Wusheng Zhang, Jiao Lin, Weiping Xu, Haohuan Fu, Guangwen Yang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, No. 6, pp. 675-681 (7 pages).
Managing software packages in a scientific computing environment is a challenging task, especially in the case of heterogeneous systems. It is error prone when installing and updating software packages in a sophisticated computing environment. Testing and performance evaluation in an on-the-fly manner is also a troublesome task for a production system. In this paper, we discuss a package management scheme based on containers. The newly developed method can ease the maintenance complexity and reduce human mistakes. We can benefit from the self-containing and isolation features of container technologies for maintaining the software packages among intricately connected clusters. By deploying the Super Computing application Store (SCStore) over the WAN connected world-largest clusters, it proved that it can greatly reduce the effort for maintaining the consistency of software environment and bring benefit to achieve automation.
Keywords: high performance computing; package management; container; hybrid system
A Fully Pipelined Probability Density Function Engine for Gaussian Copula Model (cited by: 1)
3
Authors: Huabin Ruan, Xiaomeng Huang, Haohuan Fu, Guangwen Yang. Tsinghua Science and Technology (SCIE, EI, CAS), 2014, No. 2, pp. 195-202 (8 pages).
The Gaussian Copula Probability Density Function (PDF) plays an important role in the fields of finance, hydrological modeling, biomedical study, and texture retrieval. However, the existing schemes for evaluating the Gaussian Copula PDF are all computationally-demanding and generally the most time-consuming part in the corresponding applications. In this paper, we propose an FPGA-based design to accelerate the computation of the Gaussian Copula PDF. Specifically, the evaluation of the Gaussian Copula PDF is mapped into a fully-pipelined FPGA dataflow engine by using three optimization steps: transforming the calculation pattern, eliminating constant computations from hardware logic, and extending calculations to multiple pipelines. In the experiments on 10 typical large-scale data sets, our FPGA-based solution shows a maximum of 1870 times speedup over a well-tuned single-core CPU-based solution, and 610 times speedup over a well-optimized parallel quad-core CPU-based solution when processing two-dimensional data.
Keywords: Gaussian Copula; probability density function; FPGA; pipeline; optimization
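For readers unfamiliar with the quantity this abstract refers to, the following is a minimal software sketch of the standard bivariate Gaussian copula density, c(u, v) = (1 - rho^2)^(-1/2) * exp((2*rho*x*y - rho^2*(x^2 + y^2)) / (2*(1 - rho^2))) with x and y the standard normal quantiles of u and v. It is not the paper's FPGA design. The function name `make_gaussian_copula_pdf` is hypothetical, and hoisting the rho-dependent constants out of the per-sample function is only a loose software analogue of the "eliminating constant computations" step mentioned above.

```python
from math import exp, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def make_gaussian_copula_pdf(rho):
    """Return a function c(u, v) for a bivariate Gaussian copula.

    Hypothetical helper for illustration; rho is the correlation parameter.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")

    # Constants that depend only on rho are computed once, not per sample.
    one_minus_rho2 = 1.0 - rho * rho
    scale = 1.0 / sqrt(one_minus_rho2)
    inv_denominator = 1.0 / (2.0 * one_minus_rho2)

    def pdf(u, v):
        # u and v must lie strictly inside (0, 1); inv_cdf raises otherwise.
        x = _STD_NORMAL.inv_cdf(u)
        y = _STD_NORMAL.inv_cdf(v)
        exponent = (2.0 * rho * x * y - rho * rho * (x * x + y * y)) * inv_denominator
        return scale * exp(exponent)

    return pdf
```

With rho = 0 the density is 1 everywhere (independence), which gives a quick sanity check for any implementation.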
Interpolation oriented parallel communication to optimize coupling in earth system modeling
4
Authors: Yingsheng JI, Yingzhuo ZHANG, Guangwen YANG. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 4, pp. 693-708 (16 pages).
Complicated global climate problems trigger researchers from different scientific disciplines to link multiphysics simulations called models for integrated modeling of climate changes by using a software framework called earth system modeling (ESM). As its critical component, coupler is in charge of connections and interactions among models. With the advance of next-generation models, greater data transfer volume and higher coupling frequency are expected to put heavy performance burden on coupler. High efficient coupling techniques are required. In this paper, we propose the sub-domain mapping method to improve the parallel coupling consisted of data transfer and data transformation. By using one specific interpolation oriented communication routing, the communication operations that are originally decentralized in various steps can be combined together for execution. This can reduce the redundant communications and the entailed synchronization costs. The tests on the Tianhe-1A (TH-1A) supercomputer show that our method can achieve 1.1 to 4.9 fold performance improvements. We also present further optimization solution for the multi-interpolation cases. The test results show that our method can achieve up to 3.4 fold speedup over the original coupling execution of the current climate system.
Keywords: coupler; communication optimization; coupling performance; ESM
SwiftArray: Accelerating Queries on Multidimensional Arrays
5
Authors: Yifeng Geng, Xiaomeng Huang, Guangwen Yang. Tsinghua Science and Technology (SCIE, EI, CAS), 2014, No. 5, pp. 521-530 (10 pages).
Scientific instruments and simulation programs are generating large amounts of multidimensional array data. Queries with value and dimension subsetting conditions are commonly used by scientists to find useful information from big array data, and data storage and indexing methods play an important role in supporting queries on multidimensional array data efficiently. In this paper, we propose SwiftArray, a new storage layout with indexing techniques to accelerate queries with value and dimension subsetting conditions. In SwiftArray, the multidimensional array is divided into blocks and each block stores sorted values. Blocks are placed in the order of a Hilbert space-filling curve to improve data locality for dimension subsetting queries. We propose a 2-D-Bin method to build an index for the blocks' value ranges, which is an efficient way to avoid accessing unnecessary blocks for value subsetting queries. Our evaluations show that SwiftArray surpasses the NetCDF-4 format and FastBit indexing technique for queries on multidimensional arrays.
Keywords: multidimensional array; indexing; space-filling curve
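The block placement described in this abstract can be illustrated with a small sketch that orders the blocks of a two-dimensional grid along a Hilbert curve. This is an assumption-laden illustration, not SwiftArray's implementation: it covers only the 2-D case on an n-by-n grid of blocks with n a power of two, uses the widely published coordinate-to-index conversion, and the names `hilbert_index` and `hilbert_block_order` are hypothetical. The 2-D-Bin index is not sketched because the abstract does not specify it.

```python
def hilbert_index(n, x, y):
    """Position of block (x, y) along a Hilbert curve over an n-by-n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_block_order(n):
    """Return all block coordinates sorted in Hilbert-curve order."""
    blocks = [(x, y) for x in range(n) for y in range(n)]
    return sorted(blocks, key=lambda b: hilbert_index(n, b[0], b[1]))


# Blocks of a 2-by-2 grid in storage order.
print(hilbert_block_order(2))  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Consecutive blocks in this order are always grid neighbours, which is the locality property that makes the curve attractive for dimension subsetting queries.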