Journal Articles
6 articles found
1. Optimizing Memory Access Efficiency in CUDA Kernel via Data Layout Technique
Authors: Neda Seifi, Abdullah Al-Mamun. Journal of Computer and Communications, 2024, no. 5, pp. 124-139 (16 pages)
Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields like IoT, autonomous vehicles, and exascale computing. Despite these advancements, efficiently programming GPUs remains a daunting challenge, often relying on trial-and-error optimization methods. This paper introduces an optimization technique for CUDA programs through a novel Data Layout strategy, aimed at restructuring memory data arrangement to significantly enhance data access locality. Focusing on the dynamic programming algorithm for chained matrix multiplication, a critical operation across domains including artificial intelligence (AI), high-performance computing (HPC), and the Internet of Things (IoT), this technique facilitates more localized access. We specifically illustrate the importance of efficient matrix multiplication in these areas, underscoring the technique's broader applicability and its potential to address some of the most pressing computational challenges in GPU-accelerated applications. Our findings reveal a remarkable reduction in memory consumption and a substantial 50% decrease in execution time for CUDA programs utilizing this technique, thereby setting a new benchmark for optimization in GPU computing.
Keywords: Data Layout Optimization; CUDA Performance Optimization; GPU Memory Optimization; Dynamic Programming; Matrix Multiplication; Memory Access Pattern Optimization in CUDA
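The abstract does not describe the authors' actual layout. Purely as an illustration, the sketch below shows one common way to restructure the dynamic-programming table for chained matrix multiplication so that accesses become more local. Only the upper triangle is stored, and it is laid out diagonal by diagonal (diagonal-major), so every cell computed in the same step (one chain length, d = j - i) is contiguous in memory. The diagonal-major scheme and all function names are assumptions. The code is Python for readability, not a CUDA kernel. In a kernel, the loop over `i` would typically map to threads. For a fixed split offset `t`, neighbouring threads would then read and write neighbouring addresses, whereas a row-major n x n table puts those cells n + 1 elements apart.

```python
def diag_offset(n: int, d: int) -> int:
    """Flat index where diagonal d (cells with j - i == d) starts.

    Diagonals 0..d-1 hold n, n-1, ..., n-d+1 cells, so the offset is
    d*n - d*(d-1)/2.
    """
    return d * n - d * (d - 1) // 2


def diag_index(n: int, i: int, j: int) -> int:
    """Diagonal-major position of cell (i, j), i <= j, in an n x n upper triangle."""
    return diag_offset(n, j - i) + i


def matrix_chain_diag_major(dims: list[int]) -> int:
    """Minimum scalar multiplications for A_0 ... A_{n-1}, A_i of shape dims[i] x dims[i+1]."""
    n = len(dims) - 1
    if n < 1:
        raise ValueError("need at least one matrix")
    # Only the upper triangle is stored: n(n+1)/2 cells instead of n*n.
    cost = [0] * (n * (n + 1) // 2)
    for d in range(1, n):            # one sequential step per diagonal
        for i in range(n - d):       # independent cells; one GPU thread each in a kernel
            j = i + d
            best = None
            for t in range(d):       # split point k = i + t
                k = i + t
                # For fixed t, neighbouring i read neighbouring addresses on
                # diagonals t and d - t - 1, and write neighbours on diagonal d.
                c = (cost[diag_index(n, i, k)]
                     + cost[diag_index(n, k + 1, j)]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if best is None or c < best:
                    best = c
            cost[diag_index(n, i, j)] = best
    return cost[diag_index(n, 0, n - 1)]


def matrix_chain_row_major(dims: list[int]) -> int:
    """Textbook version with a full n x n row-major table, for comparison."""
    n = len(dims) - 1
    m = [[0] * n for _ in range(n)]
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            m[i][j] = min(m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                          for k in range(i, j))
    return m[0][n - 1]


if __name__ == "__main__":
    dims = [30, 35, 15, 5, 10, 20, 25]   # classic textbook instance
    print(matrix_chain_diag_major(dims), matrix_chain_row_major(dims))  # 15125 15125
```

Both versions compute the same result. The diagonal-major one also needs roughly half the storage, which is one generic way a layout change can reduce memory use. It is not a claim about how the paper achieves its reported savings.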
2. Research on optimization of virtual machine memory access based on NUMA architecture (cited by 2)
Authors: He Mujun, Zheng Linjiang, Yang Kai, Liu Runfeng, Liu Weining. High Technology Letters (indexed in EI, CAS), 2021, no. 4, pp. 347-356 (10 pages)
With the rapid development of big data and artificial intelligence (AI), cloud platform architectures are constantly being developed, optimized, and improved. New applications such as deep computing and high-performance computing require more computing power. To meet this requirement, a non-uniform memory access (NUMA) configuration method is proposed for cloud computing systems, based on the affinity, adaptability, and availability of the NUMA processor platform. The proposed method is verified in a test environment built on a domestically produced central processing unit (CPU).
Keywords: cloud computing; virtualization; non-uniform memory access (NUMA); virtual machine; memory access optimization
3. Cross-layer resource allocation based on equivalent bandwidth in OFDMA systems
Authors: Su Pan, Cheng Li, Sheng Zhang, Danwei Chen. Journal of Systems Engineering and Electronics (indexed in SCIE, EI, CSCD), 2016, no. 4, pp. 754-762 (9 pages)
A quality of service (QoS)-guaranteed cross-layer resource allocation algorithm is proposed for full-IP orthogonal frequency division multiple access (OFDMA) communication systems. It considers the physical layer, the medium access control (MAC) layer and call admission control (CAC) simultaneously, and it ensures the quality of multimedia services in full-IP networks. The algorithm converts physical-layer resources, such as subcarriers and transmission power, together with the QoS metrics, into an equivalent bandwidth that the base station can allocate across all three layers. In this way, the cross-layer optimal equivalent bandwidth allocation can guarantee the QoS requirements for bit error rate (BER), transmission delay and dropping probability. Numerical results show that the proposed algorithm achieves higher spectrum efficiency than existing systems.
Keywords: resource allocation; equivalent bandwidth; cross-layer optimization; orthogonal frequency division multiple access (OFDMA)
4. Improving access to urban parks through public transit optimization
Authors: Ning Xu, Kaidan Guan, Pu Wang. Frontiers of Architectural Research (indexed in CSCD), 2024, no. 3, pp. 575-592 (18 pages)
This study establishes an evaluation and optimization framework for the public transit network based on social network analysis and a greedy algorithm. It aims to explore a quantitative approach to improving access to urban parks through public transit optimization. Social network analysis and the ArcGIS platform are used to build a public transit network model of Nanjing Old City and to analyze its overall network structure. The study also develops a method for making regional and city-level parks easier to reach by public transit by adding access points and connecting points accordingly. A greedy algorithm is introduced to generate an optimized solution that improves public transit accessibility to regional and city-level parks and thereby increases their utilization. The major findings are as follows. (1) The greedy algorithm effectively enhances the performance of the public transit network, but its benefits gradually diminish as more stations are added. (2) Strategically adding stations improves the performance of most public transit access points, creating efficient pathways for other stations to reach these access points directly and enter regional and city-level parks. (3) The optimized public transit network model offers guidance for the planning and layout of regional and city-level parks: site selection for new parks should prioritize connections with the “hubs” of the public transit network. The proposed optimization is specific to a single type of urban park. Subsequent research could extend the optimization of public transit accessibility to other urban public resources.
Keywords: Urban parks; Social network analysis; Accessibility optimization; Public transit network; Greedy algorithm
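The abstract does not state the exact objective the greedy algorithm optimizes, so what follows is only a generic sketch of a greedy network-augmentation loop of the kind described. It starts from a graph of transit stops and repeatedly adds whichever candidate link most reduces the total number of hops from all stops to their nearest park-access stop. The adjacency-set representation, the hop-count objective, the penalty for unreachable stops and all function names are assumptions made for illustration.

```python
from collections import deque


def hops_to_parks(adj, park_stops):
    """Multi-source BFS: hops from every stop to its nearest park-access stop (None if unreachable)."""
    dist = {s: None for s in adj}
    queue = deque()
    for p in park_stops:
        dist[p] = 0
        queue.append(p)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def total_cost(adj, park_stops, unreachable_penalty):
    """Sum of hop counts; unreachable stops are charged a fixed penalty."""
    return sum(unreachable_penalty if d is None else d
               for d in hops_to_parks(adj, park_stops).values())


def greedy_add_links(adj, park_stops, candidates, budget, unreachable_penalty=100):
    """Add up to `budget` candidate links, each time taking the one with the largest cost reduction."""
    adj = {u: set(vs) for u, vs in adj.items()}      # work on a copy
    remaining = list(candidates)
    current = total_cost(adj, park_stops, unreachable_penalty)
    chosen, gains = [], []
    for _ in range(budget):
        best_link, best_cost = None, current
        for u, v in remaining:
            if v in adj[u]:                           # already linked; nothing to try
                continue
            adj[u].add(v)                             # tentatively add the link...
            adj[v].add(u)
            cost = total_cost(adj, park_stops, unreachable_penalty)
            adj[u].discard(v)                         # ...and undo it
            adj[v].discard(u)
            if cost < best_cost:
                best_link, best_cost = (u, v), cost
        if best_link is None:                         # no remaining candidate helps
            break
        u, v = best_link
        adj[u].add(v)
        adj[v].add(u)
        remaining.remove(best_link)
        chosen.append(best_link)
        gains.append(current - best_cost)
        current = best_cost
    return chosen, gains


if __name__ == "__main__":
    # Toy line of stops P - a - b - c - d, with a park entrance at P.
    line = {"P": {"a"}, "a": {"P", "b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    print(*greedy_add_links(line, ["P"], [("P", "d"), ("P", "c"), ("a", "c")], budget=2))
    # [('P', 'd'), ('P', 'c')] [4, 1]
```

On the toy line, the first added link saves four hops and the second saves only one. This mirrors, in miniature, the diminishing returns the abstract reports as stations are added. Each greedy step re-evaluates every candidate with a full BFS, which is simple but scales poorly; a real network model would need a cheaper evaluation.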
5. Memory access optimization for particle operations in computational fluid dynamics-discrete element method simulations
Authors: Deepthi Vaidhynathan, Hariswaran Sitaraman, Ray Grout, Thomas Hauser, Christine M. Hrenya, Jordan Musser. Particuology (indexed in SCIE, EI, CAS, CSCD), 2023, no. 7, pp. 97-110 (14 pages)
The Computational Fluid Dynamics-Discrete Element Method (CFD-DEM) is used to model gas-solid systems in several applications in the energy, pharmaceutical and petrochemical industries. Computational performance bottlenecks often limit the problem sizes that can be simulated at industrial scale. In such large-scale simulations, the data structures that store several million particles have a memory footprint too large for the processor cache hierarchies of current high-performance computing platforms, which reduces computational performance. This paper specifically addresses these memory access bottlenecks in industrial-scale simulations. It describes the use of space-filling curves to improve memory access patterns and quantifies their impact on computational performance under both shared- and distributed-memory parallelization paradigms. The Morton space-filling curve applied to uniform grids and k-dimensional tree partitions are used to reorder the particle data structure, improving spatial and temporal locality in memory. The paper presents the performance impact of these techniques on two benchmark problems, a homogeneous cooling system and a fluidized bed. These optimizations yield approximately a two-fold performance improvement in particle-focused operations such as neighbor-list creation and data exchange, and about a 1.5-fold overall improvement in a fluidization simulation with 1.27 million particles.
Keywords: CFD-DEM; Memory access optimization; Spatial reordering; Performance optimization
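The reordering idea can be sketched as follows under stated assumptions. Particles are binned into a uniform grid of cubic cells with at most 10 bits of cell index per axis. They are then sorted by the Morton (Z-order) key of their cell, so that particles that are close in space tend to end up close in memory. This is a generic illustration, not the implementation used in the paper. It omits the k-d tree variant, and the function names are hypothetical. Production codes usually compute the key with bit-interleaving tricks or hardware instructions, rather than the plain bit loop used here for clarity.

```python
def morton3(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three cell indices into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def morton_order(positions, domain_lo, cell_size, bits=10):
    """Permutation that sorts particles by the Morton key of the grid cell they fall in."""
    max_cell = (1 << bits) - 1
    keys = []
    for pos in positions:
        # Cell index per axis, clamped into the representable range.
        cell = [min(max(int((p - lo) // cell_size), 0), max_cell)
                for p, lo in zip(pos, domain_lo)]
        keys.append(morton3(*cell, bits=bits))
    # sorted() is stable, so particles in the same cell keep their relative order.
    return sorted(range(len(positions)), key=lambda p: keys[p])


def apply_order(array, order):
    """Permute one per-particle array (positions, velocities, ...) into the new order."""
    return [array[p] for p in order]


if __name__ == "__main__":
    pos = [(1.5, 1.5, 0.5), (0.5, 0.5, 0.5), (0.5, 0.5, 1.5), (1.5, 0.5, 0.5)]
    order = morton_order(pos, domain_lo=(0.0, 0.0, 0.0), cell_size=1.0)
    print(order)  # [1, 3, 0, 2]
```

The same permutation must be applied to every per-particle array, so that a particle's attributes stay aligned after reordering. Because positions change over time, a simulation would typically redo the sort periodically rather than once.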
6. Memory Access Optimization of Molecular Dynamics Simulation Software Crystal-MD on Sunway Taihu Light
Authors: Jianjiang Li, Jie Lin, Panpan Du, Kai Zhang, Jie Wu. Tsinghua Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2021, no. 3, pp. 296-308 (13 pages)
The radiation damage effect on key structural materials is one of the main research subjects of the numerical reactor. From the perspective of experimental safety and feasibility, Molecular Dynamics (MD) in the materials field is an ideal method for simulating radiation damage in structural materials. Crystal-MD is a massively parallel MD simulation software package based on the key material characteristics of reactors. Compared with the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) and the ITAP Molecular Dynamics (IMD) software, Crystal-MD requires somewhat less memory to run, but it is very time-consuming. Moreover, its calculation results show large deviations, and its migration and optimization also face problems such as memory limitations and frequent communication. To solve these problems, this paper studies the memory access mode of Crystal-MD. Based on this memory access mode, it proposes a memory access optimization strategy for the unique architecture of China's Sunway Taihu Light supercomputer. Experiments verify the proposed strategy and show that it significantly increases the running speed of Crystal-MD.
Keywords: molecular dynamics simulation; Crystal-MD; Sunway Taihu Light; memory access optimization