4,181 articles found
Optimization Techniques for GPU-Based Parallel Programming Models in High-Performance
1
Authors: Shuntao Tang, Wei Chen 《信息工程期刊(中英文版)》 2024, No. 1, pp. 7-11 (5 pages)
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both the theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
Keywords: Optimization Techniques; GPU-Based Parallel Programming Models; High-Performance Computing
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
2
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy 《Journal of Computer and Communications》 2024, No. 2, pp. 1-10 (10 pages)
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of color images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project were to optimize computation time and improve overall efficiency in the grayscale conversion of color images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy that considers factors such as load balancing and minimizing communication overhead. Despite these challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphic Processing Unit)
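The row-wise work distribution described in this abstract can be sketched as follows. This is only an illustration in Python, not the authors' OpenMP code: the function names are hypothetical, the luma weights (ITU-R BT.601) are an assumption because the abstract does not state which conversion formula was used, and a thread pool stands in for an OpenMP `parallel for` with a static schedule.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed luma weights (ITU-R BT.601); the abstract does not give the formula.
R_W, G_W, B_W = 0.299, 0.587, 0.114


def gray_rows(rows):
    """Convert a chunk of rows of (r, g, b) pixels to grayscale intensities."""
    return [[round(R_W * r + G_W * g + B_W * b) for (r, g, b) in row]
            for row in rows]


def split_rows(image, n_chunks):
    """Split rows into contiguous, near-equal chunks (like a static schedule)."""
    base, extra = divmod(len(image), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        if size:
            chunks.append(image[start:start + size])
        start += size
    return chunks


def to_grayscale(image, workers=4):
    """Hypothetical helper: convert each chunk in a worker, then reassemble."""
    chunks = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(gray_rows, chunks))  # map keeps chunk order
    return [row for part in parts for row in part]
```

In CPython the global interpreter lock prevents pure-Python threads from giving a real speedup here, so the sketch shows only how rows are divided among workers and recombined in order; the load-balancing concern raised in the abstract corresponds to how `split_rows` sizes its chunks.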
Energy-efficient task allocation for reliable parallel computation of cluster-based wireless sensor network in edge computing
3
Authors: Jiabao Wen, Jiachen Yang, Tianying Wang, Yang Li, Zhihan Lv 《Digital Communications and Networks》 SCIE CSCD 2023, No. 2, pp. 473-482 (10 pages)
To efficiently complete a complex computation task, the task should be decomposed into sub-computation tasks that run in parallel in edge computing. A Wireless Sensor Network (WSN) is a typical application of parallel computation. To achieve highly reliable parallel computation for a wireless sensor network, the network's lifetime needs to be extended. Therefore, a proper task allocation strategy is needed to reduce the energy consumption and balance the load of the network. This paper proposes a task model and a cluster-based WSN model in edge computing. In our model, different tasks require different types of resources and different sensors provide different types of resources, so our model is heterogeneous, which makes the model more practical. Then we propose a task allocation algorithm that combines the Genetic Algorithm (GA) and the Ant Colony Optimization (ACO) algorithm. The algorithm concentrates on energy conservation and load balancing so that the lifetime of the network can be extended. The experimental results show the algorithm's effectiveness and advantages in energy conservation and load balancing.
Keywords: Wireless sensor network; Parallel computation; Task allocation; Genetic algorithm; Ant colony optimization algorithm; Energy-efficient; Load balancing
Hadoop-based secure storage solution for big data in cloud computing environment
4
Authors: Shaopeng Guan, Conghui Zhang, Yilin Wang, Wenqing Liu 《Digital Communications and Networks》 SCIE CSCD 2024, No. 1, pp. 227-236 (10 pages)
In order to address the problems of the single encryption algorithm, such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. Firstly, in order to disperse the NameNode service from a single server to multiple servers, we combine HDFS federation and HDFS high-availability mechanisms, and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage. Then, we improve the ECC encryption algorithm for the encryption of ordinary data, and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated. To accelerate the encryption, we adopt the dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution solves the problem of a single point of failure of metadata, performs well in terms of metadata reliability, and can realize the fault tolerance of the server. The improved encryption algorithm integrates the dual-channel storage mode, and the encryption storage efficiency improves by 27.6% on average.
Keywords: Big data security; Data encryption; Hadoop; Parallel encrypted storage; Zookeeper
Fortifying Healthcare Data Security in the Cloud:A Comprehensive Examination of the EPM-KEA Encryption Protocol
5
Authors: Umi Salma Basha, Shashi Kant Gupta, Wedad Alawad, SeongKi Kim, Salil Bharany 《Computers, Materials & Continua》 SCIE EI 2024, No. 5, pp. 3397-3416 (20 pages)
A new era of data access and management has begun with the use of cloud computing in the healthcare industry. Despite the efficiency and scalability that the cloud provides, the security of private patient data is still a major concern. Encryption, network security, and adherence to data protection laws are key to ensuring the confidentiality and integrity of healthcare data in the cloud. The computational overhead of encryption technologies could lead to delays in data access and processing rates. To address these challenges, we introduced the Enhanced Parallel Multi-Key Encryption Algorithm (EPM-KEA), aiming to bolster healthcare data security and facilitate the secure storage of critical patient records in the cloud. The data was gathered from two categories: Authorization for Hospital Admission (AIH) and Authorization for High Complexity Operations. We use Z-score normalization for preprocessing. The primary goal of implementing encryption techniques is to secure and store massive amounts of data on the cloud. It is feasible that cloud storage alternatives for protecting healthcare data will become more widely available if security issues can be successfully fixed. As a result of our analysis using specific parameters including Execution time (42%), Encryption time (45%), Decryption time (40%), Security level (97%), and Energy consumption (53%), the system demonstrated favorable performance when compared to the traditional method. This suggests that by addressing these security concerns, there is the potential for broader accessibility to cloud storage solutions for safeguarding healthcare data.
Keywords: Cloud computing; Healthcare data security; Enhanced parallel multi-key encryption algorithm (EPM-KEA)
VARIABLE-DRIVEN AND-PARALLELISM
6
Author: 李春林 《Journal of Southeast University(English Edition)》 EI CAS 1991, No. 2, pp. 1-6 (6 pages)
A variable-driven model of AND-parallelism of logic programs is presented. It statically analyses the values of variables in clauses, picks out the variables contributing to the parallel execution, and then generates the variable-driving graphs for clauses. According to the variable-driving graph and the analysis of the instantiations of variables at run time, literals are driven to execute. With binding conflicts of shared variables prevented, the variable-driven model fully develops the AND-parallelism. Based on the variable-driving graph, some models of AND-parallelism already put forward can be made available if equipped with appropriate driving algorithms.
Keywords: parallel processing; algorithm; programming languages/logic programming
Performance Evaluation of Quicksort with GPU Dynamic Parallelism for Gene-Expression Quantile Normalization
7
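Authors: Roberto Pinto Souto, Carla Osthoff, Douglas Augusto, Oswaldo Trelles, Ana Tereza Ribeiro de Vasconcelos 《通讯和计算机(中英文版)》 2013, No. 12, pp. 1522-1528 (7 pages)
Keywords: quicksort algorithm; gene expression data; parallel implementation; GPU; performance evaluation; quantile; modern molecular biology; oligonucleotide microarray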
High Performance Motion Estimation Operator Using Multimedia Oriented Subword Parallelism
8
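Authors: Shafqat Khan, Emmanuel Casseau 《通讯和计算机(中英文版)》 2012, No. 1, pp. 1-14 (14 pages)
Keywords: multimedia applications; parallel processing; computing unit; motion estimation; performance; resource utilization; operating unit; SWP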
Task Scheduling of Data-Parallel Applications on HSA Platform
9
Authors: Zhenshan Bao, Chong Chen, Wenbo Zhang 《国际计算机前沿大会会议论文集》 2018, No. 1, p. 35 (1 page)
An Imbalanced Dataset and Class Overlapping Classification Model for Big Data (Cited by: 1)
10
Authors: Mini Prince, P. M. Joe Prathap 《Computer Systems Science & Engineering》 SCIE EI 2023, No. 2, pp. 1009-1024 (16 pages)
Most modern technologies, such as social media, smart cities, and the internet of things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers get stuck in the local optimum problem. As a result, it’s necessary to look into new methods for dealing with large data collections. Several solutions have been proposed for overcoming this issue. The rapid growth of the available data threatens to limit the usefulness of many traditional methods. Methods such as oversampling and undersampling have shown great promise in addressing the issue of class imbalance. Among these techniques, the Synthetic Minority Oversampling Technique (SMOTE) has produced the best results by generating synthetic samples for the minority class to create a balanced dataset. The issue is that their practical applicability is restricted to problems involving tens of thousands of instances or fewer. In this paper, we propose a parallel method using SMOTE and the MapReduce strategy, which distributes the operation of the algorithm among a group of computational nodes to address the aforementioned problem. Our proposed solution is divided into three stages. The first stage splits the data into different blocks using a mapping function, followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algorithm for solving the class imbalance problem. On each map block, a decision tree model is constructed. Finally, the decision tree blocks are combined to create a classification model. We used numerous datasets with up to 4 million instances in our experiments to test the proposed scheme’s capabilities. As a result, the hybrid SMOTE appears to have good scalability within the proposed framework, and it also cuts down the processing time.
Keywords: Imbalanced dataset; Class overlapping; SMOTE; MapReduce; Parallel programming; Oversampling
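The oversampling step this abstract relies on can be sketched in a few lines. This is a minimal illustration of basic SMOTE interpolation only, not the paper's hybrid SMOTE or its MapReduce pipeline; the function name `smote`, its parameters, and the default `k` are hypothetical choices made for the example.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours (basic SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

In a block-wise scheme like the one the abstract outlines, a function of this kind would presumably be applied independently to the minority samples of each data block, which is what makes the step easy to distribute across nodes.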
Parallel Control for Optimal Tracking via Adaptive Dynamic Programming (Cited by: 20)
11
Authors: Jingwei Lu, Qinglai Wei, Fei-Yue Wang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2020, No. 6, pp. 1662-1674 (13 pages)
This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems. Unlike existing optimal state feedback control, the control input of the optimal parallel control is introduced into the feedback system. However, due to the introduction of control input into the feedback system, the optimal state feedback control methods cannot be applied directly. To address this problem, an augmented system and an augmented performance index function are first proposed. Thus, the general nonlinear system is transformed into an affine nonlinear system. The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically. It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function. Moreover, an adaptive dynamic programming (ADP) technique is utilized to implement the optimal parallel tracking control using a critic neural network (NN) to approximate the value function online. The stability analysis of the closed-loop system is performed using the Lyapunov theory, and the tracking error and NN weight errors are uniformly ultimately bounded (UUB). Also, the optimal parallel controller guarantees the continuity of the control input when there are finite jump discontinuities in the reference signals. Finally, the effectiveness of the developed optimal parallel control method is verified in two cases.
Keywords: Adaptive dynamic programming (ADP); Nonlinear optimal control; Parallel controller; Parallel control theory; Parallel system; Tracking control; Neural network (NN)
PDP: Parallel Dynamic Programming (Cited by: 15)
12
Authors: Fei-Yue Wang, Jie Zhang, Qinglai Wei, Xinhu Zheng, Li Li 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2017, No. 1, pp. 1-5 (5 pages)
Deep reinforcement learning is a focal research area in artificial intelligence. The principle of optimality in dynamic programming is key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, a necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future of computational intelligence.
Keywords: Parallel dynamic programming; Dynamic programming; Adaptive dynamic programming; Reinforcement learning; Deep learning; Neural networks; Artificial intelligence
Efficient Task Completion for Parallel Offloading in Vehicular Fog Computing (Cited by: 5)
13
Authors: Jindou Xie, Yunjian Jia, Zhengchuan Chen, Zhaojun Nan, Liang Liang 《China Communications》 SCIE CSCD 2019, No. 11, pp. 42-55 (14 pages)
In this paper, we investigate a vehicular fog computing system and develop an effective parallel offloading scheme. The service time, which accounts for task offloading delay, task decomposition, and handover cost, is adopted as the metric of offloading performance. We propose an available-resource-aware parallel offloading scheme in which the RSU decides the target fog nodes for computation offloading, jointly considering the effect of vehicle mobility and time-varying computation capability. Based on hidden Markov model and Markov chain theories, the proposed scheme effectively handles imperfect system state information for fog node selection by jointly achieving mobility awareness and computation perception. Simulation results are presented to corroborate the theoretical analysis and validate the effectiveness of the proposed algorithm.
Keywords: Parallel offloading; Vehicular fog computing; Task offloading; HMM
Parallel Computing of a Variational Data Assimilation Model for GPS/MET Observation Using the Ray-Tracing Method (Cited by: 5)
14
Authors: 张昕, 刘月巍, 王斌, 季仲贞 《Advances in Atmospheric Sciences》 SCIE CAS CSCD 2004, No. 2, pp. 220-226 (7 pages)
The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified by radiosonde, including GPS/MET observations into the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4 000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis. Just one iteration of minimization will take more than 24 hours of CPU time on NCEP's Cray C90 computer. Although efforts have been made to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurement, suitable for massively parallel processor architectures, is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles for the code's design and examine the performance on state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the author, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
Keywords: parallel computing; variational data assimilation; GPS/MET
Scheduling Step-Deteriorating Jobs on Parallel Machines by Mixed Integer Programming (Cited by: 4)
15
Authors: 郭鹏, 程文明, 曾鸣, 梁剑 《Journal of Donghua University(English Edition)》 EI CAS 2015, No. 5, pp. 709-714, 719 (7 pages)
Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attention from researchers. In many practical situations, it is found that some jobs fail to be processed prior to the pre-specified thresholds, and they often consume extra deteriorating time for successful accomplishment. Their processing times can be characterized by a step-wise function. Such jobs are called step-deteriorating jobs. In this paper, the parallel machine scheduling problem with step-deteriorating jobs (PMSD) is considered. Due to its intractability, four different mixed integer programming (MIP) models are formulated for solving the problem under consideration. The study aims to investigate the performance of these models and find a promising optimization formulation to solve the largest possible problem instances. The proposed four models are solved by the commercial software CPLEX. Moreover, near-optimal solutions can be obtained by the black-box local-search solver LocalSolver with the fourth model. The computational results show that the efficiencies of the different MIP models depend on the distribution intervals of deteriorating thresholds, and the performance of LocalSolver is clearly better than that of CPLEX in terms of the quality of the solutions and the computational time.
Keywords: Parallel machine; Step-deterioration; Mixed integer programming (MIP); Scheduling models; Total completion time
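An Improved Hilbert Curve for Parallel Spatial Data Partitioning (Cited by: 7)
16
Authors: MENG Lingkui, HUANG Changqing, ZHAO Chunyu, LIN Zhiyong 《Geo-Spatial Information Science》 2007, No. 4, pp. 282-286 (5 pages)
A novel Hilbert curve for parallel spatial data partitioning is introduced, taking into account the huge volume of spatial information and the variable-length characteristic of vector data items. Based on the improved Hilbert curve, an algorithm can be designed that partitions spatial data almost uniformly among multiple disks in a parallel spatial database. Thus, data imbalance can be significantly avoided, and search and query efficiency can be improved.
Keywords: parallel spatial database; data partitioning algorithm; data imbalance; Hilbert curve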
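To make the idea of curve-based partitioning concrete, the sketch below orders grid cells along a standard Hilbert curve and deals them out to disks in round-robin fashion. It is not the improved curve proposed in the paper, which also accounts for variable-length vector records; the function names and the round-robin assignment are assumptions chosen for illustration.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its position
    along the standard Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_round_robin(points, n, disks):
    """Sort points by Hilbert index, then assign them to disks in turn so
    that spatially close points are spread across different disks."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    return [ordered[i::disks] for i in range(disks)]
```

Because neighbouring positions on the curve are neighbouring cells in space, dealing consecutive items to different disks tends to spread any one query window over several disks, which is the balance property the abstract is after.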
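Multi-task Coalition Parallel Formation Strategy Based on Reinforcement Learning (Cited by: 6)
17
Authors: JIANG Jian-Guo, SU Zhao-Pin, QI Mei-Bin, ZHANG Guo-Fu 《自动化学报》 (Acta Automatica Sinica) EI CSCD PKU Core 2008, No. 3, pp. 349-352 (4 pages)
Agent coalition is an important form of coordination and cooperation among agents. By forming a coalition, agents can improve their problem-solving ability and obtain more utility. In this paper, a novel multi-task coalition parallel formation strategy is presented, and it is proved theoretically that the process of multi-task coalition formation is a Markov decision process. Moreover, reinforcement learning is used to solve for the agent behavior strategy in multi-task coalition parallel formation, and the formation process is described. In multi-task oriented domains, the strategy can form multi-task coalitions effectively and in parallel.
Keywords: reinforcement learning; multi-task coalition; parallel formation; Markov decision process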
Assigning Task by Parallel Genetic Algorithm Based on PVM (Cited by: 1)
18
Authors: Zheng Zhi jun, Dong Xiao she, Zheng Shou qi (Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China) 《Wuhan University Journal of Natural Sciences》 CAS 2001, No. Z1, pp. 579-584 (6 pages)
Genetic algorithms have been proposed to solve the problem of task assignment. However, they have some drawbacks, e.g., they often take a long time to find an optimal solution, and the success rate is low. To overcome these problems, a new coarse-grained parallel genetic algorithm with a central migration scheme is presented, which exploits isolated subpopulations. The new approach has been implemented in the PVM environment and evaluated on a workstation network for solving the task assignment problem. The results show that it not only significantly improves the quality of the results but also speeds up finding the best solution.
Keywords: task assignment; genetic algorithm; parallel process; PVM
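A Distributed Algorithm for Parallel Multi-task Allocation Based on Profit Sharing Learning (Cited by: 7)
19
Authors: SU Zhao-Pin, JIANG Jian-Guo, LIANG Chang-Yong, ZHANG Guo-Fu 《自动化学报》 (Acta Automatica Sinica) EI CSCD PKU Core 2011, No. 7, pp. 865-872 (8 pages)
Task allocation via coalition formation is a fundamental research challenge in several application domains of multi-agent systems (MAS), such as resource allocation and disaster response management. It mainly deals with how to allocate many unresolved tasks to agents in a distributed manner. In this paper, we propose a distributed parallel multi-task allocation algorithm among self-organizing and self-learning agents. To handle this situation, we disperse the agents and tasks geographically in a two-dimensional space, and then introduce profit sharing learning (PSL), through which a single agent searches for its tasks by continual self-learning. We also present strategies for communication and negotiation among agents to allocate the real workload to each tasked agent. Finally, to evaluate the effectiveness of the proposed algorithm, we compare it with Shehory and Kraus' distributed task allocation algorithm, which has been discussed by many researchers in recent years. Experimental results show that the proposed algorithm can quickly form a solving coalition for each task. Moreover, the proposed algorithm can state explicitly the real workload of each tasked agent, and can thus provide a specific and significant reference for practical control tasks.
Keywords: automation systems; automation technology; ICA; data processing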
Programming for scientific computing on peta-scale heterogeneous parallel systems (Cited by: 1)
20
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵 《Journal of Central South University》 SCIE EI CAS 2013, No. 5, pp. 1189-1203 (15 pages)
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: computing systems; scientific applications; heterogeneous systems; PETA; programming model; parallel systems; supercomputer; domain experts