Developing parallel applications on heterogeneous processors faces the "memory wall" challenge, caused by the limited capacity of local storage, limited bandwidth, and long memory access latency. To address this problem, a parallelization approach with six memory optimization schemes was proposed for CG, four of which target all kinds of sparse matrix-vector multiplication (SPMV) operations. On the IBM QS20, the approach reaches speedups of up to 21 and 133 times for sizes A and B, respectively, compared with a single Power processor element. Finally, it is concluded that the peak memory access bandwidth of the Cell BE can be reached in SPMV, that simple computation is more efficient on heterogeneous processors, and that loop unrolling can hide local storage access latency when executing scalar operations on SIMD cores.
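For readers unfamiliar with the SPMV kernel named above, the following is a minimal sketch of a sparse matrix-vector product. It assumes the compressed sparse row (CSR) layout, which the abstract does not specify, and it is plain sequential Python rather than the Cell BE implementation; the names `spmv_csr`, `ROW_PTR`, `COL_IDX`, and `VALUES` are illustrative only.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access to x is what makes SPMV memory-bound.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
ROW_PTR = [0, 2, 3, 5]
COL_IDX = [0, 2, 1, 0, 2]
VALUES = [4.0, 1.0, 3.0, 2.0, 5.0]
```

The inner loop performs one multiply-add per loaded matrix value plus an indirect load from `x`, which is why memory bandwidth and latency, rather than arithmetic, tend to limit this kernel.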
In this paper, we introduce several ongoing research projects to support parallel and distributed computing on heterogeneous networks of workstations (NOW) in the High Performance Computing and Software Laboratory at the University of Texas at San Antonio. The projects aim to address three technical issues. First, heterogeneity and time-sharing effects make traditional performance models and metrics for measuring and evaluating homogeneous computing unsuitable for heterogeneous computing. We develop practical models and metrics that quantify the heterogeneity of networks and characterize its performance effects. Second, special system support is necessary to perform parallel computation effectively. We are developing system schemes for heterogeneity management, process scheduling, and efficient communication. Finally, to provide insight into system performance, we are developing two types of supporting tools: a graphical instrumentation monitor to aid users in investigating performance problems and in determining the most effective way of exploiting NOW systems, and a trace-driven simulator to test and compare different system management and scheduling schemes.
Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve application performance. With existing Document Object Model (DOM) based parsing, performance degrades due to sequential processing and large memory requirements, so a more efficient XML parser is needed to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm is based on dividing the XML file into n parts and producing n trees in parallel. The profiling phase of the RXPF framework produces a dataset by measuring the performance of various parsing models, including StAX, SAX, DOM, JDOM, and PXTG, on different numbers of cores using multiple file sizes. The regression phase produces the prediction model, based on which the final code for efficient parsing of XML files is produced in the code generation phase. The RXPF framework has shown a performance improvement of 9.54% to 32.34% over other existing models used for parsing XML files.
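The "divide into n parts, build n trees in parallel" idea can be sketched as follows. This is not the PXTG algorithm itself: it assumes the document is a root element with many independent child records, it finds the split points with an ordinary sequential parse for simplicity (a real splitter would have to locate element boundaries without parsing the whole file), and it uses threads, which in CPython illustrate the structure rather than deliver a speedup. The function names and the `<part>` wrapper element are hypothetical.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def split_records(xml_text, n):
    """Split the root's children into n contiguous, roughly equal parts."""
    root = ET.fromstring(xml_text)
    records = [ET.tostring(child, encoding="unicode") for child in root]
    size, extra = divmod(len(records), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        # Wrap each chunk in its own root so it is a well-formed document.
        parts.append("<part>" + "".join(records[start:end]) + "</part>")
        start = end
    return parts


def parse_parallel(xml_text, n):
    """Build n independent trees, one per part, using a pool of n workers."""
    parts = split_records(xml_text, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(ET.fromstring, parts))
```

Calling `parse_parallel(doc, 2)` on a document with five child records returns two trees holding three and two records, in document order.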
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer programming models closer to their domains (e.g., physics and biology) to MPI and OpenMP. This has led to the development of domain-specific programming, which provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
This paper reviews task scheduling frameworks, methods, and evaluation metrics for central processing unit-graphics processing unit (CPU-GPU) heterogeneous clusters. Task scheduling on CPU-GPU heterogeneous clusters can be carried out at the system level, node level, and device level. Most task scheduling technologies are heuristics based on expert experience, while some are based on statistical methods using machine learning, deep learning, or reinforcement learning. Many metrics have been adopted to evaluate and compare task scheduling technologies that optimize different scheduling goals. Although statistical task scheduling has produced fewer research results than heuristic task scheduling, it still has significant research potential.
Heterogeneous multicore clusters are becoming more popular for high-performance computing due to their great computing power and cost-effectiveness. Nevertheless, parallel efficiency degradation is still a problem in large-scale structural analysis on heterogeneous multicore clusters. To solve it, a hybrid hierarchical parallel algorithm (HHPA) is proposed on the basis of the conventional domain decomposition algorithm (CDDA) and the parallel sparse solver. In this new algorithm, a three-layer parallelization of the computational procedure is introduced to separate the communication among nodes, among heterogeneous-core-groups (HCGs), and inside heterogeneous-core-groups by mapping computing tasks to the various hardware layers. This approach not only achieves load balancing at the different layers efficiently but also improves the communication rate significantly through hierarchical communication. Additionally, the proposed hybrid parallel approach reduces the interface equation size and thus the solution time, which compensates for the communication overhead that grows with the interface equation size in CDDA. Moreover, distributed sparse storage of large amounts of data is introduced to improve memory access. Benchmark instances solved on the Shenwei-Taihuzhiguang supercomputer show that the proposed method obtains higher speedup and parallel efficiency than CDDA and better extensibility of the parallel partition than the two-level parallel computing algorithm (TPCA).
To improve the concurrent access performance of a web-based spatial computing system in a cluster, a parallel scheduling strategy based on the multi-core environment is proposed, which includes two levels of parallel processing. The first evenly allocates tasks to each server node in the cluster, and the second balances the load inside a server node. Based on this strategy, a new web-based spatial computing model is designed, focusing on a task response ratio calculation method, a request queue buffer mechanism, and a thread scheduling strategy. Experimental results show that the new model can fully exploit the multi-core computing capability of each server node under concurrent access and improves average hits per second, average I/O hits, CPU utilization, and throughput. A speed-up ratio comparison of the traditional model and the new one shows that the new model performs better. The performance of the multi-core server nodes in the cluster is optimized, and resource utilization and parallel processing capability are enhanced. The more CPU cores a node has, the higher the parallel processing capability obtained.
The Long Term Evolution (LTE) system imposes strict requirements on dispatching delay. Moreover, the very high air interface rate of LTE requires good processing capability from the devices that process the baseband signals. Consequently, a single-core processor cannot meet the requirements of the LTE system. This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing. The test results show that this approach works quite well.
Aim: To develop a heterogeneous database united system (HDBUS) that combines local Oracle, Sybase, and SQL Server databases distributed on different servers into a global database, and that supports global transaction management and parallel query over the intranet. Methods: The design and implementation of HDBUS rest on two important concepts, one of which is the heterogeneous table join. Results and Conclusion: The first concept can be used to process parallel queries across multiple database servers; the second is the key technology of the heterogeneous distributed database.
The particulate discrete element method (DEM) can be employed to capture the response of rock, provided that appropriate bonding models are used to cement the particles to each other. Simulations of laboratory tests are important to establish the extent to which those models can capture realistic rock behaviors. Hitherto the focus in such comparison studies has been either on homogeneous specimens or on the use of two-dimensional (2D) models. In situ rock formations are often heterogeneous, so exploring the ability of this type of model to capture heterogeneous material behavior is important to facilitate its use in design analysis. In situ stress states are essentially three-dimensional (3D), and therefore it is important to develop 3D models for this purpose. This paper revisits an earlier experimental study on heterogeneous specimens in which the relative proportions of weaker material (siltstone) and stronger, harder material (sandstone) were varied in a controlled manner. Using a 3D DEM model with the parallel bond model, virtual heterogeneous specimens were created. The overall responses in terms of variations in strength and stiffness with different percentages of weaker material (siltstone) were shown to agree with the experimental observations. There was also good qualitative agreement between the failure patterns observed in the experiments and the simulations, suggesting that the DEM data enabled analysis of the initiation of localizations and micro fractures in the specimens.
Heterogeneous computing is an effective approach to high-performance computing with many advantages. Task scheduling is a critical issue in heterogeneous environments as well as in homogeneous environments. Many task scheduling algorithms have been proposed for homogeneous environments, whereas only a few for heterogeneous environments can be found in the literature. A novel task scheduling algorithm for heterogeneous environments, called the heterogeneous critical task (HCT) scheduling algorithm, is presented. By means of the directed acyclic graph and the Gantt chart, the HCT algorithm defines the critical task and the idle time slot. After determining the critical tasks of a given task, the HCT algorithm tentatively duplicates the critical tasks onto the processor that holds the given task, in its idle time slot, to reduce the start time of the given task. To compare the performance of the HCT algorithm with several recently proposed algorithms, a large set of randomly generated applications and the Gaussian elimination application are used. The experimental results show that the HCT algorithm outperforms the other algorithms.
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive computational cost. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed to enhance computational efficiency. This framework uses a static domain decomposition scheme, with the entire computation domain decomposed into multiple subdomains according to predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate multi-particle sedimentation and submarine landslides. The numerical examples demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
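As a rough illustration of static domain decomposition, the sketch below assigns particles to subdomains by position. It assumes equal-width slabs along a single axis, which the abstract does not state, and it omits MPI communication and the ID re-numbering scheme entirely; `owner_rank` and `decompose` are hypothetical helper names.

```python
def owner_rank(x, x_min, x_max, n_ranks):
    """Return the subdomain index owning coordinate x (equal-width slabs assumed)."""
    width = (x_max - x_min) / n_ranks
    rank = int((x - x_min) // width)
    # Clamp so that points on or beyond the domain edges stay in a valid slab.
    return min(max(rank, 0), n_ranks - 1)


def decompose(positions, x_min, x_max, n_ranks):
    """Group particle IDs (list indices) by the subdomain that owns them."""
    subdomains = [[] for _ in range(n_ranks)]
    for pid, x in enumerate(positions):
        subdomains[owner_rank(x, x_min, x_max, n_ranks)].append(pid)
    return subdomains
```

Because the slab boundaries never move, a particle changes owner only when its own position crosses a boundary, which is the situation a scheme for handling transitions across subdomain interfaces has to cover.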
An improved algorithm is proposed that solves cooperative concurrent computing tasks using the idle cycles of a number of high-performance heterogeneous workstations interconnected through a high-speed network. To obtain better parallel computation performance, this paper presents a model and an algorithm for task scheduling among heterogeneous workstations that account for the costs of loading data, computing, communication, and collecting results. Using this algorithm, an optimal subset of heterogeneous workstations with the shortest parallel execution time can be selected.
It is important to efficiently support artificial intelligence (AI) applications on heterogeneous mobile platforms, especially to execute a deep neural network (DNN) model cooperatively on multiple computing devices of one mobile platform. This paper proposes HOPE, an end-to-end heterogeneous inference framework running on mobile platforms that distributes the operators in a DNN model to different computing devices. The problem is formalized as an integer linear programming (ILP) problem, and a heuristic algorithm is proposed to determine a near-optimal heterogeneous execution plan. The experimental results demonstrate that HOPE reduces inference latency by up to 36.2% (22.0% on average) compared with MOSAIC, up to 22.0% (10.2% on average) compared with StarPU, and up to 41.8% (18.4% on average) compared with μLayer.
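To make the operator placement problem concrete, here is a toy sketch that is neither HOPE's ILP formulation nor its heuristic. It assumes a simple chain of operators executed one after another, a fixed cost for every hand-off between different devices, and made-up cost numbers, and it finds the best plan by exhaustive search, which is only feasible for tiny instances. All names and values are hypothetical.

```python
from itertools import product


def plan_latency(plan, exec_cost, transfer_cost):
    """Latency of a chain: per-operator execution cost plus device hand-offs."""
    total = sum(exec_cost[op][dev] for op, dev in enumerate(plan))
    # Pay the transfer cost whenever consecutive operators change device.
    total += sum(transfer_cost for a, b in zip(plan, plan[1:]) if a != b)
    return total


def best_plan(exec_cost, n_devices, transfer_cost):
    """Exhaustively search all device assignments (toy sizes only)."""
    plans = product(range(n_devices), repeat=len(exec_cost))
    return min(plans, key=lambda p: plan_latency(p, exec_cost, transfer_cost))


# Hypothetical costs: EXEC_COST[op][device], device 0 = CPU, device 1 = GPU.
EXEC_COST = [[4, 2], [3, 7], [6, 1]]
```

With these numbers and a transfer cost of 1, `best_plan(EXEC_COST, 2, 1)` places the middle operator on device 0 and the other two on device 1, because the hand-off cost is smaller than the penalty for running that operator on the slower device. The search space grows as devices to the power of operators, which is why realistic models need an ILP solver or a heuristic.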
From a practical point of view, grain structure heterogeneities are key parameters that control the rock response, and incorporating them quantitatively remains a challenge. One of the less discussed topics in the context of the grain-based model (GBM) in the particle flow code (PFC) is contact heterogeneity and the appropriate contact model to mimic grain boundary behavior. Generally, the smooth joint (SJ) model and the linear parallel bond (LPB) model are used to simulate grain boundary behavior. However, the literature does not document the suitability of the different models for specific problems. Another challenge in implementing GBM in PFC is that only a single bonding parameter is used at the grain boundaries. The aim of this study is to investigate the responses of a laboratory-scale specimen with the SJ and LPB models, considering both heterogeneous and homogeneous grain boundary contact parameters. Uniaxial and biaxial compression tests are performed to calibrate the response of Creighton granite. The stress-strain curves, volumetric dilation, inter-crack (crack in the grain boundary) and intra-crack (crack within the grain) development, and failure patterns associated with the different contact models are examined. It was found that both the SJ and LPB models can reproduce the pre-peak behavior observed for a granitic rock type. However, the LPB model is unable to reproduce the post-peak behavior. Due to the large interlocking effect originating from the balls in contact and the ball size in the LPB model, local dilation is induced at the grain boundaries. This overestimates the volumetric dilation and residual shear strength. The LPB model tends to produce discontinuous inter-cracks and stress localization in the rock specimen, resulting in fine fragments at the rock surface during failure.
In this paper, a parallel Surface Extraction from Binary Volumes with Higher-Order Smoothness (SEBVHOS) algorithm is proposed to accelerate SEBVHOS execution. The original SEBVHOS algorithm is parallelized first, and then several performance optimization techniques, namely loop optimization, cache optimization, false sharing optimization, synchronization overhead optimization, and thread affinity optimization, are used to improve the implementation's performance on multi-core systems. The performance of the parallel SEBVHOS algorithm is analyzed on a dual-core system. The experimental results show that the parallel SEBVHOS algorithm achieves an average speedup of 1.86x. More importantly, our method does not introduce additional aliasing artifacts compared with the original SEBVHOS algorithm.
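The reported figure can be put in context with a short worked calculation. Parallel efficiency is the speedup divided by the core count; the second line additionally assumes that Amdahl's law describes the run, which the abstract does not claim, so the resulting parallel fraction $f$ is only an indicative estimate.

```latex
\begin{align*}
E &= \frac{S}{p} = \frac{1.86}{2} = 0.93 \\
S &= \frac{1}{(1-f) + f/p}
  \;\Longrightarrow\;
  f = \frac{1 - 1/S}{1 - 1/p}
    = \frac{1 - 1/1.86}{1 - 1/2} \approx 0.925
\end{align*}
```

Under that assumption, roughly 92.5% of the runtime would behave as perfectly parallel work on the dual-core system.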
This paper presents the idea of replacing traditionally expensive parallel machines with a heterogeneous cluster of workstations. To emphasize the usability of the cluster-of-workstations platform for parallel and distributed computing, the paper also reports on the effort and experience of implementing dynamic load balancing for parallel tree computation using depth-first search (DFS) in the cluster of workstations project. It compares the speedup obtained on this platform with that obtained on the traditional one. The speedup results show that a cluster of workstations can be a serious alternative to expensive parallel machines.
Based on CORBA (Common Object Request Broker Architecture) and Java techniques, a concrete solution is proposed for creating a parallel distributed FEM computing circumstance (PDFCC) on heterogeneous networks supporting the TCP/IP protocol. To verify the feasibility of this solution, the basic framework of PDFCC has been implemented and tested on a LAN (Local Area Network).
Multi-core architectures are widely used to enhance microprocessor performance within a limited increase in time-to-market and power consumption of the chips. Toward the application of high-density data signal processing, this paper presents a novel heterogeneous multi-core digital signal processor (DSP), YHFT-QDSP, with one RISC CPU core and 4 VLIW DSP cores. Through three kinds of interconnection, YHFT-QDSP provides highly efficient message communication between the on-chip RISC core and DSP cores, among on-chip DSP cores, and among inter-chip DSP cores. A parallel programming platform is specifically developed for the heterogeneous multi-core architecture of YHFT-QDSP. This parallel programming environment provides a parallel support library and a friendly interface between high-level application software and the multi-core DSP. The 130 nm CMOS custom chip design results in a high-speed and moderate-power design. The results of typical benchmarks show that the interconnection structure of YHFT-QDSP is much better than other related structures and achieves better speedup when the interconnection facilities are used in combination. YHFT-QDSP has now been signed off and manufactured. Future applications of the multi-core chip could be found in 3G wireless base stations, high-performance radar, industrial applications, and so on.
Funding (paper on memory-optimized parallelization of CG on the Cell BE): Project (2008AA01A201) supported by the National High-tech Research and Development Program of China; Projects (60833004, 60633050) supported by the National Natural Science Foundation of China.
Funding (paper on the hybrid programming framework evaluated on TH-1A): Project (61170049) supported by the National Natural Science Foundation of China; Project (2012AA010903) supported by the National High Technology Research and Development Program of China.
Funding (review of task scheduling on CPU-GPU heterogeneous clusters): Supported by the ZTE-University-Institute Fund Project under Grant No. IA20230629009.
Funding (paper on the hybrid hierarchical parallel algorithm, HHPA): Supported by the National Natural Science Foundation of China (Grant No. 11772192).
Funding (paper on the web-based spatial computing model): Supported by the China Postdoctoral Science Foundation (No. 2014M552115); the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUGL140833); and the National Key Technology Support Program of China (No. 2011BAH06B04).
Abstract: In order to improve the concurrent access performance of a web-based spatial computing system in a cluster, a parallel scheduling strategy based on the multi-core environment is proposed, which includes two levels of parallel processing mechanisms. The first evenly allocates tasks to each server node in the cluster, and the second implements load balancing inside a server node. Based on this strategy, a new web-based spatial computing model is designed, focusing on a task response ratio calculation method, a request queue buffer mechanism, and a thread scheduling strategy. Experimental results show that the new model can fully use the multi-core computing advantage of each server node in a concurrent access environment and improve the average hits per second, average I/O hits, CPU utilization, and throughput. A speed-up ratio comparison of the traditional model and the new one shows that the new model performs better. The performance of the multi-core server nodes in the cluster is optimized, and the resource utilization and parallel processing capabilities are enhanced. The more CPU cores are available, the higher the parallel processing capability obtained.
Abstract: The Long Term Evolution (LTE) system imposes high requirements on dispatching delay. Moreover, the very high air interface rate of LTE requires good processing capability from the devices that process the baseband signals. Consequently, a single-core processor cannot meet the requirements of the LTE system. This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing. The test results show that this approach works quite well.
Abstract: Aim: To develop a heterogeneous database united system (HDBUS) that combines the local databases of Oracle, Sybase, and SQL Server, distributed on different servers, into a global database, and supports global transaction management and parallel query over the Intranet. Methods: In the design and implementation of HDBUS two important concepts heterogeneous tables join. Results and Conclusion: The first concept can be used to process the parallel query of multiple database servers; the second one is the key technology of heterogeneous distributed databases.
Abstract: The particulate discrete element method (DEM) can be employed to capture the response of rock, provided that appropriate bonding models are used to cement the particles to each other. Simulations of laboratory tests are important to establish the extent to which those models can capture realistic rock behaviors. Hitherto the focus in such comparison studies has either been on homogeneous specimens or on the use of two-dimensional (2D) models. In situ rock formations are often heterogeneous, thus exploring the ability of this type of model to capture heterogeneous material behavior is important to facilitate their use in design analysis. In situ stress states are basically three-dimensional (3D), and therefore it is important to develop 3D models for this purpose. This paper revisits an earlier experimental study on heterogeneous specimens, in which the relative proportions of weaker material (siltstone) and stronger, harder material (sandstone) were varied in a controlled manner. Using a 3D DEM model with the parallel bond model, virtual heterogeneous specimens were created. The overall responses in terms of variations in strength and stiffness with different percentages of weaker material (siltstone) were shown to agree with the experimental observations. There was also good qualitative agreement between the failure patterns observed in the experiments and in the simulations, suggesting that the DEM data enabled analysis of the initiation of localizations and micro fractures in the specimens.
Abstract: Heterogeneous computing is one effective method of high performance computing with many advantages. Task scheduling is a critical issue in heterogeneous environments as well as in homogeneous environments. A number of task scheduling algorithms for homogeneous environments have been proposed, whereas only a few for heterogeneous environments can be found in the literature. A novel task scheduling algorithm for heterogeneous environments, called the heterogeneous critical task (HCT) scheduling algorithm, is presented. By means of the directed acyclic graph and the Gantt chart, the HCT algorithm defines the critical task and the idle time slot. After determining the critical tasks of a given task, the HCT algorithm tentatively duplicates the critical tasks onto the processor that hosts the given task, in its idle time slot, to reduce the start time of the given task. To compare the performance of the HCT algorithm with several recently proposed algorithms, a large set of randomly generated applications and the Gaussian elimination application are used. The experimental results show that the HCT algorithm outperforms the other algorithms.
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. 12072217 and 42077254) and the Natural Science Foundation of Hunan Province, China (Grant No. 2022JJ30567).
Abstract: The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive computational costs. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed, aimed at enhancing computation efficiency. This framework utilises a static domain decomposition scheme, with the entire computation domain being decomposed into multiple subdomains according to predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across sub-domain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides. The numerical examples effectively demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
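The static decomposition idea in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional slab decomposition along x with equal-width subdomains; the function names `owner_rank` and `migrating_particles` are hypothetical, and the sketch does not reproduce the paper's particle ID re-numbering scheme or use MPI. It only shows how a particle's owning subdomain is derived from its position and how interface crossings can be detected.

```python
def owner_rank(x, x_min, x_max, n_ranks):
    """Return the rank whose slab contains coordinate x (1D static decomposition)."""
    width = (x_max - x_min) / n_ranks
    rank = int((x - x_min) // width)
    # Clamp so a particle sitting exactly on x_max still belongs to the last rank.
    return min(max(rank, 0), n_ranks - 1)


def migrating_particles(positions_before, positions_after, x_min, x_max, n_ranks):
    """List (particle_id, old_rank, new_rank) for particles that crossed an interface."""
    moves = []
    for pid, (x0, x1) in enumerate(zip(positions_before, positions_after)):
        old = owner_rank(x0, x_min, x_max, n_ranks)
        new = owner_rank(x1, x_min, x_max, n_ranks)
        if old != new:
            moves.append((pid, old, new))
    return moves


# Illustrative data (assumed): four particles, domain [0, 10] split into 4 slabs.
before = [0.5, 2.4, 3.9, 7.5]
after = [0.6, 2.6, 4.1, 7.4]
moves = migrating_particles(before, after, x_min=0.0, x_max=10.0, n_ranks=4)
```

In an MPI setting, each entry of `moves` would correspond to a particle whose data must be sent from `old_rank` to `new_rank` before the next time step.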
Abstract: An improved algorithm is proposed for solving cooperative concurrent computing tasks using the idle cycles of a number of high-performance heterogeneous workstations interconnected through a high-speed network. In order to obtain better parallel computation performance, this paper gives a model and an algorithm for task scheduling among heterogeneous workstations, in which the costs of loading data, computing, communication, and collecting results are considered. Using this efficient algorithm, an optimal subset of heterogeneous workstations with the shortest parallel execution time of tasks can be selected.
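The subset-selection idea can be sketched as follows. The abstract does not give its cost model, so the one here is an assumption: the master sends data to and collects results from each selected workstation in turn, and the work is split in proportion to speed. All names (`parallel_time`, `best_subset`, the `speed`, `load`, and `collect` fields) are hypothetical, and the exhaustive search stands in for the paper's algorithm, which is not described.

```python
from itertools import combinations


def parallel_time(subset, work):
    """Estimated finish time for one subset under the assumed cost model."""
    # Master ships data to, and gathers results from, each workstation in turn.
    transfer = sum(w["load"] + w["collect"] for w in subset)
    # Work is split in proportion to speed, so all workstations finish together.
    compute = work / sum(w["speed"] for w in subset)
    return transfer + compute


def best_subset(workstations, work):
    """Exhaustively search all non-empty subsets; fine only for small clusters."""
    best = None
    for k in range(1, len(workstations) + 1):
        for subset in combinations(workstations, k):
            t = parallel_time(subset, work)
            if best is None or t < best[0]:
                best = (t, [w["name"] for w in subset])
    return best


# Illustrative data (assumed): speed in work units per second, costs in seconds.
workstations = [
    {"name": "ws1", "speed": 4.0, "load": 1.0, "collect": 0.5},
    {"name": "ws2", "speed": 2.0, "load": 1.0, "collect": 0.5},
    {"name": "ws3", "speed": 1.0, "load": 3.0, "collect": 1.0},
]
best_time, chosen = best_subset(workstations, work=24.0)
```

With these numbers the slow workstation `ws3` is left out: its loading and collecting costs outweigh the compute time it would save, which is the trade-off such a model is meant to capture.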
Funding: Supported by the General Program of the National Natural Science Foundation of China (No. 61872043).
Abstract: It is important to efficiently support artificial intelligence (AI) applications on heterogeneous mobile platforms, especially to execute a deep neural network (DNN) model in a coordinated way on multiple computing devices of one mobile platform. This paper proposes HOPE, an end-to-end heterogeneous inference framework running on mobile platforms that distributes the operators of a DNN model to different computing devices. The problem is formalized as an integer linear programming (ILP) problem, and a heuristic algorithm is proposed to determine a near-optimal heterogeneous execution plan. The experimental results demonstrate that HOPE can reduce inference latency by up to 36.2% (22.0% on average) compared with MOSAIC, by up to 22.0% (10.2% on average) compared with StarPU, and by up to 41.8% (18.4% on average) compared with μLayer.
Funding: Support from the University Transportation Center for Underground Transportation Infrastructure (UTC-UTI) at the Colorado School of Mines, which funded this research under Grant No. 69A3551747118 from the US Department of Transportation (DOT), and from the Fundamental Research Funds for the Central Universities under Grant No. A0920502052401-210 is gratefully acknowledged.
Abstract: From a practical point of view, grain structure heterogeneities are key parameters that control the rock response, and incorporating them in a quantitative manner remains a challenge. One of the less discussed topics in the context of the grain-based model (GBM) in the particle flow code (PFC) is contact heterogeneity and the appropriate contact model to mimic the grain boundary behavior. Generally, the smooth joint (SJ) model and the linear parallel bond (LPB) model are used to simulate the grain boundary behavior. However, the literature does not document the suitability of different models for specific problems. Another challenge in implementing GBM in PFC is that only a single bonding parameter is used at the grain boundaries. The aim of this study is to investigate the responses of a laboratory-scale specimen with SJ and LPB models, considering heterogeneous and homogeneous grain boundary contact parameters. Uniaxial and biaxial compression tests are performed to calibrate the response of Creighton granite. The stress-strain curves, volumetric dilation, inter-crack (crack in the grain boundary) and intra-crack (crack within the grain) development, and failure patterns associated with different contact models are examined. It was found that both the SJ and LPB models can reproduce the pre-peak behavior observed for a granitic rock type. However, the LPB model is unable to reproduce the post-peak behavior. Due to the large interlocking effect originating from the balls in contact and the ball size in the LPB model, local dilation is induced at the grain boundaries. This overestimates the volumetric dilation and residual shear strength. The LPB model tends to result in discontinuous inter-cracks and stress localization in the rock specimen, resulting in fine fragments at the rock surface during failure.
Funding: Supported by the National Natural Science Foundation of China (No. 61071173).
Abstract: In this paper, a parallel Surface Extraction from Binary Volumes with Higher-Order Smoothness (SEBVHOS) algorithm is proposed to accelerate SEBVHOS execution. The original SEBVHOS algorithm is parallelized first, and then several performance optimization techniques, namely loop optimization, cache optimization, false sharing optimization, synchronization overhead optimization, and thread affinity optimization, are used to improve the implementation's performance on multi-core systems. The performance of the parallel SEBVHOS algorithm is analyzed on a dual-core system. The experimental results show that the parallel SEBVHOS algorithm achieves an average speedup of 1.86x. More importantly, our method does not introduce additional aliasing artifacts compared with the original SEBVHOS algorithm.
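To put the reported figure in context, the standard definition of parallel efficiency (speedup divided by core count) can be applied to it. This is a small worked calculation, not part of the paper; the helper name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency = speedup / number of cores (1.0 means ideal scaling)."""
    return speedup / cores


# Values taken from the abstract: 1.86x average speedup on a dual-core system.
efficiency = parallel_efficiency(1.86, 2)
```

An average speedup of 1.86x on two cores corresponds to an efficiency of about 93%.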
Funding: National Science Foundation of China (No. 60173031)
Abstract: This paper presented an idea to replace the traditionally expensive parallel machines with a heterogeneous cluster of workstations. To emphasise the usability of the cluster of workstations platform for parallel and distributed computing, the paper also presented a status report on the effort and experience of implementing dynamic load balancing for parallel tree computation with depth-first search (DFS) in the cluster of workstations project. It compared the speedup obtained on our platform with that obtained on the traditional one. The speedup results show that a cluster of workstations can be a serious alternative to expensive parallel machines.
Abstract: Based on CORBA (Common Object Request Broker Architecture) and Java techniques, a concrete solution for creating a parallel distributed FEM computing circumstance (PDFCC) on heterogeneous networks supporting the TCP/IP protocol is proposed. In order to verify the feasibility of this solution, the basic framework of PDFCC has been implemented and tested on a LAN (Local Area Network).
Funding: Supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant No. 2009ZX01034-001-001-006, the National High Technology Research and Development 863 Program of China under Grant No. 2007AA01Z108, and the Program for Changjiang Scholars and Innovative Research Team in Universities of China under Grant No. IRT0614.
Abstract: Multi-core architectures are widely used to enhance microprocessor performance within a limited increase in time-to-market and power consumption of the chips. Toward the application of high-density data signal processing, this paper presents a novel heterogeneous multi-core architecture digital signal processor (DSP), YHFT-QDSP, with one RISC CPU core and 4 VLIW DSP cores. By three kinds of interconnection, YHFT-QDSP provides high-efficiency message communication between the inner-chip RISC core and DSP cores, and between inner-chip and inter-chip DSP cores. A parallel programming platform is specifically developed for the heterogeneous multi-core architecture of YHFT-QDSP. This parallel programming environment provides a parallel support library and a friendly interface between high-level application software and the multi-core DSP. The 130 nm CMOS custom chip design results in a high-speed and moderate-power design. The results of typical benchmarks show that the interconnection structure of YHFT-QDSP is much better than other related structures and achieves better speedup when the interconnection facilities are used in combination. YHFT-QDSP has been signed off and manufactured. Future applications of the multi-core chip could be found in 3G wireless base stations, high-performance radar, industrial applications, and so on.