Many efforts have been devoted to efficient task scheduling in Multi-Unmanned Aerial Vehicle (UAV) edge computing. However, the heterogeneity of UAV computation resources and task re-allocation between UAVs have not been fully considered yet. Moreover, most existing works neglect the fact that a task can only be executed on a UAV equipped with its desired service function (SF). Against this backdrop, this paper formulates the task scheduling problem as a multi-objective task scheduling problem, which aims at maximizing the task execution success ratio while minimizing the average weighted sum of all tasks' completion time and energy consumption. Optimizing three coupled goals in a real-time manner under the dynamic arrival of tasks prevents us from adopting existing methods, such as machine learning-based solutions that require a long training time and substantial prior knowledge about the task arrival process, or heuristic-based ones that usually incur a long decision-making time. To tackle this problem in a distributed manner, we establish a matching theory framework in which the three conflicting goals are treated as the preferences of tasks, SFs and UAVs. Then, a Distributed Matching Theory-based Re-allocating (DiMaToRe) algorithm is put forward. We formally prove that a stable matching can be achieved by our proposal. Extensive simulation results show that the DiMaToRe algorithm outperforms benchmark algorithms under diverse parameter settings and has good robustness.
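For readers unfamiliar with matching-theory schedulers, the sketch below shows a generic deferred-acceptance (Gale-Shapley-style) loop between tasks and UAVs. The preference lists, capacities, and toy identifiers are placeholders for illustration only; they are not the DiMaToRe preference construction from the paper.

```python
# Minimal sketch of a deferred-acceptance (Gale-Shapley-style) matching loop.
# Preference lists and capacities below are illustrative placeholders, not the
# DiMaToRe preference design described in the abstract.

def stable_match(task_prefs, uav_prefs, uav_capacity):
    """task_prefs: {task: [uav, ...]} ordered best-first.
    uav_prefs: {uav: {task: rank}} where a lower rank means more preferred.
    uav_capacity: {uav: int} number of tasks a UAV can host."""
    assigned = {u: [] for u in uav_prefs}          # current tentative matches
    next_choice = {t: 0 for t in task_prefs}       # index of next UAV to propose to
    free = list(task_prefs)                        # unmatched tasks

    while free:
        task = free.pop()
        prefs = task_prefs[task]
        if next_choice[task] >= len(prefs):
            continue                               # task exhausted its list (stays unserved)
        uav = prefs[next_choice[task]]
        next_choice[task] += 1
        assigned[uav].append(task)
        if len(assigned[uav]) > uav_capacity[uav]:
            # UAV rejects its least-preferred tentative task
            worst = max(assigned[uav], key=lambda t: uav_prefs[uav][t])
            assigned[uav].remove(worst)
            free.append(worst)
    return assigned

# toy example with hypothetical tasks t1..t3 and UAVs u1, u2
task_prefs = {"t1": ["u1", "u2"], "t2": ["u1"], "t3": ["u2", "u1"]}
uav_prefs = {"u1": {"t1": 0, "t2": 1, "t3": 2}, "u2": {"t3": 0, "t1": 1}}
print(stable_match(task_prefs, uav_prefs, {"u1": 1, "u2": 2}))
```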
Federated learning is an emerging machine learning technique that enables clients to collaboratively train a deep learning model without uploading raw data to the aggregation server. Each client may be equipped with different computing resources for model training. A client with lower computing capability requires more time for model training, resulting in a prolonged training time in federated learning. Moreover, it may fail to train the entire model because of out-of-memory issues. This study aims to tackle these problems and proposes the federated feature concatenate (FedFC) method for federated learning with heterogeneous clients. FedFC leverages model splitting and feature concatenation to offload a portion of the training load from clients to the aggregation server. Each client in FedFC can collaboratively train a model with a different cutting layer. Therefore, the specific features learned in the deeper layers of the server-side model are more consistent for data-class classification. Accordingly, FedFC can reduce the computation load for resource-constrained clients and accelerate convergence. The performance effectiveness is verified by considering different dataset scenarios, such as data and class imbalance among the participating clients, in the experiments. The performance impact of different cutting layers is evaluated during model training. The experimental results show that the co-adapted features have a critical impact on the adequate classification of the deep learning model. Overall, FedFC not only shortens the convergence time, but also improves the best accuracy by up to 5.9% and 14.5% compared to conventional federated learning and SplitFed, respectively. In conclusion, the proposed approach is feasible and effective for heterogeneous clients in federated learning.
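To make the split-training idea concrete, here is a small NumPy sketch in the spirit of FedFC: each client runs only its shallow layers locally and uploads intermediate features, and the server resumes the forward pass from that client's cut layer. The layer sizes, the toy network, and the client names are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of split training with per-client cut layers: clients run
# layers [0, cut) locally and send the resulting features to the server, which
# runs the remaining layers. All sizes and weights here are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

# a toy 3-layer network; weights are shared between server and clients
W = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16)), rng.normal(size=(16, 4))]

def client_forward(x, cut):
    """Client computes only layers [0, cut) and uploads the features."""
    h = x
    for layer in range(cut):
        h = relu(h @ W[layer])
    return h

def server_forward(features, cut):
    """Server resumes the forward pass from the client's cut layer."""
    h = features
    for layer in range(cut, len(W)):
        h = relu(h @ W[layer])
    return h

# two heterogeneous clients: a weak one cuts after layer 1, a stronger one after layer 2
batches = {"weak_client": (rng.normal(size=(4, 8)), 1),
           "strong_client": (rng.normal(size=(4, 8)), 2)}
server_inputs = {name: (client_forward(x, cut), cut) for name, (x, cut) in batches.items()}
logits = {name: server_forward(f, cut) for name, (f, cut) in server_inputs.items()}
print({name: out.shape for name, out in logits.items()})
```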
Cloud computing has taken over the high-performance distributed computing area, and it currently provides on-demand services and resource pooling over the web. As a result of constantly changing user service demand, the task scheduling problem has emerged as a critical analytical topic in cloud computing. The primary goal of scheduling tasks is to distribute tasks to available processors to construct the shortest possible schedule without breaching precedence restrictions. Assignments and schedules of tasks substantially influence system operation in a heterogeneous multiprocessor system. The diverse processes inside a heuristic-based task scheduling method will result in varying makespans in a heterogeneous computing system. As a result, an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources necessary to lower the makespan. This research introduces a novel, efficient task scheduling method for cloud computing systems based on the cooperation search algorithm to tackle the essential task scheduling problem in heterogeneous cloud computing. The basic idea of this method is to use the advantages of meta-heuristic algorithms to obtain the optimal solution. We assess our algorithm's performance by running it through three scenarios with varying numbers of tasks. The findings demonstrate that the suggested technique beats the existing methods New Genetic Algorithm (NGA), Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), Gravitational Search Algorithm (GSA), and Hybrid Heuristic and Genetic (HHG) by 7.9%, 2.1%, 8.8%, 7.7%, and 3.4%, respectively, in terms of makespan.
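As context for how such metaheuristics are driven, the sketch below shows the fitness function they typically minimize: the makespan of a candidate task-to-VM assignment. The task lengths, VM speeds, and the random-search loop standing in for the cooperation search algorithm are illustrative assumptions only.

```python
# Sketch of the fitness evaluation a metaheuristic scheduler (cooperation
# search, GA, WOA, ...) typically minimizes: the makespan of an assignment.
# Task lengths, VM speeds and the random-search loop are illustrative only.
import random

task_length = [40, 12, 33, 25, 8, 19, 27]        # e.g. million instructions
vm_speed = [10, 6, 4]                            # e.g. MIPS per VM

def makespan(assignment):
    """assignment[i] = index of the VM executing task i."""
    finish = [0.0] * len(vm_speed)
    for task, vm in enumerate(assignment):
        finish[vm] += task_length[task] / vm_speed[vm]
    return max(finish)

# a tiny random-search stand-in for the metaheuristic's candidate generation
best = None
for _ in range(2000):
    candidate = [random.randrange(len(vm_speed)) for _ in task_length]
    if best is None or makespan(candidate) < makespan(best):
        best = candidate
print("best assignment:", best, "makespan:", round(makespan(best), 2))
```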
The Internet of Things (IoT) has characteristics such as node mobility, node heterogeneity, link heterogeneity, and topology heterogeneity. In the face of these characteristics and the explosive growth of IoT nodes, which brings about large-scale data processing requirements, edge computing has become an emerging network architecture to support IoT applications due to its ability to provide powerful computing capabilities and good service functions. However, the defense mechanisms of Edge Computing-enabled IoT Nodes (ECIoTNs) are still weak due to their limited resources, so they are susceptible to the spread of malicious software, which can compromise data confidentiality and network service availability. Facing this situation, we put forward an epidemiology-based susceptible-curb-infectious-removed-dead (SCIRD) model. Then, we analyze the dynamics of ECIoTNs with different infection levels under different initial conditions to obtain the dynamic differential equations. Additionally, we establish the presence of equilibrium states in the SCIRD model. Furthermore, we analyze the model's stability and examine the conditions under which malicious software will either spread or disappear within Edge Computing-enabled IoT (ECIoT) networks. Lastly, we validate the efficacy and superiority of the SCIRD model through MATLAB simulations. These research findings offer a theoretical foundation for suppressing the propagation of malicious software in ECIoT networks. The experimental results indicate that the theoretical SCIRD model has instructive significance, deeply revealing the principles of malicious software propagation in ECIoT networks. This study solves a challenging security problem of ECIoT networks by determining the malicious software propagation threshold, which lays the foundation for building more secure and reliable ECIoT networks.
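To illustrate how such a compartmental model is simulated numerically, the sketch below integrates a five-compartment system carrying the SCIRD state names with forward Euler. The transition rates and their placement are placeholder assumptions for illustration; the paper's actual SCIRD differential equations are not reproduced here.

```python
# Forward-Euler sketch of a five-compartment epidemic model using the SCIRD
# state names (susceptible, curb, infectious, removed, dead). The transition
# rates below are placeholders chosen for illustration only.
def simulate_scird(beta=0.4, kappa=0.2, gamma=0.1, mu=0.05, dt=0.1, steps=1000):
    S, C, I, R, D = 0.95, 0.0, 0.05, 0.0, 0.0   # fractions of ECIoT nodes
    history = []
    for _ in range(steps):
        new_inf = beta * S * I        # susceptible nodes contacted by infectious ones
        curbed  = kappa * S           # nodes moved to the "curb" (protected) state
        removed = gamma * I           # infected nodes cleaned / patched
        died    = mu * I              # infected nodes that fail permanently
        S += dt * (-new_inf - curbed)
        C += dt * curbed
        I += dt * (new_inf - removed - died)
        R += dt * removed
        D += dt * died
        history.append((S, C, I, R, D))
    return history

final = simulate_scird()[-1]
print("final state (S, C, I, R, D):", [round(x, 3) for x in final])
```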
Molecular Dynamics (MD) simulation for computing the Interatomic Potential (IAP) is a very important High-Performance Computing (HPC) application. MD simulation of particles of experimental relevance takes huge computation time, despite using an expensive high-end server. Heterogeneous computing, a combination of a Field Programmable Gate Array (FPGA) and a computer, is proposed as a solution to compute MD simulations efficiently. In such heterogeneous computation, communication between the FPGA and the computer is necessary. One such MD simulation, explained in the paper, is the Artificial Neural Network (ANN)-based IAP computation of gold (Au_(147) and Au_(309)) nanoparticles. The MD simulation calculates the forces between atoms and the total energy of the chemical system. This work proposes a novel design and implementation of an ANN IAP-based MD simulation for Au_(147) and Au_(309) using communication protocols, namely Universal Asynchronous Receiver-Transmitter (UART) and Ethernet, for communication between the FPGA and the host computer. To improve the latency of MD simulation through heterogeneous computing, the UART and Ethernet communication protocols were explored to conduct MD simulations of 50,000 cycles. In this study, computation times of 17.54 and 18.70 h were achieved with UART and Ethernet, respectively, compared to the conventional server time of 29 h for Au_(147) nanoparticles. The results pave the way for the development of a lab-on-a-chip application.
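The host-accelerator exchange at the heart of such a setup can be pictured as below: the host packs atom positions, sends them over the link, and reads back forces and energy for the next MD step. A loopback socketpair stands in for the Ethernet link, and the simple length-prefixed float framing is an assumption, not the paper's actual FPGA protocol.

```python
# Sketch of the host-side exchange in an FPGA-offloaded MD step: pack atom
# positions, send them to the accelerator, read back forces and energy.
# The framing below is a made-up illustration, not the paper's protocol.
import socket, struct, random

N_ATOMS = 8
host, fake_fpga = socket.socketpair()            # loopback stand-in for the link

# --- host side: serialize positions (x, y, z per atom) and send ---
positions = [random.uniform(0.0, 10.0) for _ in range(3 * N_ATOMS)]
payload = struct.pack(f"<I{3 * N_ATOMS}f", N_ATOMS, *positions)
host.sendall(payload)

# --- "FPGA" side: would evaluate the ANN IAP; here it just returns zeros ---
request = fake_fpga.recv(len(payload))
n = struct.unpack_from("<I", request)[0]
reply = struct.pack(f"<{3 * n}f f", *([0.0] * (3 * n)), -1.23)  # forces + energy
fake_fpga.sendall(reply)

# --- host side: unpack forces and total energy for the next MD update ---
data = host.recv(len(reply))
*forces, energy = struct.unpack(f"<{3 * N_ATOMS}f f", data)
print(f"received {len(forces)//3} force vectors, energy = {energy}")
```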
With Mobile Edge Computing (MEC), computation-intensive tasks are offloaded from mobile devices to cloud servers, and thus the energy consumption of mobile devices can be notably reduced. In this paper, we study task offloading in multi-user MEC systems with heterogeneous clouds, including edge clouds and remote clouds. Tasks are forwarded from mobile devices to edge clouds via wireless channels, and they can be further forwarded to remote clouds via the Internet. Our objective is to minimize the total energy consumption of multiple mobile devices, subject to the bounded-delay requirements of tasks. Based on dynamic programming, we propose an algorithm that minimizes the energy consumption by jointly allocating bandwidth and computational resources to mobile devices. The algorithm is of pseudo-polynomial complexity. To further reduce the complexity, we propose an approximation algorithm with energy discretization, and its total energy consumption is proved to be within a bounded gap from the optimum. Simulation results show that nearly 82.7% of the energy of mobile devices can be saved by task offloading compared with execution on the mobile device.
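The flavor of such a discretized dynamic program is sketched below: a discrete budget of bandwidth units is distributed across devices so that total energy is minimized while each device meets its delay bound. The energy/delay models, the numbers, and the state definition are simplified assumptions, not the paper's formulation.

```python
# Simplified dynamic-programming sketch: distribute a discrete budget of
# bandwidth units across devices, minimizing total energy subject to each
# device's delay bound. Models and numbers are illustrative assumptions.
import math

B_UNITS = 20                                  # total discrete bandwidth units
devices = [                                   # (task bits, delay bound in s)
    (4e6, 0.8),
    (2e6, 0.5),
    (6e6, 1.0),
]

def energy_and_delay(bits, units):
    """Toy models: rate grows with allocated units, energy with transmit time."""
    if units == 0:
        return math.inf, math.inf
    rate = 1e6 * units                        # bits per second
    delay = bits / rate
    energy = 0.5 * delay                      # joules, assuming a fixed TX power
    return energy, delay

# dp[b] = minimum energy for the devices processed so far using exactly b units
dp = [0.0] + [math.inf] * B_UNITS
for bits, bound in devices:
    new_dp = [math.inf] * (B_UNITS + 1)
    for used in range(B_UNITS + 1):
        if math.isinf(dp[used]):
            continue
        for alloc in range(1, B_UNITS - used + 1):
            e, d = energy_and_delay(bits, alloc)
            if d <= bound:                    # respect the device's delay bound
                new_dp[used + alloc] = min(new_dp[used + alloc], dp[used] + e)
    dp = new_dp

print("minimum total energy (J):", round(min(dp), 3))
```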
The problem of joint radio and cloud resource allocation is studied for heterogeneous mobile cloud computing networks. The objective of the proposed joint resource allocation schemes is to maximize the total utility of users as well as satisfy the required quality of service (QoS), such as the end-to-end response latency experienced by each user. We formulate the joint resource allocation problem as a combinatorial optimization problem. Three evolutionary approaches are considered to solve it: the genetic algorithm (GA), ant colony optimization with genetic algorithm (ACO-GA), and quantum genetic algorithm (QGA). To decrease the time complexity, we propose a mapping process between the resource allocation matrix and the chromosomes of GA, ACO-GA, and QGA, search the available radio and cloud resource pairs based on the resource availability matrices for ACO-GA, and encode the difference between the allocated resources and the minimum resource requirement for QGA. Extensive simulation results show that our proposed methods greatly outperform existing algorithms in terms of running time, the accuracy of final results, total utility, resource utilization, and end-to-end response latency guarantees.
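A minimal GA sketch of this kind of encoding is shown below: each gene assigns one user a (radio channel, cloud server) pair, and the fitness is the total utility of users whose latency requirement is met. The utilities, latencies, conflict rule, and GA parameters are illustrative assumptions, not the paper's chromosome mapping.

```python
# Minimal genetic-algorithm sketch for a joint allocation problem: each gene
# assigns one user a (radio, cloud) resource pair; fitness is the total utility
# of users whose latency constraint is met. All values are illustrative.
import random

N_USERS, N_RADIO, N_CLOUD = 5, 3, 2
random.seed(1)
utility = [[random.uniform(1, 5) for _ in range(N_RADIO * N_CLOUD)] for _ in range(N_USERS)]
latency = [[random.uniform(0.1, 1.0) for _ in range(N_RADIO * N_CLOUD)] for _ in range(N_USERS)]
DEADLINE = 0.7

def fitness(chromosome):
    # a resource pair used by two users yields no utility (simple conflict rule)
    total = 0.0
    for user, pair in enumerate(chromosome):
        if chromosome.count(pair) == 1 and latency[user][pair] <= DEADLINE:
            total += utility[user][pair]
    return total

def evolve(pop_size=30, generations=60):
    pop = [[random.randrange(N_RADIO * N_CLOUD) for _ in range(N_USERS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_USERS)
            child = a[:cut] + b[cut:]                       # one-point crossover
            if random.random() < 0.2:                       # mutation
                child[random.randrange(N_USERS)] = random.randrange(N_RADIO * N_CLOUD)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("best allocation:", best, "utility:", round(fitness(best), 2))
```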
In this paper, we introduce several ongoing research projects to support parallel and distributed computing on heterogeneous networks of workstations (NOW) in the High Performance Computing and Software Laboratory at the University of Texas at San Antonio. The projects aim at addressing three technical issues. First, the factors of heterogeneity and time-sharing effects make traditional performance models and metrics for homogeneous computing not suitable for heterogeneous computing performance measurement and evaluation. We develop practical models and metrics which quantify the heterogeneity of networks and characterize its performance effects. Second, in order to perform parallel computation effectively, special system support is necessary. We are developing system schemes for heterogeneity management, process scheduling, and efficient communications. Finally, to provide insight into system performance, we are developing two types of supporting tools: a graphical instrumentation monitor to aid users in investigating performance problems and in determining the most effective way of exploiting NOW systems, and a trace-driven simulator to test and compare different system management and scheduling schemes.
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
An improved algorithm is proposed that solves cooperative concurrent computing tasks using the idle cycles of a number of high-performance heterogeneous workstations interconnected through a high-speed network. In order to obtain better parallel computation performance, this paper gives a model and an algorithm for task scheduling among heterogeneous workstations, in which the costs of loading data, computing, communication, and collecting results are considered. Using this efficient algorithm, an optimal subset of heterogeneous workstations with the shortest parallel execution time for the tasks can be selected.
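The subset-selection idea can be sketched as follows: estimate the parallel execution time of each candidate set of workstations from loading, computing, and result-collection costs, and keep the best set. The cost model (work split proportionally to speed) and the numbers are illustrative placeholders, not the paper's model.

```python
# Sketch of the subset-selection idea: for each candidate set of workstations,
# estimate the parallel execution time as data loading + computing + result
# collection, then keep the subset with the shortest time. Values are made up.
from itertools import combinations

TOTAL_WORK = 1200.0                                   # abstract work units
workstations = {                                      # name: (speed, load_s, collect_s)
    "ws1": (50.0, 2.0, 1.0),
    "ws2": (30.0, 1.0, 0.5),
    "ws3": (80.0, 4.0, 2.0),
    "ws4": (20.0, 0.5, 0.5),
}

def parallel_time(subset):
    total_speed = sum(workstations[w][0] for w in subset)
    per_ws = []
    for w in subset:
        speed, load, collect = workstations[w]
        share = TOTAL_WORK * speed / total_speed      # work proportional to speed
        per_ws.append(load + share / speed + collect)
    return max(per_ws)                                # the slowest workstation dominates

names = list(workstations)
candidates = [set(c) for r in range(1, len(names) + 1) for c in combinations(names, r)]
best = min(candidates, key=parallel_time)
print("best subset:", sorted(best), "time:", round(parallel_time(best), 2))
```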
A heterogeneous computing (HC) environment utilizes diverse resources with different computational capabilities to solve computing-intensive applications that have diverse computational requirements and constraints. The task assignment problem in an HC environment can be formally defined as: given a set of tasks and machines, assign tasks to machines so as to achieve the minimum makespan. In this paper we propose a new task scheduling heuristic, high standard deviation first (HSTDF), which considers the standard deviation of the expected execution time of a task as the selection criterion. The standard deviation of the expected execution time of a task represents the amount of variation in its execution time across different machines. Our conclusion is that tasks having a high standard deviation must be scheduled first. A large number of experiments were carried out to check the effectiveness of the proposed heuristic in different scenarios, and the comparison with existing heuristics (Max-min, Sufferage, Segmented Min-average, Segmented Min-min, and Segmented Max-min) clearly reveals that the proposed heuristic outperforms all of them in terms of average makespan.
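The core of the heuristic as described can be sketched directly: order tasks by the standard deviation of their expected execution times across machines (highest first), then map each task to the machine that completes it earliest. The expected-time-to-compute matrix below is a made-up example.

```python
# Sketch of the HSTDF idea: schedule high-standard-deviation tasks first, then
# assign each to the machine giving the earliest completion time.
import statistics

# expected time to compute: etc[task][machine] (made-up example values)
etc = [
    [14.0, 22.0, 75.0],
    [10.0, 11.0, 12.0],
    [30.0, 90.0, 33.0],
    [21.0, 20.0, 19.0],
]

machine_ready = [0.0] * len(etc[0])

# high-standard-deviation tasks are considered first
order = sorted(range(len(etc)), key=lambda t: statistics.pstdev(etc[t]), reverse=True)

schedule = {}
for task in order:
    # pick the machine giving the earliest completion time for this task
    best_m = min(range(len(machine_ready)), key=lambda m: machine_ready[m] + etc[task][m])
    machine_ready[best_m] += etc[task][best_m]
    schedule[task] = best_m

print("assignment:", schedule, "makespan:", max(machine_ready))
```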
Offloading the computation tasks of Mobile Devices (MDs) to Edge Nodes (ENs) is a promising solution to overcome the computation and energy resource limitations of MDs. However, there exists an unreasonable profit allocation problem between MDs and ENs, caused by the excessive concern for MD profit. In this paper, we propose an auction-based computation offloading algorithm that inspires ENs to provide high-quality service by maximizing the profit of ENs. Firstly, a novel cooperation auction framework is designed to avoid damage to the overall profit of ENs, which stems from the high computation delay at overloaded ENs. Thereafter, the bidding willingness of each MD in every round of the auction is determined to ensure MD rationality. Furthermore, we put forward a payment rule for the pre-selected winner to effectively guarantee auction truthfulness. Finally, the auction-based profit maximization offloading algorithm is proposed, and an MD is allowed to occupy the computation and spectrum resources of an EN for offloading if it wins the auction. Numerical results verify the performance of the proposed algorithm. Compared with the VA algorithm, the EN profit is increased by 23.8%, and the task discard ratio is decreased by 7.5%.
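For intuition about why a payment rule can make bidding truthful, here is a generic single-item auction round where the highest bidder wins and pays the second-highest bid (a Vickrey-style payment). This is a textbook sketch, not the specific payment rule designed in the paper.

```python
# Illustrative single-item auction round: MDs bid for an EN's offloading slot,
# the highest bidder wins and pays the second-highest bid (Vickrey-style), the
# classic way to make truthful bidding a best response. Generic sketch only.
def run_auction_round(bids):
    """bids: {md_id: bid_value}. Returns (winner, payment) or (None, 0.0)."""
    if not bids:
        return None, 0.0
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, top_bid = ranked[0]
    payment = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, payment

bids = {"md1": 3.2, "md2": 4.7, "md3": 4.1}   # each MD's valuation of offloading
winner, payment = run_auction_round(bids)
print(f"{winner} wins the EN's resources and pays {payment}")
# An EN's profit over many rounds would accumulate these payments minus its
# serving cost; an overloaded EN can decline a round to avoid long queueing delay.
```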
This paper deals with modeling the phenomenon of fretting fatigue in heterogeneous materials using the multi-scale computational homogenization technique and finite element analysis (FEA). The heterogeneous material for the specimens consists of a single-hole model (25% void/cell, 16% void/cell and 10% void/cell) and a four-hole model (25% void/cell). Using a representative volume element (RVE), we attempt to produce the equivalent homogenized properties and work on a homogeneous specimen for the study of fretting fatigue. Next, the fretting fatigue contact problem is solved for three new cases of models that consist of a homogeneous part and a heterogeneous part (single-hole cell) in the contact area. The aim is to analyze the normal and shear stresses of these models and compare them with the results of the corresponding heterogeneous models based on the Direct Numerical Simulation (DNS) method. Finally, by comparing the computational time and the percentage deviations, we draw conclusions about the reliability and effectiveness of the proposed method.
The possibility of carrying out a purely heterogeneous Heck reaction in practice without Pd leaching has been considered previously by a number of research groups, but no general consensus has yet been reached. Here, the reaction was, for the first time, evaluated by a simple computational approach. Modelling experiments were performed on one of the initial catalytic steps: phenyl halide attachment on Pd (111)-to-(100) and (111)-to-(111) ridges of a Pd crystal. Three resulting surface structures were identified as possible reactive intermediates. Following potential energy minimisation calculations based on a universal force field, the relative stabilities of these surface species were determined. The results showed the most stable species to be one in which a Pd ridge atom is removed from the Pd crystal structure, suggesting that Pd leaching induced by phenyl halides is energetically favourable.
An extended multiscale finite element method (EMsFEM) is developed for solving the mechanical problems of heterogeneous materials in elasticity. The underlying idea of the method is to construct numerically the multiscale base functions that capture the small-scale features of the coarse elements in the multiscale finite element analysis. On the basis of our existing work for periodic truss materials, the construction methods of the base functions for continuum heterogeneous materials are systematically introduced. Numerical experiments show that the choice of boundary conditions for the construction of the base functions has a large influence on the accuracy of the multiscale solutions; thus, different kinds of boundary conditions are proposed. The efficiency and accuracy of the developed method are validated, and the results with different boundary conditions are verified through extensive numerical examples with both periodic and random heterogeneous micro-structures. Also, a consistency test of the method is performed numerically. The results show that the EMsFEM can effectively obtain the macro response of heterogeneous structures as well as the response at the micro-scale, especially under periodic boundary conditions.
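As background, the generic downscaling relation used in multiscale finite element methods can be written as below. This is the standard form only, stated under the assumption of numerically constructed, element-local base functions; the paper's precise boundary-value problems for building the base functions are not reproduced here.

```latex
% Generic EMsFEM-style interpolation within a coarse element E (standard form,
% assumed): fine-scale displacements are expressed through numerically
% constructed base functions N_i attached to the coarse nodes.
\begin{aligned}
  \mathbf{u}^{\varepsilon}(\mathbf{x}) &\approx \sum_{i=1}^{n_E} \mathbf{N}_i(\mathbf{x})\,\mathbf{u}_i^{c},
  && \mathbf{x}\in E,\\
  \nabla\cdot\bigl(\mathbf{C}^{\varepsilon}(\mathbf{x}):\nabla^{s}\mathbf{N}_i\bigr) &= \mathbf{0}\ \text{in } E,
  && \mathbf{N}_i\ \text{prescribed on } \partial E\ \text{(linear, periodic, or oscillatory BCs)} .
\end{aligned}
```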
Based on the energy conversion of light into sound, photoacoustic computed tomography (PACT) is an emerging biomedical imaging modality with unique applications in a range of biomedical fields. In PACT, image formation relies on a process called acoustic inversion of the received photoacoustic signals. While most PACT systems perform this inversion under the basic assumption that biological tissues are acoustically homogeneous, the community has gradually realized that the intrinsic acoustic heterogeneity of tissues can introduce distortions and artifacts into the finally formed images. This paper surveys the most recent research progress on acoustic heterogeneity correction in PACT. Four major strategies are reviewed in detail, including half-time or partial-time reconstruction, autofocus reconstruction by optimizing sound speed maps, joint reconstruction of optical absorption and sound speed maps, and ultrasound computed tomography (USCT)-enhanced reconstruction. The correction of acoustic heterogeneity helps improve the imaging performance of PACT.
The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC is constrained by the interpolation operations in the weighting process on the GPU (graphics processing unit). Aiming at this problem, a fast weighting method for PIC simulation on GPU-accelerated systems is proposed to avoid atomic memory operations during the weighting process. The method is implemented by taking advantage of the GPU's thread synchronization mechanism and dividing the problem space properly. Moreover, software-managed shared memory on the GPU is employed to buffer the intermediate data. The experimental results show that the method achieves speedups of up to 3.5 times compared to previous works, and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.
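The partitioning idea behind atomic-free weighting can be illustrated on the CPU: group particles by the grid tile they touch and let each tile accumulate charge into a private buffer before a final reduction, so no two groups write the same cell concurrently. The NumPy sketch below is a 1D, nearest-grid-point simplification for illustration, not the GPU kernel described in the paper.

```python
# CPU/NumPy illustration of tile-private charge deposition (the idea that
# removes the need for atomics when tiles map to independent thread blocks).
# 1D geometry, nearest-grid-point weighting and the tile size are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_CELLS, TILE = 16, 4
positions = rng.uniform(0, N_CELLS, size=5000)       # 1D particle positions
charge = np.ones_like(positions)

cells = positions.astype(int)
tiles = cells // TILE                                 # which tile owns each particle

# each tile deposits into its own private buffer
private = np.zeros((N_CELLS // TILE, N_CELLS))
for t in range(N_CELLS // TILE):
    mask = tiles == t
    np.add.at(private[t], cells[mask], charge[mask])  # nearest-grid-point weighting

rho = private.sum(axis=0)                             # final reduction across tiles
assert np.isclose(rho.sum(), charge.sum())            # no charge lost or duplicated
print("charge per cell:", rho)
```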
In recent years, with the development of processor architecture, heterogeneous processors including the Central Processing Unit (CPU) and Graphics Processing Unit (GPU) have become mainstream. However, due to the differences between heterogeneous cores, heterogeneous systems now face many problems that need to be solved. To address them, this paper focuses on the utilization and efficiency of heterogeneous cores and designs reasonable resource scheduling strategies. To improve system performance, this paper proposes a combination strategy for a single task and a multi-task scheduling strategy for multiple tasks. The combination strategy consists of two sub-strategies: the first improves the execution efficiency of tasks on the GPU by changing the thread organization structure; the second focuses on the working state of the efficient core and develops more reasonable workload balancing schemes to improve the resource utilization of heterogeneous systems. The multi-task scheduling strategy obtains the execution efficiency of heterogeneous cores and global task information through the processing of task samples. Based on this information, an improved ant colony algorithm is used to quickly obtain a reasonable task allocation scheme that fully utilizes the characteristics of the heterogeneous cores. The experimental results show that the combination strategy reduces task execution time by 29.13% on average. When processing multiple tasks, the multi-task scheduling strategy reduces the execution time by up to 23.38% on top of the combination strategy. Both strategies make better use of the resources of heterogeneous systems and significantly reduce the execution time of tasks on heterogeneous systems.
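A compact ant-colony sketch of the task-to-core mapping step is given below: ants build assignments guided by pheromone and a heuristic (the inverse of execution time), and pheromone is reinforced on the best mapping found so far. The timing matrix, parameters, and update rule are illustrative assumptions, not the improved variant described in the paper.

```python
# Compact ant-colony sketch for mapping tasks to heterogeneous cores. All
# numbers and the pheromone update rule below are illustrative placeholders.
import random

random.seed(3)
# exec_time[task][core]: measured/estimated execution time on each core type
exec_time = [[8.0, 3.0], [5.0, 6.0], [9.0, 4.0], [2.0, 7.0], [6.0, 5.0]]
N_TASKS, N_CORES = len(exec_time), len(exec_time[0])
pheromone = [[1.0] * N_CORES for _ in range(N_TASKS)]

def makespan(assign):
    load = [0.0] * N_CORES
    for t, c in enumerate(assign):
        load[c] += exec_time[t][c]
    return max(load)

def build_assignment(alpha=1.0, beta=2.0):
    assign = []
    for t in range(N_TASKS):
        weights = [pheromone[t][c] ** alpha * (1.0 / exec_time[t][c]) ** beta
                   for c in range(N_CORES)]
        assign.append(random.choices(range(N_CORES), weights=weights)[0])
    return assign

best, best_span = None, float("inf")
for _ in range(100):                                   # iterations
    ants = [build_assignment() for _ in range(10)]
    iteration_best = min(ants, key=makespan)
    if makespan(iteration_best) < best_span:
        best, best_span = iteration_best, makespan(iteration_best)
    for t in range(N_TASKS):                           # evaporation + reinforcement
        for c in range(N_CORES):
            pheromone[t][c] *= 0.9
        pheromone[t][best[t]] += 1.0 / best_span

print("best mapping (task -> core):", best, "makespan:", best_span)
```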
The Monte Carlo (MC) simulation is regarded as the gold standard for dose calculation in brachytherapy, but it consumes a large amount of computing resources. The development of heterogeneous computing makes it possible to substantially accelerate the calculations with hardware accelerators. Accordingly, this study develops a fast MC tool, called THUBrachy, which can be accelerated by several types of hardware accelerators. THUBrachy can simulate photons with energies below 3 MeV and considers all photon interactions in this energy range. It was benchmarked against the American Association of Physicists in Medicine Task Group No. 43 Report using a water phantom and validated against Geant4 using a clinical case. A performance test was conducted on the clinical case, showing that a multicore central processing unit, Intel Xeon Phi, and graphics processing unit (GPU) can all efficiently accelerate the simulation. The GPU-accelerated version of THUBrachy is the fastest, being 200 times faster than the serial version and approximately 500 times faster than Geant4. The proposed tool shows great potential for fast and accurate dose calculations in clinical applications.
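To show the general structure of such a dose engine, here is a deliberately simplified photon-history loop: sample a free path from an attenuation coefficient, pick an interaction, deposit energy, and tally it per voxel. The constants and the two interaction types are placeholders and do not reflect THUBrachy's physics models.

```python
# Deliberately simplified photon-history loop showing the general structure of
# an MC dose engine. The constants (mu, absorption fraction) are placeholders
# and the 1D water phantom is a toy geometry, not THUBrachy's physics.
import math, random

random.seed(7)
MU = 0.08                      # total attenuation coefficient (1/mm), placeholder
ABSORB_PROB = 0.3              # chance an interaction absorbs the photon, placeholder
VOXEL_MM, N_VOX = 5.0, 40
dose = [0.0] * N_VOX           # energy deposited per voxel along a 1D phantom

def run_history(energy_kev=400.0):
    x = 0.0
    while energy_kev > 10.0 and 0.0 <= x < N_VOX * VOXEL_MM:
        x += -math.log(random.random()) / MU          # sampled free path (mm)
        if x >= N_VOX * VOXEL_MM:
            return                                    # photon escapes the phantom
        voxel = int(x / VOXEL_MM)
        if random.random() < ABSORB_PROB:             # "photoelectric-like" event
            dose[voxel] += energy_kev
            return
        deposited = energy_kev * random.uniform(0.1, 0.5)   # "Compton-like" event
        dose[voxel] += deposited
        energy_kev -= deposited

for _ in range(20000):
    run_history()
print("peak-voxel dose (keV):", round(max(dose), 1), "at voxel", dose.index(max(dose)))
```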