This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from g...This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units,it delves into the nuanced optimization of memory access,thread management,algorithmic design,and data structures.These optimizations are critical for exploiting the parallel processing capabilities of GPUs,addressingboth the theoretical frameworks and practical implementations.By integrating advanced strategies such as memory coalescing,dynamic scheduling,and parallel algorithmic transformations,this research aims to significantly elevate computational efficiency and throughput.The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains,highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments.The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers,fostering advancements in computational sciences and technology.展开更多
Milling Process Simulation is one of the important re search areas in manufacturing science. For the purpose of improving the prec ision of simulation and extending its usability, numerical algorithm is more and more ...Milling Process Simulation is one of the important re search areas in manufacturing science. For the purpose of improving the prec ision of simulation and extending its usability, numerical algorithm is more and more used in the milling modeling areas. But simulative efficiency is decreasin g with increase of its complexity. As a result, application of the method is lim ited. Aimed at above question, high-efficient algorithm for milling process sim ulation is studied. It is important for milling process simulation’s applicatio n. Parallel computing is widely used to solve the large-scale computation question s. Its advantages include system flexibility, robust, high-efficient computing capability and high ratio of performance to price. With the development of compu ter network, utilizing the computing resource in the Internet, a virtual computi ng environment with powerful computing capability can be consisted by microc omputers, and the difficulty of building hardware environment which is used to s upport parallel computing is reduced. How to use network technology and parallel algorithm to improve simulative effic iency for milling forces simulation is investigated in the paper. In order to pr edict milling forces, a simplified local milling forces model is used in the pap er. End milling cutter is assumed to be divided by r number of differential elem ents along the axial direction of the cutter. For a given time, the total cuttin g forces can be obtained by summarizing the resultant cutting force produced by each differential cutter disc. Divide the whole simulative time into some segmen ts, send these program’s segments to microcomputers in the Internet and obtain the result of the program’s segments, all of the result of program’s segments a re composed the final result. For implementing the algorithm, a distributed Parallel computing framework is de signed in the paper. In the framework, web server plays a role of controller. Us ing Java RMI(remote method interface), the computing processes in computing serv er are called by web server. There are lots of control processes in web server a nd control the computing servers. The codes of simulative algorithm can be dynam ic sent to the computing servers, and milling forces at the different time are c omputed through utilizing the local computer’s resource. The results that are ca lculated by every computing servers are sent to the web server, and composed the final result. The framework can be used by different simulative algorithm. Comp ared with the algorithm running single machine, the efficiency of provided algor ithm is higher than that of single machine.展开更多
Parallel computing assigns the computing model to different processors on different devices and implements it simultaneously.Accordingly,it has broad applications in the numerical simulation of geotechnical engineerin...Parallel computing assigns the computing model to different processors on different devices and implements it simultaneously.Accordingly,it has broad applications in the numerical simulation of geotechnical engineering and underground engineering,of which models are always large-scale.With parallel computing,the computing time or the memory requirements will be reduced by splitting the original domain of the numerical model into many subdomains,which is thus named as the domain decomposition method.In this study,a cubic and equal volume domain decomposition strategy was utilized to realize the parallel computing on the distributed memory system of four-dimensional lattice spring model(4D-LSM)based on the message passing interface.With a more efficient communication strategy introduced,this study aimed at operating an one-billion-particle model on a supercomputer platform.The preprocessing procedure of the parallelized 4D-LSM was restructured and the particle generation strategy suitable for the supercomputer platform was employed to minimize the time consumption in preprocessing and calculation.On this basis,numerical calculations were performed on TianHe-3 prototype E class supercomputer at the National Supercomputer Center in Tianjin.Two fieldscale three-dimensional blasting wave propagation models were carried out,of which the numerical results verify the computing power and the advantage of the parallelized 4D-LSM in the simulation of large-scale three-dimension models.Subsequently,the time complexity and spatial complexity of 4D-LSM and other particle discrete element methods were analyzed.展开更多
High-performance computing(HPC)refers to the ability to process data and perform complex calculations at high speeds.It is one of the most essential tools fueling the advancement of science and technology.
Low temperature complementary metal oxide semiconductor(CMOS)or cryogenic CMOS is a promising avenue for the continuation of Moore’s law while serving the needs of high performance computing.With temperature as a con...Low temperature complementary metal oxide semiconductor(CMOS)or cryogenic CMOS is a promising avenue for the continuation of Moore’s law while serving the needs of high performance computing.With temperature as a control“knob”to steepen the subthreshold slope behavior of CMOS devices,the supply voltage of operation can be reduced with no impact on operating speed.With the optimal threshold voltage engineering,the device ON current can be further enhanced,translating to higher performance.In this article,the experimentally calibrated data was adopted to tune the threshold voltage and investigated the power performance area of cryogenic CMOS at device,circuit and system level.We also presented results from measurement and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon on insulator(FDSOI)operating at cryogenic temperature.Finally,the challenges and opportunities in the further development and deployment of such systems were discussed.展开更多
Modern computer systems are increasingly bounded by the available or permissible power at multiple layers from individual components to data centers.To cope with this reality,it is necessary to understand how power bo...Modern computer systems are increasingly bounded by the available or permissible power at multiple layers from individual components to data centers.To cope with this reality,it is necessary to understand how power bounds im-pact performance,especially for systems built from high-end nodes,each consisting of multiple power hungry components.Because placing an inappropriate power bound on a node or a component can lead to severe performance loss,coordinat-ing power allocation among nodes and components is mandatory to achieve desired performance given a total power bud-get.In this article,we describe the paradigm of power bounded high-performance computing,which considers coordinated power bound assignment to be a key factor in computer system performance analysis and optimization.We apply this paradigm to the problem of power coordination across multiple layers for both CPU and GPU computing.Using several case studies,we demonstrate how the principles of balanced power coordination can be applied and adapted to the inter-play of workloads,hardware technology,and the available total power for performance improvement.展开更多
Magnesium(Mg),being the lightest structural metal,holds immense potential for widespread applications in various fields.The development of high-performance and cost-effective Mg alloys is crucial to further advancing ...Magnesium(Mg),being the lightest structural metal,holds immense potential for widespread applications in various fields.The development of high-performance and cost-effective Mg alloys is crucial to further advancing their commercial utilization.With the rapid advancement of machine learning(ML)technology in recent years,the“data-driven''approach for alloy design has provided new perspectives and opportunities for enhancing the performance of Mg alloys.This paper introduces a novel regression-based Bayesian optimization active learning model(RBOALM)for the development of high-performance Mg-Mn-based wrought alloys.RBOALM employs active learning to automatically explore optimal alloy compositions and process parameters within predefined ranges,facilitating the discovery of superior alloy combinations.This model further integrates pre-established regression models as surrogate functions in Bayesian optimization,significantly enhancing the precision of the design process.Leveraging RBOALM,several new high-performance alloys have been successfully designed and prepared.Notably,after mechanical property testing of the designed alloys,the Mg-2.1Zn-2.0Mn-0.5Sn-0.1Ca alloy demonstrates exceptional mechanical properties,including an ultimate tensile strength of 406 MPa,a yield strength of 287 MPa,and a 23%fracture elongation.Furthermore,the Mg-2.7Mn-0.5Al-0.1Ca alloy exhibits an ultimate tensile strength of 211 MPa,coupled with a remarkable 41%fracture elongation.展开更多
By pushing computation,cache,and network control to the edge,mobile edge computing(MEC)is expected to play a leading role in fifth generation(5G)and future sixth generation(6G).Nevertheless,facing ubiquitous fast-grow...By pushing computation,cache,and network control to the edge,mobile edge computing(MEC)is expected to play a leading role in fifth generation(5G)and future sixth generation(6G).Nevertheless,facing ubiquitous fast-growing computational demands,it is impossible for a single MEC paradigm to effectively support high-quality intelligent services at end user equipments(UEs).To address this issue,we propose an air-ground collaborative MEC(AGCMEC)architecture in this article.The proposed AGCMEC integrates all potentially available MEC servers within air and ground in the envisioned 6G,by a variety of collaborative ways to provide computation services at their best for UEs.Firstly,we introduce the AGC-MEC architecture and elaborate three typical use cases.Then,we discuss four main challenges in the AGC-MEC as well as their potential solutions.Next,we conduct a case study of collaborative service placement for AGC-MEC to validate the effectiveness of the proposed collaborative service placement strategy.Finally,we highlight several potential research directions of the AGC-MEC.展开更多
The utilization of processing capabilities within the detector holds significant promise in addressing energy consumption and latency challenges. Especially in the context of dynamic motion recognition tasks, where su...The utilization of processing capabilities within the detector holds significant promise in addressing energy consumption and latency challenges. Especially in the context of dynamic motion recognition tasks, where substantial data transfers are necessitated by the generation of extensive information and the need for frame-by-frame analysis. Herein, we present a novel approach for dynamic motion recognition, leveraging a spatial-temporal in-sensor computing system rooted in multiframe integration by employing photodetector. Our approach introduced a retinomorphic MoS_(2) photodetector device for motion detection and analysis. The device enables the generation of informative final states, nonlinearly embedding both past and present frames. Subsequent multiply-accumulate (MAC) calculations are efficiently performed as the classifier. When evaluating our devices for target detection and direction classification, we achieved an impressive recognition accuracy of 93.5%. By eliminating the need for frame-by-frame analysis, our system not only achieves high precision but also facilitates energy-efficient in-sensor computing.展开更多
The conventional computing architecture faces substantial chal-lenges,including high latency and energy consumption between memory and processing units.In response,in-memory computing has emerged as a promising altern...The conventional computing architecture faces substantial chal-lenges,including high latency and energy consumption between memory and processing units.In response,in-memory computing has emerged as a promising alternative architecture,enabling computing operations within memory arrays to overcome these limitations.Memristive devices have gained significant attention as key components for in-memory computing due to their high-density arrays,rapid response times,and ability to emulate biological synapses.Among these devices,two-dimensional(2D)material-based memristor and memtransistor arrays have emerged as particularly promising candidates for next-generation in-memory computing,thanks to their exceptional performance driven by the unique properties of 2D materials,such as layered structures,mechanical flexibility,and the capability to form heterojunctions.This review delves into the state-of-the-art research on 2D material-based memristive arrays,encompassing critical aspects such as material selection,device perfor-mance metrics,array structures,and potential applications.Furthermore,it provides a comprehensive overview of the current challenges and limitations associated with these arrays,along with potential solutions.The primary objective of this review is to serve as a significant milestone in realizing next-generation in-memory computing utilizing 2D materials and bridge the gap from single-device characterization to array-level and system-level implementations of neuromorphic computing,leveraging the potential of 2D material-based memristive devices.展开更多
Cloud computing provides a diverse and adaptable resource pool over the internet,allowing users to tap into various resources as needed.It has been seen as a robust solution to relevant challenges.A significant delay ...Cloud computing provides a diverse and adaptable resource pool over the internet,allowing users to tap into various resources as needed.It has been seen as a robust solution to relevant challenges.A significant delay can hamper the performance of IoT-enabled cloud platforms.However,efficient task scheduling can lower the cloud infrastructure’s energy consumption,thus maximizing the service provider’s revenue by decreasing user job processing times.The proposed Modified Chimp-Whale Optimization Algorithm called Modified Chimp-Whale Optimization Algorithm(MCWOA),combines elements of the Chimp Optimization Algorithm(COA)and the Whale Optimization Algorithm(WOA).To enhance MCWOA’s identification precision,the Sobol sequence is used in the population initialization phase,ensuring an even distribution of the population across the solution space.Moreover,the traditional MCWOA’s local search capabilities are augmented by incorporating the whale optimization algorithm’s bubble-net hunting and random search mechanisms into MCWOA’s position-updating process.This study demonstrates the effectiveness of the proposed approach using a two-story rigid frame and a simply supported beam model.Simulated outcomes reveal that the new method outperforms the original MCWOA,especially in multi-damage detection scenarios.MCWOA excels in avoiding false positives and enhancing computational speed,making it an optimal choice for structural damage detection.The efficiency of the proposed MCWOA is assessed against metrics such as energy usage,computational expense,task duration,and delay.The simulated data indicates that the new MCWOA outpaces other methods across all metrics.The study also references the Whale Optimization Algorithm(WOA),Chimp Algorithm(CA),Ant Lion Optimizer(ALO),Genetic Algorithm(GA)and Grey Wolf Optimizer(GWO).展开更多
In the past decade,there has been tremendous progress in integrating chalcogenide phase-change materials(PCMs)on the silicon photonic platform for non-volatile memory to neuromorphic in-memory computing applications.I...In the past decade,there has been tremendous progress in integrating chalcogenide phase-change materials(PCMs)on the silicon photonic platform for non-volatile memory to neuromorphic in-memory computing applications.In particular,these non von Neumann computational elements and systems benefit from mass manufacturing of silicon photonic integrated circuits(PICs)on 8-inch wafers using a 130 nm complementary metal-oxide semiconductor line.Chip manufacturing based on deep-ultraviolet lithography and electron-beam lithography enables rapid prototyping of PICs,which can be integrated with high-quality PCMs based on the wafer-scale sputtering technique as a back-end-of-line process.In this article,we present an overview of recent advances in waveguide integrated PCM memory cells,functional devices,and neuromorphic systems,with an emphasis on fabrication and integration processes to attain state-of-the-art device performance.After a short overview of PCM based photonic devices,we discuss the materials properties of the functional layer as well as the progress on the light guiding layer,namely,the silicon and germanium waveguide platforms.Next,we discuss the cleanroom fabrication flow of waveguide devices integrated with thin films and nanowires,silicon waveguides and plasmonic microheaters for the electrothermal switching of PCMs and mixed-mode operation.Finally,the fabrication of photonic and photonic–electronic neuromorphic computing systems is reviewed.These systems consist of arrays of PCM memory elements for associative learning,matrix-vector multiplication,and pattern recognition.With large-scale integration,the neuromorphic photonic computing paradigm holds the promise to outperform digital electronic accelerators by taking the advantages of ultra-high bandwidth,high speed,and energy-efficient operation in running machine learning algorithms.展开更多
Brain science accelerates the study of intelligence and behavior,contributes fundamental insights into human cognition,and offers prospective treatments for brain disease.Faced with the challenges posed by imaging tec...Brain science accelerates the study of intelligence and behavior,contributes fundamental insights into human cognition,and offers prospective treatments for brain disease.Faced with the challenges posed by imaging technologies and deep learning computational models,big data and high-performance computing(HPC)play essential roles in studying brain function,brain diseases,and large-scale brain models or connectomes.We review the driving forces behind big data and HPC methods applied to brain science,including deep learning,powerful data analysis capabilities,and computational performance solutions,each of which can be used to improve diagnostic accuracy and research output.This work reinforces predictions that big data and HPC will continue to improve brain science by making ultrahigh-performance analysis possible,by improving data standardization and sharing,and by providing new neuromorphic insights.展开更多
Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage reso...Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging.To achieve a higher system performance,this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments.The collaborative scheduling strategy integrates lightweight solution selection,redundant data placement and task stealing mechanisms,optimizing task distribution and data placement to achieve efficient computing in wide-area environments.The experimental results indicate that compared with the state-of-the-art collaborative scheduling algorithm HPS+,the proposed scheduling strategy reduces the makespan by 23.24%,improves computing and storage resource utilization by 8.28%and 21.73%respectively,and achieves similar global data migration costs.展开更多
Security issues in cloud networks and edge computing have become very common. This research focuses on analyzing such issues and developing the best solutions. A detailed literature review has been conducted in this r...Security issues in cloud networks and edge computing have become very common. This research focuses on analyzing such issues and developing the best solutions. A detailed literature review has been conducted in this regard. The findings have shown that many challenges are linked to edge computing, such as privacy concerns, security breaches, high costs, low efficiency, etc. Therefore, there is a need to implement proper security measures to overcome these issues. Using emerging trends, like machine learning, encryption, artificial intelligence, real-time monitoring, etc., can help mitigate security issues. They can also develop a secure and safe future in cloud computing. It was concluded that the security implications of edge computing can easily be covered with the help of new technologies and techniques.展开更多
The publication of Tsinghua Science and Technology was started in 1996.Since then,it has been an international academic journal sponsored by Tsinghua University and published bimonthly.This journal aims at presenting ...The publication of Tsinghua Science and Technology was started in 1996.Since then,it has been an international academic journal sponsored by Tsinghua University and published bimonthly.This journal aims at presenting the state-of-the-art scientific achievements in computer science and other IT fields.展开更多
In this paper,we consider mobile edge computing(MEC)networks against proactive eavesdropping.To maximize the transmission rate,IRS assisted UAV communications are applied.We take the joint design of the trajectory of ...In this paper,we consider mobile edge computing(MEC)networks against proactive eavesdropping.To maximize the transmission rate,IRS assisted UAV communications are applied.We take the joint design of the trajectory of UAV,the transmitting beamforming of users,and the phase shift matrix of IRS.The original problem is strong non-convex and difficult to solve.We first propose two basic modes of the proactive eavesdropper,and obtain the closed-form solution for the boundary conditions of the two modes.Then we transform the original problem into an equivalent one and propose an alternating optimization(AO)based method to obtain a local optimal solution.The convergence of the algorithm is illustrated by numerical results.Further,we propose a zero forcing(ZF)based method as sub-optimal solution,and the simulation section shows that the proposed two schemes could obtain better performance compared with traditional schemes.展开更多
With the rapid development of cloud computing,edge computing,and smart devices,computing power resources indicate a trend of ubiquitous deployment.The traditional network architecture cannot efficiently leverage these...With the rapid development of cloud computing,edge computing,and smart devices,computing power resources indicate a trend of ubiquitous deployment.The traditional network architecture cannot efficiently leverage these distributed computing power resources due to computing power island effect.To overcome these problems and improve network efficiency,a new network computing paradigm is proposed,i.e.,Computing Power Network(CPN).Computing power network can connect ubiquitous and heterogenous computing power resources through networking to realize computing power scheduling flexibly.In this survey,we make an exhaustive review on the state-of-the-art research efforts on computing power network.We first give an overview of computing power network,including definition,architecture,and advantages.Next,a comprehensive elaboration of issues on computing power modeling,information awareness and announcement,resource allocation,network forwarding,computing power transaction platform and resource orchestration platform is presented.The computing power network testbed is built and evaluated.The applications and use cases in computing power network are discussed.Then,the key enabling technologies for computing power network are introduced.Finally,open challenges and future research directions are presented as well.展开更多
Serverless computing is a promising paradigm in cloud computing that greatly simplifies cloud programming.With serverless computing,developers only provide function code to serverless platform,and these functions are ...Serverless computing is a promising paradigm in cloud computing that greatly simplifies cloud programming.With serverless computing,developers only provide function code to serverless platform,and these functions are invoked by its driven events.Nonetheless,security threats in serverless computing such as vulnerability-based security threats have become the pain point hindering its wide adoption.The ideas in proactive defense such as redundancy,diversity and dynamic provide promising approaches to protect against cyberattacks.However,these security technologies are mostly applied to serverless platform based on“stacked”mode,as they are designed independent with serverless computing.The lack of security consideration in the initial design makes it especially challenging to achieve the all life cycle protection for serverless application with limited cost.In this paper,we present ATSSC,a proactive defense enabled attack tolerant serverless platform.ATSSC integrates the characteristic of redundancy,diversity and dynamic into serverless seamless to achieve high-level security and efficiency.Specifically,ATSSC constructs multiple diverse function replicas to process the driven events and performs cross-validation to verify the results.In order to create diverse function replicas,both software diversity and environment diversity are adopted.Furthermore,a dynamic function refresh strategy is proposed to keep the clean state of serverless functions.We implement ATSSC based on Kubernetes and Knative.Analysis and experimental results demonstrate that ATSSC can effectively protect serverless computing against cyberattacks with acceptable costs.展开更多
In a network environment composed of different types of computing centers that can be divided into different layers(clod,edge layer,and others),the interconnection between them offers the possibility of peer-to-peer t...In a network environment composed of different types of computing centers that can be divided into different layers(clod,edge layer,and others),the interconnection between them offers the possibility of peer-to-peer task offloading.For many resource-constrained devices,the computation of many types of tasks is not feasible because they cannot support such computations as they do not have enough available memory and processing capacity.In this scenario,it is worth considering transferring these tasks to resource-rich platforms,such as Edge Data Centers or remote cloud servers.For different reasons,it is more exciting and appropriate to download various tasks to specific download destinations depending on the properties and state of the environment and the nature of the functions.At the same time,establishing an optimal offloading policy,which ensures that all tasks are executed within the required latency and avoids excessive workload on specific computing centers is not easy.This study presents two alternatives to solve the offloading decision paradigm by introducing two well-known algorithms,Graph Neural Networks(GNN)and Deep Q-Network(DQN).It applies the alternatives on a well-known Edge Computing simulator called PureEdgeSimand compares them with the two defaultmethods,Trade-Off and Round Robin.Experiments showed that variants offer a slight improvement in task success rate and workload distribution.In terms of energy efficiency,they provided similar results.Finally,the success rates of different computing centers are tested,and the lack of capacity of remote cloud servers to respond to applications in real-time is demonstrated.These novel ways of finding a download strategy in a local networking environment are unique as they emulate the state and structure of the environment innovatively,considering the quality of its connections and constant updates.The download score defined in this research is a crucial feature for determining the quality of a download path in the GNN training process and has not previously been proposed.Simultaneously,the suitability of Reinforcement Learning(RL)techniques is demonstrated due to the dynamism of the network environment,considering all the key factors that affect the decision to offload a given task,including the actual state of all devices.展开更多
文摘This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units,it delves into the nuanced optimization of memory access,thread management,algorithmic design,and data structures.These optimizations are critical for exploiting the parallel processing capabilities of GPUs,addressingboth the theoretical frameworks and practical implementations.By integrating advanced strategies such as memory coalescing,dynamic scheduling,and parallel algorithmic transformations,this research aims to significantly elevate computational efficiency and throughput.The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains,highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments.The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers,fostering advancements in computational sciences and technology.
文摘Milling Process Simulation is one of the important re search areas in manufacturing science. For the purpose of improving the prec ision of simulation and extending its usability, numerical algorithm is more and more used in the milling modeling areas. But simulative efficiency is decreasin g with increase of its complexity. As a result, application of the method is lim ited. Aimed at above question, high-efficient algorithm for milling process sim ulation is studied. It is important for milling process simulation’s applicatio n. Parallel computing is widely used to solve the large-scale computation question s. Its advantages include system flexibility, robust, high-efficient computing capability and high ratio of performance to price. With the development of compu ter network, utilizing the computing resource in the Internet, a virtual computi ng environment with powerful computing capability can be consisted by microc omputers, and the difficulty of building hardware environment which is used to s upport parallel computing is reduced. How to use network technology and parallel algorithm to improve simulative effic iency for milling forces simulation is investigated in the paper. In order to pr edict milling forces, a simplified local milling forces model is used in the pap er. End milling cutter is assumed to be divided by r number of differential elem ents along the axial direction of the cutter. For a given time, the total cuttin g forces can be obtained by summarizing the resultant cutting force produced by each differential cutter disc. Divide the whole simulative time into some segmen ts, send these program’s segments to microcomputers in the Internet and obtain the result of the program’s segments, all of the result of program’s segments a re composed the final result. For implementing the algorithm, a distributed Parallel computing framework is de signed in the paper. In the framework, web server plays a role of controller. Us ing Java RMI(remote method interface), the computing processes in computing serv er are called by web server. There are lots of control processes in web server a nd control the computing servers. The codes of simulative algorithm can be dynam ic sent to the computing servers, and milling forces at the different time are c omputed through utilizing the local computer’s resource. The results that are ca lculated by every computing servers are sent to the web server, and composed the final result. The framework can be used by different simulative algorithm. Comp ared with the algorithm running single machine, the efficiency of provided algor ithm is higher than that of single machine.
基金National Natural Science Foundation of China,Grant/Award Number:51979187。
文摘Parallel computing assigns the computing model to different processors on different devices and implements it simultaneously.Accordingly,it has broad applications in the numerical simulation of geotechnical engineering and underground engineering,of which models are always large-scale.With parallel computing,the computing time or the memory requirements will be reduced by splitting the original domain of the numerical model into many subdomains,which is thus named as the domain decomposition method.In this study,a cubic and equal volume domain decomposition strategy was utilized to realize the parallel computing on the distributed memory system of four-dimensional lattice spring model(4D-LSM)based on the message passing interface.With a more efficient communication strategy introduced,this study aimed at operating an one-billion-particle model on a supercomputer platform.The preprocessing procedure of the parallelized 4D-LSM was restructured and the particle generation strategy suitable for the supercomputer platform was employed to minimize the time consumption in preprocessing and calculation.On this basis,numerical calculations were performed on TianHe-3 prototype E class supercomputer at the National Supercomputer Center in Tianjin.Two fieldscale three-dimensional blasting wave propagation models were carried out,of which the numerical results verify the computing power and the advantage of the parallelized 4D-LSM in the simulation of large-scale three-dimension models.Subsequently,the time complexity and spatial complexity of 4D-LSM and other particle discrete element methods were analyzed.
文摘High-performance computing(HPC)refers to the ability to process data and perform complex calculations at high speeds.It is one of the most essential tools fueling the advancement of science and technology.
基金funded by the Defense Advanced Research Project Agency(DARPA)Low Temperature Logic Technology(LTLT)program.
文摘Low temperature complementary metal oxide semiconductor(CMOS)or cryogenic CMOS is a promising avenue for the continuation of Moore’s law while serving the needs of high performance computing.With temperature as a control“knob”to steepen the subthreshold slope behavior of CMOS devices,the supply voltage of operation can be reduced with no impact on operating speed.With the optimal threshold voltage engineering,the device ON current can be further enhanced,translating to higher performance.In this article,the experimentally calibrated data was adopted to tune the threshold voltage and investigated the power performance area of cryogenic CMOS at device,circuit and system level.We also presented results from measurement and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon on insulator(FDSOI)operating at cryogenic temperature.Finally,the challenges and opportunities in the further development and deployment of such systems were discussed.
基金supported in part by the U.S.National Science Foundation under Grant Nos.CCF-1551511 and CNS-1551262.
文摘Modern computer systems are increasingly bounded by the available or permissible power at multiple layers from individual components to data centers.To cope with this reality,it is necessary to understand how power bounds im-pact performance,especially for systems built from high-end nodes,each consisting of multiple power hungry components.Because placing an inappropriate power bound on a node or a component can lead to severe performance loss,coordinat-ing power allocation among nodes and components is mandatory to achieve desired performance given a total power bud-get.In this article,we describe the paradigm of power bounded high-performance computing,which considers coordinated power bound assignment to be a key factor in computer system performance analysis and optimization.We apply this paradigm to the problem of power coordination across multiple layers for both CPU and GPU computing.Using several case studies,we demonstrate how the principles of balanced power coordination can be applied and adapted to the inter-play of workloads,hardware technology,and the available total power for performance improvement.
基金supported by the National Natural the Science Foundation of China(51971042,51901028)the Chongqing Academician Special Fund(cstc2020yszxjcyj X0001)+1 种基金the China Scholarship Council(CSC)Norwegian University of Science and Technology(NTNU)for their financial and technical support。
文摘Magnesium(Mg),being the lightest structural metal,holds immense potential for widespread applications in various fields.The development of high-performance and cost-effective Mg alloys is crucial to further advancing their commercial utilization.With the rapid advancement of machine learning(ML)technology in recent years,the“data-driven''approach for alloy design has provided new perspectives and opportunities for enhancing the performance of Mg alloys.This paper introduces a novel regression-based Bayesian optimization active learning model(RBOALM)for the development of high-performance Mg-Mn-based wrought alloys.RBOALM employs active learning to automatically explore optimal alloy compositions and process parameters within predefined ranges,facilitating the discovery of superior alloy combinations.This model further integrates pre-established regression models as surrogate functions in Bayesian optimization,significantly enhancing the precision of the design process.Leveraging RBOALM,several new high-performance alloys have been successfully designed and prepared.Notably,after mechanical property testing of the designed alloys,the Mg-2.1Zn-2.0Mn-0.5Sn-0.1Ca alloy demonstrates exceptional mechanical properties,including an ultimate tensile strength of 406 MPa,a yield strength of 287 MPa,and a 23%fracture elongation.Furthermore,the Mg-2.7Mn-0.5Al-0.1Ca alloy exhibits an ultimate tensile strength of 211 MPa,coupled with a remarkable 41%fracture elongation.
基金supported in part by the National Natural Science Foundation of China under Grant 62171465,62072303,62272223,U22A2031。
文摘By pushing computation,cache,and network control to the edge,mobile edge computing(MEC)is expected to play a leading role in fifth generation(5G)and future sixth generation(6G).Nevertheless,facing ubiquitous fast-growing computational demands,it is impossible for a single MEC paradigm to effectively support high-quality intelligent services at end user equipments(UEs).To address this issue,we propose an air-ground collaborative MEC(AGCMEC)architecture in this article.The proposed AGCMEC integrates all potentially available MEC servers within air and ground in the envisioned 6G,by a variety of collaborative ways to provide computation services at their best for UEs.Firstly,we introduce the AGC-MEC architecture and elaborate three typical use cases.Then,we discuss four main challenges in the AGC-MEC as well as their potential solutions.Next,we conduct a case study of collaborative service placement for AGC-MEC to validate the effectiveness of the proposed collaborative service placement strategy.Finally,we highlight several potential research directions of the AGC-MEC.
基金supported by the National Natural Science Foundation of China (52322210, 52172144, 22375069, 21825103, and U21A2069)National Key R&D Program of China (2021YFA1200501)+2 种基金Shenzhen Science and Technology Program (JCYJ20220818102215033, JCYJ20200109105422876)the Innovation Project of Optics Valley Laboratory (OVL2023PY007)Science and Technology Commission of Shanghai Municipality (21YF1454700)。
文摘The utilization of processing capabilities within the detector holds significant promise in addressing energy consumption and latency challenges. Especially in the context of dynamic motion recognition tasks, where substantial data transfers are necessitated by the generation of extensive information and the need for frame-by-frame analysis. Herein, we present a novel approach for dynamic motion recognition, leveraging a spatial-temporal in-sensor computing system rooted in multiframe integration by employing photodetector. Our approach introduced a retinomorphic MoS_(2) photodetector device for motion detection and analysis. The device enables the generation of informative final states, nonlinearly embedding both past and present frames. Subsequent multiply-accumulate (MAC) calculations are efficiently performed as the classifier. When evaluating our devices for target detection and direction classification, we achieved an impressive recognition accuracy of 93.5%. By eliminating the need for frame-by-frame analysis, our system not only achieves high precision but also facilitates energy-efficient in-sensor computing.
基金This work was supported by the National Research Foundation,Singapore under Award No.NRF-CRP24-2020-0002.
文摘The conventional computing architecture faces substantial chal-lenges,including high latency and energy consumption between memory and processing units.In response,in-memory computing has emerged as a promising alternative architecture,enabling computing operations within memory arrays to overcome these limitations.Memristive devices have gained significant attention as key components for in-memory computing due to their high-density arrays,rapid response times,and ability to emulate biological synapses.Among these devices,two-dimensional(2D)material-based memristor and memtransistor arrays have emerged as particularly promising candidates for next-generation in-memory computing,thanks to their exceptional performance driven by the unique properties of 2D materials,such as layered structures,mechanical flexibility,and the capability to form heterojunctions.This review delves into the state-of-the-art research on 2D material-based memristive arrays,encompassing critical aspects such as material selection,device perfor-mance metrics,array structures,and potential applications.Furthermore,it provides a comprehensive overview of the current challenges and limitations associated with these arrays,along with potential solutions.The primary objective of this review is to serve as a significant milestone in realizing next-generation in-memory computing utilizing 2D materials and bridge the gap from single-device characterization to array-level and system-level implementations of neuromorphic computing,leveraging the potential of 2D material-based memristive devices.
文摘Cloud computing provides a diverse and adaptable resource pool over the internet,allowing users to tap into various resources as needed.It has been seen as a robust solution to relevant challenges.A significant delay can hamper the performance of IoT-enabled cloud platforms.However,efficient task scheduling can lower the cloud infrastructure’s energy consumption,thus maximizing the service provider’s revenue by decreasing user job processing times.The proposed Modified Chimp-Whale Optimization Algorithm called Modified Chimp-Whale Optimization Algorithm(MCWOA),combines elements of the Chimp Optimization Algorithm(COA)and the Whale Optimization Algorithm(WOA).To enhance MCWOA’s identification precision,the Sobol sequence is used in the population initialization phase,ensuring an even distribution of the population across the solution space.Moreover,the traditional MCWOA’s local search capabilities are augmented by incorporating the whale optimization algorithm’s bubble-net hunting and random search mechanisms into MCWOA’s position-updating process.This study demonstrates the effectiveness of the proposed approach using a two-story rigid frame and a simply supported beam model.Simulated outcomes reveal that the new method outperforms the original MCWOA,especially in multi-damage detection scenarios.MCWOA excels in avoiding false positives and enhancing computational speed,making it an optimal choice for structural damage detection.The efficiency of the proposed MCWOA is assessed against metrics such as energy usage,computational expense,task duration,and delay.The simulated data indicates that the new MCWOA outpaces other methods across all metrics.The study also references the Whale Optimization Algorithm(WOA),Chimp Algorithm(CA),Ant Lion Optimizer(ALO),Genetic Algorithm(GA)and Grey Wolf Optimizer(GWO).
基金the support of the National Natural Science Foundation of China(Grant No.62204201)。
文摘In the past decade,there has been tremendous progress in integrating chalcogenide phase-change materials(PCMs)on the silicon photonic platform for non-volatile memory to neuromorphic in-memory computing applications.In particular,these non von Neumann computational elements and systems benefit from mass manufacturing of silicon photonic integrated circuits(PICs)on 8-inch wafers using a 130 nm complementary metal-oxide semiconductor line.Chip manufacturing based on deep-ultraviolet lithography and electron-beam lithography enables rapid prototyping of PICs,which can be integrated with high-quality PCMs based on the wafer-scale sputtering technique as a back-end-of-line process.In this article,we present an overview of recent advances in waveguide integrated PCM memory cells,functional devices,and neuromorphic systems,with an emphasis on fabrication and integration processes to attain state-of-the-art device performance.After a short overview of PCM based photonic devices,we discuss the materials properties of the functional layer as well as the progress on the light guiding layer,namely,the silicon and germanium waveguide platforms.Next,we discuss the cleanroom fabrication flow of waveguide devices integrated with thin films and nanowires,silicon waveguides and plasmonic microheaters for the electrothermal switching of PCMs and mixed-mode operation.Finally,the fabrication of photonic and photonic–electronic neuromorphic computing systems is reviewed.These systems consist of arrays of PCM memory elements for associative learning,matrix-vector multiplication,and pattern recognition.With large-scale integration,the neuromorphic photonic computing paradigm holds the promise to outperform digital electronic accelerators by taking the advantages of ultra-high bandwidth,high speed,and energy-efficient operation in running machine learning algorithms.
基金supported by the National Natural Science Foundation of China(Grant No.31771466)the National Key R&D Program of China(Grant Nos.2018YFB0203903,2016YFC0503607,and 2016YFB0200300)+3 种基金the Transformation Project in Scientific and Technological Achievements of Qinghai,China(Grant No.2016-SF-127)the Special Project of Informatization of Chinese Academy of Sciences,China(Grant No.XXH13504-08)the Strategic Pilot Science and Technology Project of Chinese Academy of Sciences,China(Grant No.XDA12010000)the 100-Talents Program of Chinese Academy of Sciences,China(awarded to BN)
文摘Brain science accelerates the study of intelligence and behavior,contributes fundamental insights into human cognition,and offers prospective treatments for brain disease.Faced with the challenges posed by imaging technologies and deep learning computational models,big data and high-performance computing(HPC)play essential roles in studying brain function,brain diseases,and large-scale brain models or connectomes.We review the driving forces behind big data and HPC methods applied to brain science,including deep learning,powerful data analysis capabilities,and computational performance solutions,each of which can be used to improve diagnostic accuracy and research output.This work reinforces predictions that big data and HPC will continue to improve brain science by making ultrahigh-performance analysis possible,by improving data standardization and sharing,and by providing new neuromorphic insights.
基金This work was supported by the National key R&D Program of China(2018YFB0203901)the National Natural Science Foundation of China under(Grant No.61772053)the fund of the State Key Laboratory of Software Development Environment(SKLSDE-2020ZX15).
文摘Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging.To achieve a higher system performance,this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments.The collaborative scheduling strategy integrates lightweight solution selection,redundant data placement and task stealing mechanisms,optimizing task distribution and data placement to achieve efficient computing in wide-area environments.The experimental results indicate that compared with the state-of-the-art collaborative scheduling algorithm HPS+,the proposed scheduling strategy reduces the makespan by 23.24%,improves computing and storage resource utilization by 8.28%and 21.73%respectively,and achieves similar global data migration costs.
文摘Security issues in cloud networks and edge computing have become very common. This research focuses on analyzing such issues and developing the best solutions. A detailed literature review has been conducted in this regard. The findings have shown that many challenges are linked to edge computing, such as privacy concerns, security breaches, high costs, low efficiency, etc. Therefore, there is a need to implement proper security measures to overcome these issues. Using emerging trends, like machine learning, encryption, artificial intelligence, real-time monitoring, etc., can help mitigate security issues. They can also develop a secure and safe future in cloud computing. It was concluded that the security implications of edge computing can easily be covered with the help of new technologies and techniques.
文摘The publication of Tsinghua Science and Technology was started in 1996.Since then,it has been an international academic journal sponsored by Tsinghua University and published bimonthly.This journal aims at presenting the state-of-the-art scientific achievements in computer science and other IT fields.
基金This work was supported by the Key Scientific and Technological Project of Henan Province(Grant Number 222102210212)Doctoral Research Start Project of Henan Institute of Technology(Grant Number KQ2005)Key Research Projects of Colleges and Universities in Henan Province(Grant Number 23B510006).
文摘In this paper,we consider mobile edge computing(MEC)networks against proactive eavesdropping.To maximize the transmission rate,IRS assisted UAV communications are applied.We take the joint design of the trajectory of UAV,the transmitting beamforming of users,and the phase shift matrix of IRS.The original problem is strong non-convex and difficult to solve.We first propose two basic modes of the proactive eavesdropper,and obtain the closed-form solution for the boundary conditions of the two modes.Then we transform the original problem into an equivalent one and propose an alternating optimization(AO)based method to obtain a local optimal solution.The convergence of the algorithm is illustrated by numerical results.Further,we propose a zero forcing(ZF)based method as sub-optimal solution,and the simulation section shows that the proposed two schemes could obtain better performance compared with traditional schemes.
基金supported by the National Science Foundation of China under Grant 62271062 and 62071063by the Zhijiang Laboratory Open Project Fund 2020LCOAB01。
文摘With the rapid development of cloud computing,edge computing,and smart devices,computing power resources indicate a trend of ubiquitous deployment.The traditional network architecture cannot efficiently leverage these distributed computing power resources due to computing power island effect.To overcome these problems and improve network efficiency,a new network computing paradigm is proposed,i.e.,Computing Power Network(CPN).Computing power network can connect ubiquitous and heterogenous computing power resources through networking to realize computing power scheduling flexibly.In this survey,we make an exhaustive review on the state-of-the-art research efforts on computing power network.We first give an overview of computing power network,including definition,architecture,and advantages.Next,a comprehensive elaboration of issues on computing power modeling,information awareness and announcement,resource allocation,network forwarding,computing power transaction platform and resource orchestration platform is presented.The computing power network testbed is built and evaluated.The applications and use cases in computing power network are discussed.Then,the key enabling technologies for computing power network are introduced.Finally,open challenges and future research directions are presented as well.
基金supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China under Grant No.61521003the National Natural Science Foundation of China under Grant No.62072467 and 62002383.
文摘Serverless computing is a promising paradigm in cloud computing that greatly simplifies cloud programming.With serverless computing,developers only provide function code to serverless platform,and these functions are invoked by its driven events.Nonetheless,security threats in serverless computing such as vulnerability-based security threats have become the pain point hindering its wide adoption.The ideas in proactive defense such as redundancy,diversity and dynamic provide promising approaches to protect against cyberattacks.However,these security technologies are mostly applied to serverless platform based on“stacked”mode,as they are designed independent with serverless computing.The lack of security consideration in the initial design makes it especially challenging to achieve the all life cycle protection for serverless application with limited cost.In this paper,we present ATSSC,a proactive defense enabled attack tolerant serverless platform.ATSSC integrates the characteristic of redundancy,diversity and dynamic into serverless seamless to achieve high-level security and efficiency.Specifically,ATSSC constructs multiple diverse function replicas to process the driven events and performs cross-validation to verify the results.In order to create diverse function replicas,both software diversity and environment diversity are adopted.Furthermore,a dynamic function refresh strategy is proposed to keep the clean state of serverless functions.We implement ATSSC based on Kubernetes and Knative.Analysis and experimental results demonstrate that ATSSC can effectively protect serverless computing against cyberattacks with acceptable costs.
基金funding from TECNALIA,Basque Research and Technology Alliance(BRTA)supported by the project aOptimization of Deep Learning algorithms for Edge IoT devices for sensorization and control in Buildings and Infrastructures(EMBED)funded by the Gipuzkoa Provincial Council and approved under the 2023 call of the Guipuzcoan Network of Science,Technology and Innovation Program with File Number 2023-CIEN-000051-01.
文摘In a network environment composed of different types of computing centers that can be divided into different layers(clod,edge layer,and others),the interconnection between them offers the possibility of peer-to-peer task offloading.For many resource-constrained devices,the computation of many types of tasks is not feasible because they cannot support such computations as they do not have enough available memory and processing capacity.In this scenario,it is worth considering transferring these tasks to resource-rich platforms,such as Edge Data Centers or remote cloud servers.For different reasons,it is more exciting and appropriate to download various tasks to specific download destinations depending on the properties and state of the environment and the nature of the functions.At the same time,establishing an optimal offloading policy,which ensures that all tasks are executed within the required latency and avoids excessive workload on specific computing centers is not easy.This study presents two alternatives to solve the offloading decision paradigm by introducing two well-known algorithms,Graph Neural Networks(GNN)and Deep Q-Network(DQN).It applies the alternatives on a well-known Edge Computing simulator called PureEdgeSimand compares them with the two defaultmethods,Trade-Off and Round Robin.Experiments showed that variants offer a slight improvement in task success rate and workload distribution.In terms of energy efficiency,they provided similar results.Finally,the success rates of different computing centers are tested,and the lack of capacity of remote cloud servers to respond to applications in real-time is demonstrated.These novel ways of finding a download strategy in a local networking environment are unique as they emulate the state and structure of the environment innovatively,considering the quality of its connections and constant updates.The download score defined in this research is a crucial feature for determining the quality of a download path in the GNN training process and has not previously been proposed.Simultaneously,the suitability of Reinforcement Learning(RL)techniques is demonstrated due to the dynamism of the network environment,considering all the key factors that affect the decision to offload a given task,including the actual state of all devices.