BACKGROUND: Abdominal wall deficiencies or weakness are a common complication of temporary ostomies, and incisional hernias frequently develop after colostomy or ileostomy takedown. The use of synthetic meshes to reinforce the abdominal wall has reduced hernia occurrence. Biologic meshes have also been used to enhance healing, particularly in contaminated conditions. Reinforced tissue matrices (RTMs), which include a biologic scaffold of native extracellular matrix and a synthetic component for added strength/durability, are designed to take advantage of aspects of both synthetic and biologic materials. To date, RTMs have not been reported to reinforce the abdominal wall following stoma reversal. METHODS: Twenty-eight patients were selected with a parastomal and/or incisional hernia who had received a temporary ileostomy or colostomy for fecal diversion after rectal cancer treatment or trauma. Following hernia repair and proximal stoma closure, an RTM (OviTex® 1S permanent or OviTex® LPR) was placed to reinforce the abdominal wall using a laparoscopic, robotic, or open surgical approach. Post-operative follow-up was performed at 1 month and 1 year. Hernia recurrence was determined by physical examination and, when necessary, via computed tomography scan. Secondary endpoints included length of hospital stay, time to return to work, and hospital readmissions. Evaluated complications of the wound/repair site included surgical site infection, seroma, hematoma, wound dehiscence, and fistula formation. RESULTS: The observational study cohort included 16 male and 12 female patients with an average age of 58.5 ± 16.3 years and an average body mass index of 26.2 ± 4.1 kg/m². Patients presented with a parastomal hernia (75.0%), incisional hernia (14.3%), or combined parastomal/incisional hernia (10.7%). Using a laparoscopic (53.6%), robotic (35.7%), or open (10.7%) technique, RTMs (OviTex® LPR: 82.1%; OviTex® 1S: 17.9%) were placed using sublay (82.1%) or intraperitoneal onlay (IPOM; 17.9%) mesh positioning. At the 1-month and 1-year follow-ups, there were no hernia recurrences (0%). The average hospital stay was 2.1 ± 1.2 days, and return to work occurred at 8.3 ± 3.0 post-operative days. Three patients (10.7%) were readmitted before the 1-month follow-up due to mesh infection and/or gastrointestinal issues. Fistula and mesh infection were each observed in two patients (7.1%), leading to partial mesh removal in one patient (3.6%). There were no complications between 1 month and 1 year (0%). CONCLUSION: RTMs were used successfully to treat parastomal and incisional hernias at stoma reversal, with no hernia recurrences and favorable outcomes at the 1-month and 1-year follow-ups.
How to find an effective trading policy is still an open question, mainly due to the nonlinear and non-stationary dynamics of financial markets. Deep reinforcement learning, which has recently been used to develop trading strategies by automatically extracting complex features from large amounts of data, struggles to deal with fast-changing markets due to sample inefficiency. This paper applies the meta-reinforcement learning method to tackle, for the first time, the trading challenges faced by conventional reinforcement learning (RL) approaches in non-stationary markets. In our work, the historical trading data is divided into multiple task datasets, within each of which the market condition is relatively stationary. Then a model-agnostic meta-learning (MAML)-based trading method involving a meta-learner and a normal learner is proposed. A trading policy is learned by the meta-learner across multiple task datasets and is then fine-tuned by the normal learner on a small amount of data from a new market task before trading in it. To improve the adaptability of the MAML-based method, an ordered multiple-step updating mechanism is also proposed to explore the changing dynamics within a task market. The simulation results demonstrate that, compared to the traditional RL approach in three stock index futures markets, the proposed MAML-based trading methods can increase the annualized return rate by approximately 180%, 200%, and 160%, increase the Sharpe ratio by 180%, 90%, and 170%, and decrease the maximum drawdown by 30%, 20%, and 40%, respectively.
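As a hedged sketch of the idea (not the authors' implementation; the quadratic "tasks", the learning rates, and the first-order gradient approximation are all illustrative assumptions), a MAML-style meta-update with several ordered inner steps can look like:

```python
import numpy as np

def maml_meta_step(theta, task_grads, inner_lr=0.01, meta_lr=0.1, inner_steps=3):
    """One first-order MAML meta-update: adapt a copy of theta on each task
    with several ordered inner steps, then average the post-adaptation
    gradients to move theta toward parameters that fine-tune easily."""
    meta_grad = np.zeros_like(theta)
    for grad in task_grads:              # each task exposes a gradient oracle
        phi = theta.copy()
        for _ in range(inner_steps):     # ordered multiple-step inner updates
            phi = phi - inner_lr * grad(phi)
        meta_grad += grad(phi)           # first-order meta-gradient approximation
    return theta - meta_lr * meta_grad / len(task_grads)

# Toy quadratic losses with different optima, standing in for market regimes.
task_grads = [lambda th, c=c: 2.0 * (th - c) for c in (1.0, -1.0, 0.5)]
theta = np.array([5.0])
for _ in range(300):
    theta = maml_meta_step(theta, task_grads)
# theta settles near the average optimum, a good initialization for fine-tuning
```

Before trading in a new market task, the normal learner would run only the inner loop (a few `inner_lr` steps on that task's data) starting from the meta-learned `theta`.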
Most reinforced concrete structures in seaside locations suffer from corrosion damage to the reinforcement, limiting their durability and necessitating costly repairs. To improve their performance and durability, we h...Most reinforced concrete structures in seaside locations suffer from corrosion damage to the reinforcement, limiting their durability and necessitating costly repairs. To improve their performance and durability, we have investigated in this paper Aloe vera extracts as a green corrosion inhibitor for reinforcing steel in NaCl environments. Using electrochemical methods (zero-intensity chronopotentiometry, Tafel lines and electrochemical impedance spectroscopy), this experimental work investigated the effect of these Aloe vera (AV) extracts on corrosion inhibition of concrete reinforcing bar (HA, diameter 12mm) immersed in a 0.5M NaCl solution. The results show that Aloe vera extracts have an average corrosion-inhibiting efficacy of around 86% at an optimum concentration of 20%.展开更多
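The quoted ~86% efficiency is conventionally computed from polarization or impedance measurements; a minimal sketch, assuming the standard Tafel and EIS formulas and using purely illustrative current/resistance values (not the paper's data):

```python
def ie_from_tafel(i_corr_blank, i_corr_inh):
    """Inhibition efficiency (%) from corrosion current densities:
    IE = (i_blank - i_inh) / i_blank * 100."""
    return 100.0 * (i_corr_blank - i_corr_inh) / i_corr_blank

def ie_from_eis(rct_blank, rct_inh):
    """Inhibition efficiency (%) from charge-transfer resistances:
    IE = (Rct_inh - Rct_blank) / Rct_inh * 100."""
    return 100.0 * (rct_inh - rct_blank) / rct_inh

# hypothetical values chosen only to land near the reported ~86%
ie = ie_from_tafel(50.0, 7.0)   # µA/cm², illustrative
```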
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles makes it challenging for vehicles to independently make avatar migration decisions depending on current and future vehicle status. To address these challenges, in this paper we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome the slow convergence resulting from the curse of dimensionality and the non-stationarity caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents.
Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and effectively reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
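The per-agent policy update in MAPPO uses PPO's clipped surrogate objective; a minimal sketch of that loss (the clipping threshold and the toy numbers are assumptions, not values from the paper):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss used per agent in (MA)PPO: the probability
    ratio is clipped to [1-eps, 1+eps] so a single update cannot move the
    policy too far, which stabilizes shared-parameter training."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.minimum(ratio * advantage, clipped * advantage).mean()
```

In the transformer-based variant, the per-agent advantages would come from a critic that attends over the sequence of agent observations rather than a shared MLP.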
Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environments attract enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly promote the ADP formulation. Finally, several typical control applications with respect to RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey on ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
While autonomous vehicles are vital components of intelligent transportation systems, ensuring the trustworthiness of decision-making remains a substantial challenge in realizing autonomous driving. Therefore, we present a novel robust reinforcement learning approach with safety guarantees to attain trustworthy decision-making for autonomous vehicles. The proposed technique ensures decision trustworthiness in terms of policy robustness and collision safety. Specifically, an adversary model is learned online to simulate the worst-case uncertainty by approximating the optimal adversarial perturbations on the observed states and environmental dynamics. In addition, an adversarial robust actor-critic algorithm is developed to enable the agent to learn robust policies against perturbations in observations and dynamics. Moreover, we devise a safety mask to guarantee the collision safety of the autonomous driving agent during both the training and testing processes, using an interpretable knowledge model known as the Responsibility-Sensitive Safety model. Finally, the proposed approach is evaluated through both simulations and experiments. The results indicate that the autonomous driving agent can make trustworthy decisions and drastically reduce the number of collisions through robust safety policies.
The stability of ancient flood control levees is mainly influenced by water level fluctuations, groundwater concentration, and rainfall. This paper takes the Lanxi ancient levee as a research object to study the evolution of its seepage, displacement, and stability before and after reinforcement with upside-down hanging wells and a grouting curtain, through numerical simulation methods combined with experiments and observations. The results indicate that the filled soil is less affected by water level fluctuations and groundwater concentration after reinforcement. A high groundwater level is detrimental to the levee's long-term stability, and drainage issues need to be fully considered. The deformation of the reinforced levee is effectively controlled, since the fill deformation is mainly borne by the upside-down hanging wells. The safety factors of the levee before reinforcement vary significantly with the water level; the minimum value is 0.886 during the water level drawdown period, indicating a very high risk of instability. After reinforcement the safety factor reaches 1.478, so the stability of the ancient levee is improved by a large margin.
To solve the problem of the low interference success rate of air defense missile radio fuzes caused by the unified interference form of traditional fuze interference systems, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to enlarge the state space. Second, a multidimensional motion space, whose search range changes with the projectile distance, is used to select parameters and minimize the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function is used to determine the reward value based on the range state, output power, and parameter quantity of the interference form. The effectiveness of the proposed method in selecting the range of motion space parameters and in designing the discrimination of the reward function has been verified through offline experiments involving full-range missile rendezvous, and the optimal interference form for each distance state has been obtained. Compared with the single-interference decision method, the proposed decision method can effectively improve the success rate of interference.
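A hedged sketch of the two pieces the method combines, a tabular Q-learning update over (distance state, interference form) and a weighted reward over success, power, and parameter count (the weights and normalization here are illustrative assumptions, not the paper's values):

```python
def weighted_reward(success, power, n_params, w=(1.0, 0.3, 0.2)):
    """Reward trades interference success against output power and the
    number of interference parameters used (all normalized to [0, 1])."""
    return w[0] * success - w[1] * power - w[2] * n_params

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])

# two distance states, two interference forms, one update step
Q = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}
r = weighted_reward(success=1.0, power=0.5, n_params=0.2)  # fuze signal disappeared
q_update(Q, s=0, a=1, r=r, s_next=1)
```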
The collapse pressure is a key parameter when RTPs (reinforced thermoplastic pipes) are applied in harsh deep-water environments. To investigate the collapse of RTPs, numerical simulations and hydrostatic pressure tests were conducted. For the numerical simulations, eigenvalue analysis and Riks analysis are combined, in which the Hashin failure criterion and a fracture-energy stiffness degradation model are used to simulate the progressive failure of the composites, and "infinite" boundary conditions are applied to eliminate boundary effects. As for the hydrostatic pressure tests, RTP specimens were placed in a hydrostatic chamber after being filled with water. It was observed that the cross-section of the middle part collapses when the maximum pressure is reached. The collapse pressure obtained from the numerical simulations agrees well with the experimental value. Meanwhile, the applicability of the NASA SP-8007 formula for collapse pressure prediction was also discussed; it shows a relatively large deviation because it ignores the progressive failure of the composites. In the parametric study, it is found that RTPs have a much higher first-ply-failure pressure when the winding angles are between 50° and 70°. Besides, the effects of debonding and initial ovality, and the contributions of the liner and coating, are also discussed.
This article investigates multi-circular path-following formation control with reinforced transient profiles for nonholonomic vehicles connected by a digraph. A multi-circular formation controller endowed with the feature of spatial-temporal decoupling is devised for a group of vehicles guided by a virtual leader evolving along an implicit path, which allows for circumnavigation on multiple circles with a desired angular spacing. In addition, noticing that time-sensitive enclosing scenarios typically impose stringent time constraints, an improved prescribed performance control (IPPC) using novel tighter behavior boundaries is presented to enhance transient capabilities with ensured appointed-time convergence free of overshoot. The significant merit is that coordinated circumnavigation along different circles can be realized by executing geometric and dynamic assignments independently with modified transient profiles. Furthermore, all variables in the entire system are shown to be convergent. Simulation and experimental results are provided to validate the utility of the suggested solution.
This paper mainly focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations; therefore, accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Different from a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to acquire a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Eventually, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
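The smoothing described above replaces the discontinuous sign(·) robust term with tanh(·); a minimal sketch of just that term (the gain and slope values are illustrative assumptions, not the paper's tuning):

```python
import numpy as np

def robust_term_sign(e, beta=2.0):
    """Classical RISE robust term: discontinuous at e = 0 (causes chattering)."""
    return beta * np.sign(e)

def robust_term_tanh(e, beta=2.0, eps=0.05):
    """Smoothed surrogate: matches beta*sign(e) for |e| >> eps but is
    continuous through e = 0, yielding a smoother control signal."""
    return beta * np.tanh(e / eps)
```

The trade-off is the boundary layer of width ~eps around e = 0, inside which the tanh term applies less authority than sign would; smaller eps recovers the sign behavior more closely.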
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. The measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network, since the detection frequency of an interceptor is usually higher than its guidance frequency. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer introduces inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
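The two structural ideas, feeding each guidance cycle's seeker measurements in as a sequence and recording the RNN hidden state for later replay, can be sketched with a toy recurrent actor (the class, dimensions, and plain-tanh RNN cell are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

class RecurrentPolicy:
    """Minimal recurrent actor: each guidance cycle consumes the sequence of
    seeker measurements gathered since the last command, and the hidden state
    is recorded so training can replay it (the 'recorded' part of RRTD3)."""
    def __init__(self, obs_dim, hid_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(0.0, 0.1, (hid_dim, obs_dim))
        self.Wh = rng.normal(0.0, 0.1, (hid_dim, hid_dim))
        self.Wa = rng.normal(0.0, 0.1, (act_dim, hid_dim))
        self.h = np.zeros(hid_dim)
        self.recorded = []                 # hidden states saved for the replay buffer

    def act(self, measurement_seq):
        for x in measurement_seq:          # detection rate exceeds guidance rate
            self.h = np.tanh(self.Wx @ x + self.Wh @ self.h)
        self.recorded.append(self.h.copy())
        return np.tanh(self.Wa @ self.h)   # bounded guidance command
```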
Multi-access Edge Cloud (MEC) networks extend cloud computing services and capabilities to the edge of the network. By bringing computation and storage capabilities closer to end-users and connected devices, MEC networks can support a wide range of applications. MEC networks can also leverage various types of resources, including computation, network, radio, and location-based resources, to provide multidimensional resources for intelligent applications in 5G/6G. However, tasks generated by users often consist of multiple subtasks that require different types of resources. Offloading multi-resource task requests to the edge cloud so as to maximize benefits is a challenging problem due to the heterogeneity of the resources provided by devices. To address this issue, we mathematically model task requests with multiple subtasks. Then, the problem of offloading multi-resource task requests is proved to be NP-hard. Furthermore, we propose a novel Dual-Agent Deep Reinforcement Learning algorithm with Node First and Link features (NF_L_DA_DRL), based on the policy network, to optimize the benefits generated by offloading multi-resource task requests in MEC networks. Finally, simulation results show that the proposed algorithm can effectively improve the benefit of task offloading with higher resource utilization compared with baseline algorithms.
This survey paper provides a review and perspective on intermediate and advanced reinforcement learning (RL) techniques in process industries. It offers a holistic approach by covering all levels of the process control hierarchy. The paper presents a comprehensive overview of RL algorithms, including fundamental concepts like Markov decision processes and different approaches to RL, such as value-based, policy-based, and actor-critic methods, while also discussing the relationship between classical control and RL. It further reviews the wide-ranging applications of RL in process industries, such as soft sensors, low-level control, high-level control, distributed process control, fault detection and fault-tolerant control, optimization, planning, scheduling, and supply chain. The paper discusses the limitations and advantages, trends and new applications, and opportunities and future prospects for RL in process industries. Moreover, it highlights the need for a holistic approach in complex systems due to the growing importance of digitalization in the process industries.
Tunnel construction is susceptible to accidents such as loosening, deformation, collapse, and water inrush, especially under complex geological conditions like dense fault areas. These accidents can cause instability and damage to the tunnel. As a result, it is essential to conduct research on tunnel construction and grouting reinforcement technology in fault fracture zones to address these issues and ensure the safety of tunnel excavation projects. This study used the Xianglushan cross-fault tunnel to conduct a comprehensive analysis of the construction, support, and reinforcement of a tunnel crossing a fault fracture zone using the three-dimensional finite element numerical method. The study yielded the following conclusions. The excavation conditions of the cross-fault tunnel array were analyzed to determine the optimal construction method for excavation while controlling deformation and stress in the surrounding rock; the middle partition method (CD method) was found to be the most suitable. Additionally, the effects of advanced reinforcement grouting on the cross-fault fracture zone tunnel were studied, and the optimal combination of grouting reinforcement range (140°) and grouting thickness (1 m) was determined. The stress and deformation data obtained from on-site monitoring of the surrounding rock were slightly lower than the numerical simulation results, but the trends of the two data sets were consistent. These research findings provide technical analysis and data support for the construction and design of cross-fault tunnels.
Quantum error correction, a technique that relies on the principle of redundancy to encode logical information into additional qubits to better protect the system from noise, is necessary to design a viable quantum computer. The XYZ^(2) code, a new topological stabilizer code defined on a cellular lattice, is implemented on a hexagonal lattice of qubits and encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However, topological stabilizer codes in cellular-lattice quantum systems suffer from the detrimental effects of noise due to interaction with the environment, and several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ^(2) codes, which enables the decoder to focus more accurately on the information related to the current decoding position. The error correction accuracy of our reinforcement learning decoder model under the optimization conditions reaches 83.27% under the depolarizing noise model, and we measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code distances of 3-7 and 7-11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning with attention mechanisms to other topological quantum error-correcting codes.
In this paper, we propose a two-way Deep Reinforcement Learning (DRL)-based resource allocation algorithm, which solves the problem of resource allocation in the cognitive downlink network based on the underlay mode. Secondary users (SUs) in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access (PD-SCMA) scheme, and the physical resources of the cognitive base station are virtualized into two types of slices: an enhanced mobile broadband (eMBB) slice and an ultra-reliable low latency communication (URLLC) slice. We design a Double Deep Q Network (DDQN) to output the optimal codebook assignment scheme and simultaneously use a Deep Deterministic Policy Gradient (DDPG) network to output the optimal power allocation scheme. The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service (QoS) of the SUs. Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and the modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction. Additionally, compared with Power Domain Non-orthogonal Multiple Access (PD-NOMA) slices and Sparse Code Multiple Access (SCMA) slices, the PD-SCMA slices can dramatically enhance spectral efficiency and increase the number of accessible users.
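The split described above, a DDQN head for the discrete codebook choice and a DDPG head for the continuous power level, can be sketched as follows (the network outputs are stubbed with given values, and the clipping range and zero-noise default are assumptions):

```python
import numpy as np

def select_joint_action(q_values, mu_power, noise_std=0.0, p_min=0.0, p_max=1.0):
    """Hybrid action: DDQN picks the codebook index (argmax over its Q-values),
    while DDPG supplies a continuous power level (deterministic policy output
    plus optional exploration noise, clipped to the feasible range)."""
    codebook = int(np.argmax(q_values))
    power = float(np.clip(mu_power + noise_std * np.random.randn(), p_min, p_max))
    return codebook, power

cb, pw = select_joint_action(np.array([0.1, 0.9, 0.3]), mu_power=0.5)
```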
Emerging mobile edge computing (MEC) is considered a feasible solution for offloading the computation-intensive request tasks generated by mobile wireless equipment (MWE) with limited computational resources and energy. Because the request tasks from one MWE are homogeneous over a long-term period, it is vital to predeploy the particular service cachings required by the request tasks at the MEC server. In this paper, we model a service-caching-assisted MEC framework that takes into account the constraint on the number of service cachings hosted by each edge server and the migration of request tasks from the current edge server to another edge server hosting the service caching required by the tasks. Furthermore, we propose a multi-agent deep reinforcement learning-based computation offloading and task migrating decision-making scheme (MBOMS) to minimize the long-term average weighted cost. The proposed MBOMS learns the near-optimal offloading and migrating decision-making policy by centralized training and decentralized execution. Systematic and comprehensive simulation results reveal that the proposed MBOMS converges well after training and outperforms five baseline algorithms.
Grouting is a widely used approach to reinforce broken surrounding rock mass during the construction of underground tunnels in fault fracture zones, and its reinforcement effectiveness is highly affected by geostress. In this study, a numerical manifold method (NMM)-based simulator has been developed to examine the impact of geostress conditions on grouting reinforcement during tunnel excavation. To develop this simulator, a detection technique for identifying slurry migration channels and an improved fluid-solid coupling framework, which considers the influence of fracture properties and geostress states, are developed and incorporated into a zero-thickness cohesive element (ZE)-based NMM (Co-NMM) for simulating tunnel excavation. Additionally, to simulate coagulation of the injected slurry, a bonding repair algorithm is further proposed based on the ZE model. To verify the accuracy of the proposed simulator, a series of simulations of slurry migration in single fractures and fracture networks are numerically reproduced, and the results align well with analytical and laboratory test results. Furthermore, these numerical results show that neglecting the influence of geostress conditions can lead to a serious over-estimation of the slurry migration range and reinforcement effectiveness. After validation, a series of simulations of tunnel grouting reinforcement and tunnel excavation in fault fracture zones with varying fracture densities under different geostress conditions are conducted. Based on these simulations, the influence of geostress conditions and the optimization of grouting schemes are discussed.
The forward design of trajectory planning strategies requires preset trajectory optimization functions, resulting in poor adaptability of the strategy and an inability to accurately generate obstacle avoidance trajectories that conform to real driver behavior habits. In addition, owing to the strong time-varying dynamic characteristics of obstacle avoidance scenarios, numerous trajectory optimization functions must be designed and their parameters adjusted. Therefore, an anthropomorphic obstacle-avoidance trajectory planning strategy for adaptive driving scenarios is proposed. First, numerous expert-demonstrated trajectories are extracted from the highD natural driving dataset. Subsequently, a trajectory expectation feature-matching algorithm is proposed that uses maximum entropy inverse reinforcement learning theory to learn from the extracted expert-demonstrated trajectories and automatically acquire the optimization function of the expert-demonstrated trajectory. Furthermore, a mapping model is constructed by combining the key driving scenario information that affects vehicle obstacle avoidance with the weights of the optimization function, yielding an anthropomorphic obstacle avoidance trajectory planning strategy for adaptive driving scenarios. Finally, the proposed strategy is verified on real driving scenarios. The results show that the strategy can adjust the weight distribution of the trajectory optimization function in real time according to the urgency of obstacle avoidance and the state of the vehicle. Moreover, the strategy can generate anthropomorphic trajectories that are similar to expert-demonstrated trajectories, effectively improving the adaptability and acceptability of trajectories in driving scenarios.
Funding: This study was reviewed and approved by the UT Health Houston Institutional Review Board (approval No. HSC-MS-23-0471).
Abstract: BACKGROUND: Abdominal wall deficiencies or weakness are a common complication of temporary ostomies, and incisional hernias frequently develop after colostomy or ileostomy takedown. The use of synthetic meshes to reinforce the abdominal wall has reduced hernia occurrence. Biologic meshes have also been used to enhance healing, particularly in contaminated conditions. Reinforced tissue matrices (RTMs), which include a biologic scaffold of native extracellular matrix and a synthetic component for added strength/durability, are designed to take advantage of aspects of both synthetic and biologic materials. To date, RTMs have not been reported to reinforce the abdominal wall following stoma reversal. METHODS: Twenty-eight patients were selected with a parastomal and/or incisional hernia who had received a temporary ileostomy or colostomy for fecal diversion after rectal cancer treatment or trauma. Following hernia repair and proximal stoma closure, an RTM (OviTex® 1S permanent or OviTex® LPR) was placed to reinforce the abdominal wall using a laparoscopic, robotic, or open surgical approach. Post-operative follow-up was performed at 1 month and 1 year. Hernia recurrence was determined by physical examination and, when necessary, via computed tomography scan. Secondary endpoints included length of hospital stay, time to return to work, and hospital readmissions. Evaluated complications of the wound/repair site included surgical site infection, seroma, hematoma, wound dehiscence, and fistula formation. RESULTS: The observational study cohort included 16 male and 12 female patients with an average age of 58.5 ± 16.3 years and an average body mass index of 26.2 ± 4.1 kg/m². Patients presented with a parastomal hernia (75.0%), incisional hernia (14.3%), or combined parastomal/incisional hernia (10.7%). Using a laparoscopic (53.6%), robotic (35.7%), or open (10.7%) technique, RTMs (OviTex® LPR: 82.1%, OviTex® 1S: 17.9%) were placed in sublay (82.1%) or intraperitoneal onlay (IPOM; 17.9%) mesh positioning. At 1-month and 1-year follow-ups, there were no hernia recurrences (0%). The average hospital stay was 2.1 ± 1.2 days, and return to work occurred at 8.3 ± 3.0 post-operative days. Three patients (10.7%) were readmitted before the 1-month follow-up due to mesh infection and/or gastrointestinal issues. Fistula and mesh infection were each observed in two patients (7.1%), leading to partial mesh removal in one patient (3.6%). There were no complications between 1 month and 1 year (0%). CONCLUSION: RTMs were used successfully to treat parastomal and incisional hernias at ileostomy reversal, with no hernia recurrences and favorable outcomes at 1-month and 1-year follow-ups.
Abstract: How to find an effective trading policy is still an open question, mainly due to the nonlinear and non-stationary dynamics of financial markets. Deep reinforcement learning, which has recently been used to develop trading strategies by automatically extracting complex features from large amounts of data, struggles with fast-changing markets due to sample inefficiency. This paper applies meta-reinforcement learning to tackle, for the first time, the trading challenges faced by conventional reinforcement learning (RL) approaches in non-stationary markets. In our work, the historical trading data is divided into multiple task datasets, within each of which the market condition is relatively stationary. Then a model-agnostic meta-learning (MAML)-based trading method involving a meta-learner and a normal learner is proposed. A trading policy is learned by the meta-learner across multiple task datasets and is then fine-tuned by the normal learner with a small amount of data from a new market task before trading in it. To improve the adaptability of the MAML-based method, an ordered multiple-step updating mechanism is also proposed to explore the changing dynamics within a task market. The simulation results demonstrate that, compared to the traditional RL approach in three stock index futures markets, the proposed MAML-based trading methods can increase the annualized return rate by approximately 180%, 200%, and 160%, increase the Sharpe ratio by 180%, 90%, and 170%, and decrease the maximum drawdown by 30%, 20%, and 40%, respectively.
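For readers unfamiliar with MAML's two-level update, the meta-learner/normal-learner split described above can be sketched in a few lines: an inner loop adapts the parameters on each task, and an outer loop updates the shared initialization with the gradient of the post-adaptation losses. The quadratic task losses and step sizes below are invented illustrations, not the paper's trading setup.

```python
import numpy as np

# Toy tasks: each "market" t has loss L_t(theta) = (theta - c_t)^2,
# so grad L_t(theta) = 2 * (theta - c_t).
def grad(theta, c):
    return 2.0 * (theta - c)

def maml_step(theta, task_centers, alpha=0.1, beta=0.05):
    """One meta-update: adapt per task (inner loop), then move theta
    along the gradient of the post-adaptation losses (outer loop)."""
    meta_grad = 0.0
    for c in task_centers:
        adapted = theta - alpha * grad(theta, c)           # inner step
        # d/dtheta L_c(adapted) via the chain rule: a (1 - 2*alpha) factor
        meta_grad += (1.0 - 2.0 * alpha) * grad(adapted, c)
    return theta - beta * meta_grad / len(task_centers)

theta = 0.0
tasks = [1.0, 2.0, 3.0]            # three stationary "regimes"
for _ in range(200):
    theta = maml_step(theta, tasks)

# By symmetry the meta-parameter settles at the task mean, an
# initialization from which one inner step adapts quickly to any task.
print(round(theta, 2))
```

The point of the sketch is that the outer loop does not fit any single task; it finds an initialization whose one-step fine-tuned losses are jointly small, which is exactly the role of the meta-learner above.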
Abstract: Most reinforced concrete structures in seaside locations suffer from corrosion damage to the reinforcement, limiting their durability and necessitating costly repairs. To improve their performance and durability, this paper investigates Aloe vera extracts as a green corrosion inhibitor for reinforcing steel in NaCl environments. Using electrochemical methods (zero-intensity chronopotentiometry, Tafel lines, and electrochemical impedance spectroscopy), this experimental work investigated the effect of Aloe vera (AV) extracts on corrosion inhibition of concrete reinforcing bar (HA, diameter 12 mm) immersed in a 0.5 M NaCl solution. The results show that Aloe vera extracts have an average corrosion-inhibiting efficacy of around 86% at an optimum concentration of 20%.
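A note on how figures like the ~86% efficacy above are conventionally obtained: inhibition efficiency is computed from the Tafel corrosion current densities of a blank cell and an inhibited cell. The formula is standard electrochemistry; the current values below are hypothetical, not the paper's measurements.

```python
def inhibition_efficiency(i_corr_blank, i_corr_inhibited):
    """Percent inhibition efficiency from Tafel corrosion current
    densities (same units for both, e.g. uA/cm^2):
    IE% = 100 * (i_blank - i_inhibited) / i_blank."""
    return 100.0 * (i_corr_blank - i_corr_inhibited) / i_corr_blank

# Hypothetical current densities for a blank cell and a 20% AV-extract cell:
print(inhibition_efficiency(50.0, 7.0))  # -> 86.0
```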
Funding: Supported in part by NSFC (62102099, U22A2054, 62101594); in part by the Pearl River Talent Recruitment Program (2021QN02S643); the Guangzhou Basic Research Program (2023A04J1699); in part by the National Research Foundation, Singapore, and the Infocomm Media Development Authority under its Future Communications Research Development Programme; DSO National Laboratories under the AI Singapore Programme (AISG Award No. AISG2-RP-2020-019); the Energy Research Test-Bed and Industry Partnership Funding Initiative, Energy Grid (EG) 2.0 programme; the DesCartes programme under the Campus for Research Excellence and Technological Enterprise (CREATE); MOE Tier 1 under Grant RG87/22; in part by the Singapore University of Technology and Design (SUTD) (SRG-ISTD-2021-165); in part by the SUTD-ZJU IDEA Grant SUTD-ZJU (VP) 202102; and in part by the Ministry of Education, Singapore, through its SUTD Kickstarter Initiative (SKI 20210204).
Abstract: Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks involve a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources, making it inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles makes it challenging for them to independently make avatar migration decisions based on current and future vehicle status. To address these challenges, in this paper we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome the slow convergence resulting from the curse of dimensionality and the non-stationarity caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach that uses sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
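MAPPO optimizes each agent's policy with PPO's clipped surrogate objective; as a point of reference, that clipping step can be sketched as follows. The log-probabilities and advantage below are illustrative values, not outputs of the avatar-migration system.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r = exp(logp_new - logp_old) is the policy probability ratio."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# A ratio of 1.5 with a positive advantage is clipped at 1.2, so the
# objective (and hence the policy update) stops growing with the ratio:
obj = ppo_clip_objective(np.log([1.5]), np.log([1.0]), np.array([2.0]))
print(obj)
```

The clipping is what keeps each agent's update close to its previous policy, which is one reason MAPPO tolerates the non-stationarity that other agents introduce.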
Funding: Supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003); the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5); and the Beijing Natural Science Foundation (JQ19013).
Abstract: Reinforcement learning (RL) has roots in dynamic programming, and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, surveying the main results for discrete-time and continuous-time systems, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, reviewing event-based design, robust stabilization, and game design. Moreover, extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
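ADP's dynamic-programming roots are easiest to see in plain value iteration over a Markov decision process, where repeated Bellman optimality backups converge to the optimal value function. The two-state MDP below is an invented toy used only to show the backup, not an example from the survey.

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. P[a][s, s'] are transition probabilities,
# R[s, a] are immediate rewards.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],     # action 0: mostly stay put
              [[0.5, 0.5], [0.5, 0.5]]])    # action 1: 50/50 jump
R = np.array([[1.0, 0.0],                   # rewards in state 0
              [0.0, 2.0]])                  # rewards in state 1
gamma = 0.9

V = np.zeros(2)
for _ in range(500):                        # Bellman optimality backups
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)                   # greedy policy w.r.t. Q
print(np.round(V, 2), policy)
```

ADP and adaptive critic methods replace the exact tables V and Q above with function approximators (the "critic") when the state space is too large to enumerate.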
Funding: Supported in part by the Start-Up Grant (Nanyang Assistant Professorship) of Nanyang Technological University; the Agency for Science, Technology and Research (A*STAR) under the Advanced Manufacturing and Engineering (AME) Young Individual Research Grant (A2084c0156); the MTC Individual Research Grant (M22K2c0079); the ANR-NRF Joint Grant (NRF2021-NRF-ANR003 HM Science); and the Ministry of Education (MOE) under the Tier 2 Grant (MOE-T2EP50222-0002).
Abstract: While autonomous vehicles are vital components of intelligent transportation systems, ensuring the trustworthiness of decision-making remains a substantial challenge in realizing autonomous driving. Therefore, we present a novel robust reinforcement learning approach with safety guarantees to attain trustworthy decision-making for autonomous vehicles. The proposed technique ensures decision trustworthiness in terms of policy robustness and collision safety. Specifically, an adversary model is learned online to simulate the worst-case uncertainty by approximating the optimal adversarial perturbations on the observed states and environmental dynamics. In addition, an adversarial robust actor-critic algorithm is developed to enable the agent to learn robust policies against perturbations in observations and dynamics. Moreover, we devise a safety mask to guarantee the collision safety of the autonomous driving agent during both the training and testing processes, using an interpretable knowledge model known as the Responsibility-Sensitive Safety model. Finally, the proposed approach is evaluated through both simulations and experiments. These results indicate that the autonomous driving agent can make trustworthy decisions and drastically reduce the number of collisions through robust safety policies.
Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China (LTGG24E090002); Zhejiang University of Water Resources and Electric Power (xky2022013); and the Major Science and Technology Plan Project of the Zhejiang Provincial Department of Water Resources (RA1904). The authors thank the water conservancy management department, Zhejiang Design Institute of Water Conservancy and Hydro Electric Power Co., Ltd., and the construction company for their support.
Abstract: The stability of ancient flood control levees is mainly influenced by water level fluctuations, groundwater concentration, and rainfall. This paper takes the Lanxi ancient levee as a research object to study the evolution of its seepage, displacement, and stability before and after reinforcement with upside-down hanging wells and a grouting curtain, through numerical simulation methods combined with experiments and observations. The results indicate that the filled soil is less affected by water level fluctuations and groundwater concentration after reinforcement. A high groundwater level is detrimental to the levee's long-term stability, and drainage issues need to be fully considered. The deformation of the reinforced levee is effectively controlled, since the fill deformation is mainly borne by the upside-down hanging wells. The safety factors of the levee before reinforcement vary significantly with the water level, with a minimum of 0.886 during the water level drawdown period, indicating a very high risk of instability. After reinforcement, the minimum safety factor reaches 1.478, so the stability of the ancient levee is improved by a large margin.
Funding: National Natural Science Foundation of China (61973037); National 173 Program Project (2019-JCJQ-ZD-324).
Abstract: To solve the problem of the low interference success rate of air defense missile radio fuzes, caused by the unified interference form of the traditional fuze interference system, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to enlarge the state space. Second, a multidimensional motion space, whose search range changes with the missile-target distance, is utilized to select parameters and minimize the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function is used to determine the reward value based on the range state, output power, and parameter-quantity information of the interference form. The effectiveness of the proposed method in selecting the range of motion-space parameters and designing the discrimination degree of the reward function has been verified through offline experiments involving full-range missile rendezvous, and the optimal interference form for each distance state has been obtained. Compared with the single-interference decision method, the proposed method can effectively improve the success rate of interference.
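The decision loop described above maps naturally onto the standard tabular Q-learning update Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)]. The miniature environment below (five distance bins, three interference forms, a made-up reward) is a stand-in for the fuze-jamming simulator, not the paper's model.

```python
import random

random.seed(0)
N_STATES, N_ACTIONS = 5, 3          # 5 distance bins, 3 interference forms
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
# Invented weighted reward: for each distance bin, one interference
# form is "correct" (+1) and the rest are penalized (-0.1).
def reward(s, a):
    return 1.0 if a == s % N_ACTIONS else -0.1

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(5000):
    s = random.randrange(N_STATES)                        # sample a range state
    if random.random() < EPS:
        a = random.randrange(N_ACTIONS)                   # explore
    else:
        a = max(range(N_ACTIONS), key=lambda x: Q[s][x])  # exploit
    s2 = max(s - 1, 0)                                    # missile closes range
    td_target = reward(s, a) + GAMMA * max(Q[s2])
    Q[s][a] += ALPHA * (td_target - Q[s][a])              # Q-learning update

best = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(best)   # learned interference form per distance state
```

After training, reading off argmax Q per state gives the per-distance interference choice, which mirrors the "optimal interference form for each distance state" reported above.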
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. 52088102 and 51879249) and the Fundamental Research Funds for the Central Universities (Grant No. 202261055).
Abstract: The collapse pressure is a key parameter when RTPs are applied in harsh deep-water environments. To investigate the collapse of RTPs, numerical simulations and hydrostatic pressure tests are conducted. For the numerical simulations, eigenvalue analysis and Riks analysis are combined, in which the Hashin failure criterion and a fracture-energy stiffness degradation model are used to simulate the progressive failure of composites, and “infinite” boundary conditions are applied to eliminate boundary effects. For the hydrostatic pressure tests, RTP specimens were filled with water and placed in a hydrostatic chamber. It was observed that the cross-section of the middle part collapses when the maximum pressure is reached. The collapse pressure obtained from the numerical simulations agrees well with the experimental value. The applicability of the NASA SP-8007 formula for collapse pressure prediction was also discussed; it shows a relatively large discrepancy because it ignores the progressive failure of composites. The parametric study shows that RTPs have a much higher first-ply-failure pressure when the winding angles are between 50° and 70°. In addition, the effects of debonding and initial ovality, and the contributions of the liner and coating, are also discussed.
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 62173312 and 61803348; the National Major Scientific Instruments Development Project under Grant No. 61927807; the Program for the Innovative Talents of Higher Education Institutions of Shanxi; the Shanxi Province Science Foundation for Excellent Youths; the Shanxi "1331 Project" Key Subjects Construction (1331KSC); and the Graduate Innovation Project of Shanxi Province under Grant No. 2021Y617.
Abstract: This article investigates multi-circular path-following formation control with reinforced transient profiles for nonholonomic vehicles connected by a digraph. A multi-circular formation controller endowed with the feature of spatial-temporal decoupling is devised for a group of vehicles guided by a virtual leader evolving along an implicit path, which allows for circumnavigation on multiple circles with a desired angular spacing. In addition, since time-sensitive enclosing scenarios typically impose stringent time constraints, an improved prescribed performance control (IPPC) scheme using novel, tighter behavior boundaries is presented to enhance transient capabilities, ensuring appointed-time convergence free from any overshoot. The significant merit is that coordinated circumnavigation along different circles can be realized by executing geometric and dynamic assignments independently with modified transient profiles. Furthermore, all variables in the entire system are shown to be convergent. Simulation and experimental results are provided to validate the utility of the suggested solution.
Funding: Supported in part by the National Key R&D Program of China under Grant 2021YFB2011300 and the National Natural Science Foundation of China under Grant 52075262.
Abstract: This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered class can depict the behavior of a large number of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations; therefore, accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Unlike a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to acquire a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and achieves asymptotic output tracking. Finally, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
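The sign-to-tanh substitution mentioned above is easy to visualize: both terms saturate at the robust gain away from zero error, but only tanh is continuous at zero, which is what removes chattering from the control signal. The gain and sharpness values below are arbitrary illustrative choices, not the paper's tuning.

```python
import numpy as np

K = 2.0          # robust gain (illustrative)
S = 10.0         # tanh sharpness: larger -> closer to sign(.)

def robust_term_sign(e):
    return K * np.sign(e)          # discontinuous at e = 0 -> chattering

def robust_term_tanh(e):
    return K * np.tanh(S * e)      # smooth approximation of K*sign(e)

e = np.linspace(-0.5, 0.5, 11)     # a sweep of tracking errors
u_sign = robust_term_sign(e)
u_tanh = robust_term_tanh(e)

# Away from zero both saturate near +/-K, but the tanh term passes
# through zero continuously instead of jumping between -K and +K.
print(u_sign[0], u_tanh[0])
print(u_sign[5], u_tanh[5])
```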
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12072090).
Abstract: This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck of traditional deep reinforcement learning methods when dealing with POMDPs. The measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network, since the detection frequency of an interceptor is usually higher than its guidance frequency. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62201105, 62331017, and 62075024; the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0404; the Chongqing Municipal Education Commission under Grant KJQN202100643; and the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515110056.
Abstract: Multi-access Edge Cloud (MEC) networks extend cloud computing services and capabilities to the edge of the network. By bringing computation and storage capabilities closer to end-users and connected devices, MEC networks can support a wide range of applications. MEC networks can also leverage various types of resources, including computation, network, radio, and location-based resources, to provide multidimensional resources for intelligent applications in 5G/6G. However, tasks generated by users often consist of multiple subtasks that require different types of resources. Offloading multi-resource task requests to the edge cloud so as to maximize benefits is challenging due to the heterogeneity of the resources provided by devices. To address this issue, we mathematically model task requests with multiple subtasks, and the problem of offloading multi-resource task requests is proved to be NP-hard. Furthermore, we propose a novel Dual-Agent Deep Reinforcement Learning algorithm with Node First and Link features (NF_L_DA_DRL), based on the policy network, to optimize the benefits generated by offloading multi-resource task requests in MEC networks. Finally, simulation results show that the proposed algorithm can effectively improve the benefit of task offloading with higher resource utilization than baseline algorithms.
Funding: Supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Abstract: This survey paper provides a review of and perspective on intermediate and advanced reinforcement learning (RL) techniques in process industries. It offers a holistic approach by covering all levels of the process control hierarchy. The paper presents a comprehensive overview of RL algorithms, including fundamental concepts like Markov decision processes and different approaches to RL, such as value-based, policy-based, and actor-critic methods, while also discussing the relationship between classical control and RL. It further reviews the wide-ranging applications of RL in process industries, such as soft sensors, low-level control, high-level control, distributed process control, fault detection and fault-tolerant control, optimization, planning, scheduling, and supply chain. The paper discusses the limitations and advantages, trends and new applications, and opportunities and future prospects of RL in process industries. Moreover, it highlights the need for a holistic approach in complex systems due to the growing importance of digitalization in the process industries.
Funding: The Postgraduate Research and Practice Innovation Program of Jiangsu Province (Grant No. KYCX22_0621); the National Natural Science Foundation of China (Grant No. 52209130); and the Jiangsu Funding Program for Excellent Postdoctoral Talent.
Abstract: Tunnel construction is susceptible to accidents such as loosening, deformation, collapse, and water inrush, especially under complex geological conditions like dense fault areas. These accidents can cause instability and damage to the tunnel. As a result, it is essential to conduct research on tunnel construction and grouting reinforcement technology in fault fracture zones to address these issues and ensure the safety of tunnel excavation projects. This study utilized the Xianglushan cross-fault tunnel to conduct a comprehensive analysis of the construction, support, and reinforcement of a tunnel crossing a fault fracture zone using the three-dimensional finite element numerical method. The study yielded the following conclusions. The excavation conditions of the cross-fault tunnel array were analyzed to determine the optimal construction method for excavation while controlling deformation and stress in the surrounding rock; the middle partition method (CD method) was found to be the most suitable. Additionally, the effects of advanced reinforcement grouting on the cross-fault fracture zone tunnel were studied, and the optimal combination of grouting reinforcement range (140°) and grouting thickness (1 m) was determined. The stress and deformation data obtained from on-site monitoring of the surrounding rock were slightly lower than the numerical simulation results, but both sets of data showed a consistent trend. These research findings provide technical analysis and data support for the construction and design of cross-fault tunnels.
Funding: Supported by the Natural Science Foundation of Shandong Province, China (Grant No. ZR2021MF049) and the Joint Fund of the Natural Science Foundation of Shandong Province (Grant Nos. ZR2022LLZ012 and ZR2021LLZ001).
Abstract: Quantum error correction, a technique that relies on the principle of redundancy to encode logical information into additional qubits to better protect the system from noise, is necessary to design a viable quantum computer. The XYZ^(2) code, a new topological stabilizer code defined on a cellular lattice, is implemented on a hexagonal lattice of qubits and encodes logical qubits with the help of stabilizer measurements of weight six and weight two. However, topological stabilizer codes on cellular-lattice quantum systems suffer from the detrimental effects of noise due to interaction with the environment, and several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ^(2) codes, which enables the decoder to focus more accurately on the information related to the current decoding position. The error correction accuracy of our reinforcement learning decoder model under optimized conditions reaches 83.27% under the depolarizing noise model, and we measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code distances of 3-7 and 7-11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning attention mechanisms to other topological quantum error-correcting codes.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61971057).
Abstract: In this paper, we propose a two-way deep reinforcement learning (DRL)-based resource allocation algorithm, which solves the problem of resource allocation in a cognitive downlink network based on the underlay mode. Secondary users (SUs) in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access (PD-SCMA) scheme, and the physical resources of the cognitive base station are virtualized into two types of slices: an enhanced mobile broadband (eMBB) slice and an ultra-reliable low latency communication (URLLC) slice. We design a Double Deep Q Network (DDQN) to output the optimal codebook assignment scheme and simultaneously use a Deep Deterministic Policy Gradient (DDPG) network to output the optimal power allocation scheme. The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service (QoS) of the SUs. Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and the modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction. Additionally, compared with Power Domain Non-orthogonal Multiple Access (PD-NOMA) slices and Sparse Code Multiple Access (SCMA) slices, the PD-SCMA slices can dramatically enhance spectral efficiency and increase the number of accessible users.
Funding: Supported by the Jilin Provincial Science and Technology Department Natural Science Foundation of China (20210101415JC) and the Jilin Provincial Science and Technology Department Free Exploration Research Project of China (YDZJ202201ZYTS642).
Abstract: Emerging mobile edge computing (MEC) is considered a feasible solution for offloading the computation-intensive request tasks generated by mobile wireless equipment (MWE) with limited computational resources and energy. Because the request tasks from one MWE are homogeneous over a long-term period, it is vital to predeploy the particular service cachings required by those tasks at the MEC server. In this paper, we model a service-caching-assisted MEC framework that takes into account the constraint on the number of service cachings hosted by each edge server and the migration of request tasks from the current edge server to another edge server that hosts the service caching required by the tasks. Furthermore, we propose a multi-agent deep reinforcement learning-based computation offloading and task migration decision-making scheme (MBOMS) to minimize the long-term average weighted cost. The proposed MBOMS learns the near-optimal offloading and migration decision-making policy by centralized training and decentralized execution. Systematic and comprehensive simulation results reveal that our proposed MBOMS converges well after training and outperforms the other five baseline algorithms.
Funding: This work was supported by the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2021A1515110304) and the National Natural Science Foundation of China (Grant Nos. 42077246 and 52278412).
Abstract: Grouting is a widely used approach to reinforce broken surrounding rock mass during the construction of underground tunnels in fault fracture zones, and its reinforcement effectiveness is highly affected by geostress. In this study, a numerical manifold method (NMM)-based simulator has been developed to examine the impact of geostress conditions on grouting reinforcement during tunnel excavation. To develop this simulator, a detection technique for identifying slurry migration channels and an improved fluid-solid coupling (FeS) framework, which considers the influence of fracture properties and geostress states, are developed and incorporated into a zero-thickness cohesive element (ZE)-based NMM (Co-NMM) for simulating tunnel excavation. Additionally, to simulate coagulation of the injected slurry, a bonding repair algorithm is further proposed based on the ZE model. To verify the accuracy of the proposed simulator, a series of simulations of slurry migration in single fractures and fracture networks are numerically reproduced, and the results align well with analytical and laboratory test results. Furthermore, these numerical results show that neglecting the influence of geostress conditions can lead to a serious overestimation of the slurry migration range and reinforcement effectiveness. After validation, a series of simulations of tunnel grouting reinforcement and tunnel excavation in fault fracture zones with varying fracture densities under different geostress conditions are conducted. Based on these simulations, the influence of geostress conditions and the optimization of grouting schemes are discussed.
Funding: Supported by the National Natural Science Foundation of China (51875302).
Abstract: The forward design of trajectory planning strategies requires preset trajectory optimization functions, resulting in poor adaptability of the strategy and an inability to accurately generate obstacle avoidance trajectories that conform to real driver behavior habits. In addition, owing to the strong time-varying dynamic characteristics of obstacle avoidance scenarios, it is necessary to design numerous trajectory optimization functions and adjust the corresponding parameters. Therefore, an anthropomorphic obstacle-avoidance trajectory planning strategy for adaptive driving scenarios is proposed. First, numerous expert-demonstrated trajectories are extracted from the HighD natural driving dataset. Subsequently, a trajectory expectation feature-matching algorithm is proposed that uses maximum entropy inverse reinforcement learning theory to learn from the extracted expert-demonstrated trajectories and automatically acquire the optimization function of the expert-demonstrated trajectory. Furthermore, a mapping model is constructed by combining the key driving-scenario information that affects vehicle obstacle avoidance with the weights of the optimization function, yielding the anthropomorphic obstacle-avoidance trajectory planning strategy for adaptive driving scenarios. Finally, the proposed strategy is verified on real driving scenarios. The results show that the strategy can adjust the weight distribution of the trajectory optimization function in real time according to the “emergency degree” of obstacle avoidance and the state of the vehicle. Moreover, this strategy can generate anthropomorphic trajectories that are similar to expert-demonstrated trajectories, effectively improving the adaptability and acceptability of trajectories in driving scenarios.
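At the core of maximum entropy inverse reinforcement learning is feature matching: reward weights are adjusted until the feature expectation of the induced trajectory distribution matches the expert's. A minimal sketch on an invented discrete trajectory set (hypothetical feature vectors, not the HighD pipeline) follows.

```python
import numpy as np

# Invented feature vectors for 4 candidate trajectories
# (e.g. [comfort, clearance] scores) and the expert's average features.
F = np.array([[0.2, 0.9],
              [0.8, 0.3],
              [0.5, 0.5],
              [0.9, 0.8]])
f_expert = F[3]                      # the expert prefers trajectory 3

w = np.zeros(2)                      # reward weights to be learned
for _ in range(2000):
    p = np.exp(F @ w)
    p /= p.sum()                     # max-entropy trajectory distribution
    grad = f_expert - p @ F          # expert features minus expected features
    w += 0.5 * grad                  # gradient ascent on the log-likelihood

p = np.exp(F @ w)
p /= p.sum()
print(int(np.argmax(p)))             # the learned weights favor trajectory 3
```

The learned weight vector w plays the role of the optimization-function weights above: once matched, planning under w reproduces trajectories with the expert's feature trade-offs.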