In this paper, platoons of autonomous vehicles operating in urban road networks are considered. From a methodological point of view, the problem of interest consists of formally characterizing vehicle state trajectory...In this paper, platoons of autonomous vehicles operating in urban road networks are considered. From a methodological point of view, the problem of interest consists of formally characterizing vehicle state trajectory tubes by means of routing decisions complying with traffic congestion criteria. To this end, a novel distributed control architecture is conceived by taking advantage of two methodologies: deep reinforcement learning and model predictive control. On one hand, the routing decisions are obtained by using a distributed reinforcement learning algorithm that exploits available traffic data at each road junction. On the other hand, a bank of model predictive controllers is in charge of computing the more adequate control action for each involved vehicle. Such tasks are here combined into a single framework:the deep reinforcement learning output(action) is translated into a set-point to be tracked by the model predictive controller;conversely, the current vehicle position, resulting from the application of the control move, is exploited by the deep reinforcement learning unit for improving its reliability. The main novelty of the proposed solution lies in its hybrid nature: on one hand it fully exploits deep reinforcement learning capabilities for decisionmaking purposes;on the other hand, time-varying hard constraints are always satisfied during the dynamical platoon evolution imposed by the computed routing decisions. To efficiently evaluate the performance of the proposed control architecture, a co-design procedure, involving the SUMO and MATLAB platforms, is implemented so that complex operating environments can be used, and the information coming from road maps(links,junctions, obstacles, semaphores, etc.) and vehicle state trajectories can be shared and exchanged. Finally by considering as operating scenario a real entire city block and a platoon of eleven vehicles described by double-integrator models, several simulations have been performed with the aim to put in light the main f eatures of the proposed approach. Moreover, it is important to underline that in different operating scenarios the proposed reinforcement learning scheme is capable of significantly reducing traffic congestion phenomena when compared with well-reputed competitors.展开更多
Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning(MARL). It is significantly more difficult for those tasks with latent variables that ...Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning(MARL). It is significantly more difficult for those tasks with latent variables that agents cannot directly observe. However, most of the existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable(MASAC-LV) algorithm, which uses variational inference theory to infer the compact latent variables representation space from a large amount of offline experience.Besides, we derive the counterfactual policy whose input has no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is considered an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.展开更多
Model predictive control is widely used in the design of autonomous driving algorithms.However,its parameters are sensitive to dynamically varying driving conditions,making it difficult to be implemented into practice...Model predictive control is widely used in the design of autonomous driving algorithms.However,its parameters are sensitive to dynamically varying driving conditions,making it difficult to be implemented into practice.As a result,this study presents a self-learning algorithm based on reinforcement learning to tune a model predictive controller.Specifically,the proposed algorithm is used to extract features of dynamic traffic scenes and adjust the weight coefficients of the model predictive controller.In this method,a risk threshold model is proposed to classify the risk level of the scenes based on the scene features,and aid in the design of the reinforcement learning reward function and ultimately improve the adaptability of the model predictive controller to real-world scenarios.The proposed algorithm is compared to a pure model predictive controller in car-following case.According to the results,the proposed method enables autonomous vehicles to adjust the priority of performance indices reasonably in different scenarios according to risk variations,showing a good scenario adaptability with safety guaranteed.展开更多
The quantization algorithm compresses the original network by reducing the numerical bit width of the model,which improves the computation speed. Because different layers have different redundancy and sensitivity to d...The quantization algorithm compresses the original network by reducing the numerical bit width of the model,which improves the computation speed. Because different layers have different redundancy and sensitivity to databit width. Reducing the data bit width will result in a loss of accuracy. Therefore, it is difficult to determinethe optimal bit width for different parts of the network with guaranteed accuracy. Mixed precision quantizationcan effectively reduce the amount of computation while keeping the model accuracy basically unchanged. In thispaper, a hardware-aware mixed precision quantization strategy optimal assignment algorithm adapted to low bitwidth is proposed, and reinforcement learning is used to automatically predict the mixed precision that meets theconstraints of hardware resources. In the state-space design, the standard deviation of weights is used to measurethe distribution difference of data, the execution speed feedback of simulated neural network accelerator inferenceis used as the environment to limit the action space of the agent, and the accuracy of the quantization model afterretraining is used as the reward function to guide the agent to carry out deep reinforcement learning training. Theexperimental results show that the proposed method obtains a suitable model layer-by-layer quantization strategyunder the condition that the computational resources are satisfied, and themodel accuracy is effectively improved.The proposed method has strong intelligence and certain universality and has strong application potential in thefield of mixed precision quantization and embedded neural network model deployment.展开更多
This work aims to study the modeling and sizing of a floor reinforced by ballasted columns. We are studying the system of reinforcement by ballasted columns because this technique is able to replace deep foundations t...This work aims to study the modeling and sizing of a floor reinforced by ballasted columns. We are studying the system of reinforcement by ballasted columns because this technique is able to replace deep foundations that are technically difficult to realize and their cost is higher. The modelling and dimensioning of foundations on a ballasted column will be an important contribution to the state of the art of this method because it will highlight the mode of transfer of loads, and will expose the induced deformations by also allowing to verification criteria of bearing capacity and allowable settlement according to geometric information of the model. The columns on a substrate located at 9 m have a length of 9 m and a diameter of 40 cm and were obtained by incorporating ballast of granular class 0/31.5 of internal friction angle of 38˚ and a density weight of 21 kN/m3. The choice of this method is based on the geotechnical characteristics of the initial soil. Thus, identification and characterization tests were carried out to estimate the bearing capacity and the settlement giving respectively 125 kPa and 57 cm. These results show the ground does not have sufficient mechanical properties to withstand the loads transmitted by the tank. By adopting the reinforcement of the soil with ballasted columns, numerical calculations show that after applying a load equal to 265.1 KPa, 20 cm vertical settlement and 17 cm horizontal displacement were obtained. This is in the tolerable deformation range for our tank, namely, less than 20 cm. Analytically, in addition to reducing settlement, ballasted columns, Due to their high stiffness, they have effectively contributed to the increase of the permissible soil stress up to 257 kPa.展开更多
The aim of this study is to characterize soil/reinforcement interaction in reinforced earth structures. The study showed that the internal behavior of this type of structure depends on a number of factors, including t...The aim of this study is to characterize soil/reinforcement interaction in reinforced earth structures. The study showed that the internal behavior of this type of structure depends on a number of factors, including the engineering backfill, the reinforcement and the soil/reinforcement interaction. The study also showed that the soil-reinforcement interaction phenomenon is a fairly complex mechanism that depends on the applied load, the geometry of the structure, the characteristics of the soil and a set of parameters characterizing the nailing: density, number and length of reinforcements, inclination of the reinforcements in relation to the sliding surface, mechanical characteristics of the reinforcements and, in particular, the relative stiffness of the reinforcements and the soil. The results showed that the tensile forces developed in the reinforcement are not entirely reversible, and that the soil at the interface undergoes permanent deformation, leading to the appearance of irreversible tensile forces in the reinforcement.展开更多
Grain size effect on rock strength is a topic of great interest in geotechnical engineering.A consensus obtained from earlier laboratory tests is that rock strength generally decreases with the increase of grain size ...Grain size effect on rock strength is a topic of great interest in geotechnical engineering.A consensus obtained from earlier laboratory tests is that rock strength generally decreases with the increase of grain size for both silicate and carbonate rocks;however,some recent numerical results conflict with such laboratory test results.To address this intriguing issue,the effect of grain size on strength of polymineralic crystalline rock with low porosity is investigated numerically using the grain-based modeling(GBM)approach in discrete element method(DEM)by interpreting micro-cracking process in response to loading.In agreement with some previous DEM simulation results,the simulated rock strength is found to increase with increasing grain size for both homogeneous and heterogeneous models,even when the number of assembled disks in one mineral grain changes.The mechanism of strength increase with increasing grain size is mainly associated with the number of assembled smooth-joint contacts along grain interfaces and the generation of grain boundary cracks in response to loading.The grain interfaces significantly weaken the integrity of the rock model,which is similar to effects of inherent defects in real rock.As the grain size increases,fewer grain interfaces are built in the model and the rock strength becomes much higher.Hence,by solely changing the mineral grain size in a model,the mechanism of grain size effect as observed in laboratory tests cannot be replicated.To address this issue,a method of degradation of grain boundary strength parameters is used to mimic the possible mechanism of grain size effect.The simulated strength using the method becomes comparable with those obtained from laboratory tests when the heterogeneity in the rock is considered.Degradation of grain boundary parameters with increasing grain size provides a plausible explanation for the grain size effect on rock strength.展开更多
A grain-based distinct element model featuring three-dimensional (3D) Voronoi tessellations (randompoly-crystals) is proposed for simulation of crack damage development in brittle rocks. The grainboundaries in pol...A grain-based distinct element model featuring three-dimensional (3D) Voronoi tessellations (randompoly-crystals) is proposed for simulation of crack damage development in brittle rocks. The grainboundaries in poly-crystal structure produced by Voronoi tessellations can represent flaws in intact rockand allow for numerical replication of crack damage progression through initiation and propagation ofmicro-fractures along grain boundaries. The Voronoi modelling scheme has been used widely in the pastfor brittle fracture simulation of rock materials. However the difficulty of generating 3D Voronoi modelshas limited its application to two-dimensional (2D) codes. The proposed approach is implemented inNeper, an open-source engine for generation of 3D Voronoi grains, to generate block geometry files thatcan be read directly into 3DEC. A series of Unconfined Compressive Strength (UCS) tests are simulated in3DEC to verify the proposed methodology for 3D simulation of brittle fractures and to investigate therelationship between each micro-parameter and the model's macro-response. The possibility of numericalreplication of the classical U-shape strength curve for anisotropic rocks is also investigated innumerical UCS tests by using complex-shaped (elongated) grains that are cemented to one another alongtheir adjoining sides. A micro-parameter calibration procedure is established for 3D Voronoi models foraccurate replication of the mechanical behaviour of isotropic and anisotropic (containing a fabric) rocks. 2014 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting byElsevier B.V. All rights reserved.展开更多
Non aqueous reactive polymer materials produced by the reaction of isocyanate and polyol have been widely used in infrastructure construction,which may be subjected to explosion loads during complex service conditions...Non aqueous reactive polymer materials produced by the reaction of isocyanate and polyol have been widely used in infrastructure construction,which may be subjected to explosion loads during complex service conditions.The blast response of composite materials is a crucial aspect for applications in engineering structures potentially subjected to extreme loadings.In this work,damage caused to rebar reinforced polymer slabs by surface explosive charges was studied experimentally and numerically.A total of 6 field tests were carried out to investigate the performances of the failure modes of rebar reinforced polymer slabs under contact and near-field explosions.The influence of explosive quantity(10-40 g)and stand-off distances(0-20 cm)at the damage modes were studied.The results show that the failure modes of rebar reinforced polymer slabs under near-field explosion mainly were bending and surface spalling,while under the impact of contact explosion,the failure modes were craters of the top surface,spalling of the bottom surface,and middle perforation.Furthermore,a detailed fully coupled model was developed and validated with the test data.The influences of explosive quantity and slab thickness on rebar reinforced polymer slabs under contact explosion were studied.Based on this,the calculation formula between breach diameter,explosive quantity,and slab thickness is fitted.展开更多
The objective of this paper is to develop a methodology for calibration of a discrete element grain-based model(GBM)to replicate the hydro-mechanical properties of a brittle rock measured in the laboratory,and to appl...The objective of this paper is to develop a methodology for calibration of a discrete element grain-based model(GBM)to replicate the hydro-mechanical properties of a brittle rock measured in the laboratory,and to apply the calibrated model to simulating the formation of excavation damage zone(EDZ)around underground excavations.Firstly,a new cohesive crack model is implemented into the universal distinct element code(UDEC)to control the fracturing behaviour of materials under various loading modes.Next,a methodology for calibration of the components of the UDEC-Voronoi model is discussed.The role of connectivity of induced microcracks on increasing the permeability of laboratory-scale samples is investigated.The calibrated samples are used to investigate the influence of pore fluid pressure on weakening the drained strength of the laboratory-scale rock.The validity of the Terzaghi’s effective stress law for the drained peak strength of low-porosity rock is tested by performing a series of biaxial compression test simulations.Finally,the evolution of damage and pore pressure around two unsupported circular tunnels in crystalline granitic rock is studied.展开更多
A nonlinear finite element model (FEM) of the corrosion of steel reinforcement in concrete has been successfully developed on the basis of mathematical analysis of the electrochemical process of steel corrosion in c...A nonlinear finite element model (FEM) of the corrosion of steel reinforcement in concrete has been successfully developed on the basis of mathematical analysis of the electrochemical process of steel corrosion in concrete. The influences of the area ratio and the Tafel constants of the anode and cathode on the potential and corrosion current density have been examined with the model. It has been found that the finite element calculation is more suitable for assessing the corrosion condition of steel reinforcement than ordinary electrochemical techniques due to the fact that FEM can obtain the distributions of potential and corrosion current density on the steel surface. In addition, the local corrosion of steel reinforcement in concrete is strengthened with the decrease of both the area ratio and the Tafel constants. These results provide valuable information to the researchers who investigate steel corrosion.展开更多
This study investigates the bond between seawater scoria aggregate concrete(SSAC)and stainless reinforcement(SR)through a series of pull-out tests.A total of 39 specimens,considering five experimental parameters—con-...This study investigates the bond between seawater scoria aggregate concrete(SSAC)and stainless reinforcement(SR)through a series of pull-out tests.A total of 39 specimens,considering five experimental parameters—con-crete type(SSAC,ordinary concrete(OC)and seawater coral aggregate concrete(SCAC)),reinforcement type(SR,ordinary reinforcement(OR)),bond length(3,5 and 8 times bar diameter),concrete strength(C25 and C30)and concrete cover thickness(42 and 67 mm)—were prepared.The typical bond properties(failure pattern,bond strength,bond-slip curves and bond stress distribution,etc.)of seawater scoria aggregate concrete-stainless rein-forcement(SSAC-SR)specimen were systematically studied.Generally,the failure pattern changed with the con-crete type used,and the failure surface of SSAC specimen was different from that of OC specimen.SSAC enhanced the bond strength of specimen,while its effect on the deformation of SSAC-SR was negative.On aver-age,the peak slip of SSAC specimens was 20%lower while the bond strength was 6.7%higher compared to OC specimens under the similar conditions.The effects of variables on the bond strength of SSAC–SR in increasing order are concrete type,bond length,concrete strength and cover thickness.The bond-slip curve of SSAC-SR specimen consisted of micro-slipping,slipping and declining stages.It can be obtained that SSAC reduced the curve curvature of bond-slip,and the decline of curve became steep after adopting SR.The typical distribution of bond stress along bond length changed with the types of concrete and reinforcement used.Finally,a specific expression of the bond stress-slip curve considering the effects of various variables was established,which could provide a basis for the practical application of reinforced SSAC.展开更多
With the continuous development of artificial intelligence technology,its application field has gradually expanded.To further apply the deep reinforcement learning technology to the field of dynamic pricing,we build a...With the continuous development of artificial intelligence technology,its application field has gradually expanded.To further apply the deep reinforcement learning technology to the field of dynamic pricing,we build an intelligent dynamic pricing system,introduce the reinforcement learning technology related to dynamic pricing,and introduce existing research on the number of suppliers(single supplier and multiple suppliers),environmental models,and selection algorithms.A two-period dynamic pricing game model is designed to assess the optimal pricing strategy for e-commerce platforms under two market conditions and two consumer participation conditions.The first step is to analyze the pricing strategies of e-commerce platforms in mature markets,analyze the optimal pricing and profits of various enterprises under different strategy combinations,compare different market equilibriums and solve the Nash equilibrium.Then,assuming that all consumers are naive in the market,the pricing strategy of the duopoly e-commerce platform in emerging markets is analyzed.By comparing and analyzing the optimal pricing and total profit of each enterprise under different strategy combinations,the subgame refined Nash equilibrium is solved.Finally,assuming that the market includes all experienced consumers,the pricing strategy of the duopoly e-commerce platform in emerging markets is analyzed.展开更多
The dynamic responses and generated voltage in a curved sandwich beam with glass reinforced laminate(GRL)layers and a pliable core in the presence of a piezoelectric layer under low-velocity impact(LVI)are investigate...The dynamic responses and generated voltage in a curved sandwich beam with glass reinforced laminate(GRL)layers and a pliable core in the presence of a piezoelectric layer under low-velocity impact(LVI)are investigated.The current study aims to carry out a dynamic analysis on the sandwich beam when the impactor hits the top face sheet with an initial velocity.For the layer analysis,the high-order shear deformation theory(HSDT)and Frostig's second model for the displacement fields of the core layer are used.The classical non-adhesive elastic contact theory and Hunter's principle are used to calculate the dynamic responses in terms of time.In order to validate the analytical method,the outcomes of the current investigation are compared with those gained by the experimental tests carried out by other researchers for a rectangular composite plate subject to the LVI.Finite element(FE)simulations are conducted by means of the ABAQUS software.The effects of the parameters such as foam modulus,layer material,fiber angle,impactor mass,and its velocity on the generated voltage are reviewed.展开更多
It is shown that we can control spatiotemporal chaos in the Frenkel-Kontorova(FK)model by a model-free control method based on reinforcement learning.The method uses Q-learning to find optimal control strategies based...It is shown that we can control spatiotemporal chaos in the Frenkel-Kontorova(FK)model by a model-free control method based on reinforcement learning.The method uses Q-learning to find optimal control strategies based on the reward feedback from the environment that maximizes its performance.The optimal control strategies are recorded in a Q-table and then employed to implement controllers.The advantage of the method is that it does not require an explicit knowledge of the system,target states,and unstable periodic orbits.All that we need is the parameters that we are trying to control and an unknown simulation model that represents the interactive environment.To control the FK model,we employ the perturbation policy on two different kinds of parameters,i.e.,the pendulum lengths and the phase angles.We show that both of the two perturbation techniques,i.e.,changing the lengths and changing their phase angles,can suppress chaos in the system and make it create the periodic patterns.The form of patterns depends on the initial values of the angular displacements and velocities.In particular,we show that the pinning control strategy,which only changes a small number of lengths or phase angles,can be put into effect.展开更多
Conducting hydrodynamic and physical motion simulation tests using a large-scale self-propelled model under actual wave conditions is an important means for researching environmental adaptability of ships. During the ...Conducting hydrodynamic and physical motion simulation tests using a large-scale self-propelled model under actual wave conditions is an important means for researching environmental adaptability of ships. During the navigation test of the self-propelled model, the complex environment including various port facilities, navigation facilities, and the ships nearby must be considered carefully, because in this dense environment the impact of sea waves and winds on the model is particularly significant. In order to improve the security of the self-propelled model, this paper introduces the Q learning based on reinforcement learning combined with chaotic ideas for the model's collision avoidance, in order to improve the reliability of the local path planning. Simulation and sea test results show that this algorithm is a better solution for collision avoidance of the self navigation model under the interference of sea winds and waves with good adaptability.展开更多
Despite the salience of misinformation and its consequences, there still lies a tremendous gap in research on the broader tendencies in collective cognition that compels individuals to spread misinformation so excessi...Despite the salience of misinformation and its consequences, there still lies a tremendous gap in research on the broader tendencies in collective cognition that compels individuals to spread misinformation so excessively. This study examined social learning as an antecedent of engaging with misinformation online. Using data released by Twitter for academic research in 2018, Tweets that included URL news links of both known misinformation and reliable domains were analyzed. Lindström’s computational reinforcement learning model was adapted as an expression of social learning, where a Twitter user’s posting frequency of news links is dependent on the relative engagement they receive in consequence. The research found that those who shared misinformation were highly sensitive to social reward. Inflation of positive social feedback was associated with a decrease in posting latency, indicating that users that posted misinformation were strongly influenced by social learning. However, the posting frequency of authentic news sharers remained fixed, even after receiving an increase in relative and absolute engagement. The results identified social learning is a contributor to the spread of misinformation online. In addition, behavior driven by social validation suggests a positive correlation between posting frequency, gratification received from posting, and a growing mental health dependency on social media. Developing interventions for spreading misinformation online may profit by assessing which online environments amplify social learning, particularly the conditions under which misinformation proliferates.展开更多
Car following (CF) models are an appealing research area because they fundamentally describe longitudinal interactions of vehicles on the road, and contribute significantly to an understanding of traffic flow. There i...Car following (CF) models are an appealing research area because they fundamentally describe longitudinal interactions of vehicles on the road, and contribute significantly to an understanding of traffic flow. There is an emerging trend to use data-driven method to build CF models. One challenge to the data-driven CF models is their capability to achieve optimal longitudinal driven behavior because a lot of bad driving behaviors will be learnt from human drivers by the supervised learning manner. In this study, by utilizing the deep reinforcement learning (DRL) techniques trust region policy optimization (TRPO), a DRL based CF model for electric vehicle (EV) is built. The proposed CF model can learn optimal driving behavior by itself in simulation. The experiments on following standard driving cycle show that the DRL model outperforms the traditional CF model in terms of electricity consumption.展开更多
This paper focuses on methodological issues relevant to corrosion risk prediction models.A model was developed for the prediction of corrosion rates associated with hot-dip galvanised reinforcement bar material in con...This paper focuses on methodological issues relevant to corrosion risk prediction models.A model was developed for the prediction of corrosion rates associated with hot-dip galvanised reinforcement bar material in concrete exposed to carbonation and chlorides in outdoor environment.One-year follow-up experiments,over five years,were conducted at various carbonation depths and chloride contents.The observed dependence of corrosion rate on the depth of carbonation and chloride content is complex indicating that the interaction between the carbonation and chloride influencing the corrosion.A non-linear corrosion model was proposed with statistical analysis to model the relationship between the corrosion rate and the test parameters.The main methodological contributions are(i)the proposed modeling approach able to take into account the uncertain measurement errors including unobserved systematic and random heterogeneity over different measured specimens and correlation for the same specimen across different measuring times,which best suits the measurement data;(ii)the developed model in which an interaction parameter is introduced especially to account for the contribution and the degree of the unobserved carbonation-chloride interaction.The proposed model offers greater flexibility for the modelling of measurement data than traditional models.展开更多
Under the smart grid paradigm, in the near future all consumers will be exposed to variable pricing schemes introduced by utilities. Hence, there is a need to develop algorithms which could be used by the consumers to...Under the smart grid paradigm, in the near future all consumers will be exposed to variable pricing schemes introduced by utilities. Hence, there is a need to develop algorithms which could be used by the consumers to schedule their loads. In this paper, load scheduling problem is formulated as a LCP (load commitment problem). The load model is general and can model atomic and non-atomic loads. Furthermore, it can also take into consideration the relative discomfort caused by delay in scheduling any load. For this purpose, a single parameter "uric" is introduced in the load model which captures the relative discomfort caused by delay in scheduling a particular load. Guidelines for choosing this parameter are given. All the other parameters of the proposed load model can be easily specified by the consumer. The paper shows that the general LCP can be viewed as multi-stage decision making problem or a MDP (Markov decision problem). RL (reinforcement learning) based algorithm is developed to solve this problem. The efficacy of the algorithm is investigated when the price of electricity is available in advance as well as for the case when it is random. The scalability of the approach is also investigated.展开更多
文摘In this paper, platoons of autonomous vehicles operating in urban road networks are considered. From a methodological point of view, the problem of interest consists of formally characterizing vehicle state trajectory tubes by means of routing decisions complying with traffic congestion criteria. To this end, a novel distributed control architecture is conceived by taking advantage of two methodologies: deep reinforcement learning and model predictive control. On one hand, the routing decisions are obtained by using a distributed reinforcement learning algorithm that exploits available traffic data at each road junction. On the other hand, a bank of model predictive controllers is in charge of computing the more adequate control action for each involved vehicle. Such tasks are here combined into a single framework:the deep reinforcement learning output(action) is translated into a set-point to be tracked by the model predictive controller;conversely, the current vehicle position, resulting from the application of the control move, is exploited by the deep reinforcement learning unit for improving its reliability. The main novelty of the proposed solution lies in its hybrid nature: on one hand it fully exploits deep reinforcement learning capabilities for decisionmaking purposes;on the other hand, time-varying hard constraints are always satisfied during the dynamical platoon evolution imposed by the computed routing decisions. To efficiently evaluate the performance of the proposed control architecture, a co-design procedure, involving the SUMO and MATLAB platforms, is implemented so that complex operating environments can be used, and the information coming from road maps(links,junctions, obstacles, semaphores, etc.) and vehicle state trajectories can be shared and exchanged. Finally by considering as operating scenario a real entire city block and a platoon of eleven vehicles described by double-integrator models, several simulations have been performed with the aim to put in light the main f eatures of the proposed approach. Moreover, it is important to underline that in different operating scenarios the proposed reinforcement learning scheme is capable of significantly reducing traffic congestion phenomena when compared with well-reputed competitors.
基金supported in part by the National Natural Science Foundation of China (62136008,62236002,61921004,62173251,62103104)the “Zhishan” Scholars Programs of Southeast Universitythe Fundamental Research Funds for the Central Universities (2242023K30034)。
文摘Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning(MARL). It is significantly more difficult for those tasks with latent variables that agents cannot directly observe. However, most of the existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable(MASAC-LV) algorithm, which uses variational inference theory to infer the compact latent variables representation space from a large amount of offline experience.Besides, we derive the counterfactual policy whose input has no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is considered an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
基金Supported by National Key R&D Program of China(Grant No.2022YFB2502900)Fundamental Research Funds for the Central Universities of China,Science and Technology Commission of Shanghai Municipality of China(Grant No.21ZR1465900)Shanghai Gaofeng&Gaoyuan Project for University Academic Program Development of China.
文摘Model predictive control is widely used in the design of autonomous driving algorithms.However,its parameters are sensitive to dynamically varying driving conditions,making it difficult to be implemented into practice.As a result,this study presents a self-learning algorithm based on reinforcement learning to tune a model predictive controller.Specifically,the proposed algorithm is used to extract features of dynamic traffic scenes and adjust the weight coefficients of the model predictive controller.In this method,a risk threshold model is proposed to classify the risk level of the scenes based on the scene features,and aid in the design of the reinforcement learning reward function and ultimately improve the adaptability of the model predictive controller to real-world scenarios.The proposed algorithm is compared to a pure model predictive controller in car-following case.According to the results,the proposed method enables autonomous vehicles to adjust the priority of performance indices reasonably in different scenarios according to risk variations,showing a good scenario adaptability with safety guaranteed.
文摘The quantization algorithm compresses the original network by reducing the numerical bit width of the model,which improves the computation speed. Because different layers have different redundancy and sensitivity to databit width. Reducing the data bit width will result in a loss of accuracy. Therefore, it is difficult to determinethe optimal bit width for different parts of the network with guaranteed accuracy. Mixed precision quantizationcan effectively reduce the amount of computation while keeping the model accuracy basically unchanged. In thispaper, a hardware-aware mixed precision quantization strategy optimal assignment algorithm adapted to low bitwidth is proposed, and reinforcement learning is used to automatically predict the mixed precision that meets theconstraints of hardware resources. In the state-space design, the standard deviation of weights is used to measurethe distribution difference of data, the execution speed feedback of simulated neural network accelerator inferenceis used as the environment to limit the action space of the agent, and the accuracy of the quantization model afterretraining is used as the reward function to guide the agent to carry out deep reinforcement learning training. Theexperimental results show that the proposed method obtains a suitable model layer-by-layer quantization strategyunder the condition that the computational resources are satisfied, and themodel accuracy is effectively improved.The proposed method has strong intelligence and certain universality and has strong application potential in thefield of mixed precision quantization and embedded neural network model deployment.
文摘This work aims to study the modeling and sizing of a floor reinforced by ballasted columns. We are studying the system of reinforcement by ballasted columns because this technique is able to replace deep foundations that are technically difficult to realize and their cost is higher. The modelling and dimensioning of foundations on a ballasted column will be an important contribution to the state of the art of this method because it will highlight the mode of transfer of loads, and will expose the induced deformations by also allowing to verification criteria of bearing capacity and allowable settlement according to geometric information of the model. The columns on a substrate located at 9 m have a length of 9 m and a diameter of 40 cm and were obtained by incorporating ballast of granular class 0/31.5 of internal friction angle of 38˚ and a density weight of 21 kN/m3. The choice of this method is based on the geotechnical characteristics of the initial soil. Thus, identification and characterization tests were carried out to estimate the bearing capacity and the settlement giving respectively 125 kPa and 57 cm. These results show the ground does not have sufficient mechanical properties to withstand the loads transmitted by the tank. By adopting the reinforcement of the soil with ballasted columns, numerical calculations show that after applying a load equal to 265.1 KPa, 20 cm vertical settlement and 17 cm horizontal displacement were obtained. This is in the tolerable deformation range for our tank, namely, less than 20 cm. Analytically, in addition to reducing settlement, ballasted columns, Due to their high stiffness, they have effectively contributed to the increase of the permissible soil stress up to 257 kPa.
文摘The aim of this study is to characterize soil/reinforcement interaction in reinforced earth structures. The study showed that the internal behavior of this type of structure depends on a number of factors, including the engineering backfill, the reinforcement and the soil/reinforcement interaction. The study also showed that the soil-reinforcement interaction phenomenon is a fairly complex mechanism that depends on the applied load, the geometry of the structure, the characteristics of the soil and a set of parameters characterizing the nailing: density, number and length of reinforcements, inclination of the reinforcements in relation to the sliding surface, mechanical characteristics of the reinforcements and, in particular, the relative stiffness of the reinforcements and the soil. The results showed that the tensile forces developed in the reinforcement are not entirely reversible, and that the soil at the interface undergoes permanent deformation, leading to the appearance of irreversible tensile forces in the reinforcement.
基金in part supported by the National Natural Science Foundation of China(Grant Nos.41877217 and 51609178)the General Research Fund of the Research Grants Council(Hong Kong,China)(Grant No.17303917)the Singapore Academic Research Fund Tier 1 Grant(RG112/14).
文摘Grain size effect on rock strength is a topic of great interest in geotechnical engineering.A consensus obtained from earlier laboratory tests is that rock strength generally decreases with the increase of grain size for both silicate and carbonate rocks;however,some recent numerical results conflict with such laboratory test results.To address this intriguing issue,the effect of grain size on strength of polymineralic crystalline rock with low porosity is investigated numerically using the grain-based modeling(GBM)approach in discrete element method(DEM)by interpreting micro-cracking process in response to loading.In agreement with some previous DEM simulation results,the simulated rock strength is found to increase with increasing grain size for both homogeneous and heterogeneous models,even when the number of assembled disks in one mineral grain changes.The mechanism of strength increase with increasing grain size is mainly associated with the number of assembled smooth-joint contacts along grain interfaces and the generation of grain boundary cracks in response to loading.The grain interfaces significantly weaken the integrity of the rock model,which is similar to effects of inherent defects in real rock.As the grain size increases,fewer grain interfaces are built in the model and the rock strength becomes much higher.Hence,by solely changing the mineral grain size in a model,the mechanism of grain size effect as observed in laboratory tests cannot be replicated.To address this issue,a method of degradation of grain boundary strength parameters is used to mimic the possible mechanism of grain size effect.The simulated strength using the method becomes comparable with those obtained from laboratory tests when the heterogeneity in the rock is considered.Degradation of grain boundary parameters with increasing grain size provides a plausible explanation for the grain size effect on rock strength.
文摘A grain-based distinct element model featuring three-dimensional (3D) Voronoi tessellations (randompoly-crystals) is proposed for simulation of crack damage development in brittle rocks. The grainboundaries in poly-crystal structure produced by Voronoi tessellations can represent flaws in intact rockand allow for numerical replication of crack damage progression through initiation and propagation ofmicro-fractures along grain boundaries. The Voronoi modelling scheme has been used widely in the pastfor brittle fracture simulation of rock materials. However the difficulty of generating 3D Voronoi modelshas limited its application to two-dimensional (2D) codes. The proposed approach is implemented inNeper, an open-source engine for generation of 3D Voronoi grains, to generate block geometry files thatcan be read directly into 3DEC. A series of Unconfined Compressive Strength (UCS) tests are simulated in3DEC to verify the proposed methodology for 3D simulation of brittle fractures and to investigate therelationship between each micro-parameter and the model's macro-response. The possibility of numericalreplication of the classical U-shape strength curve for anisotropic rocks is also investigated innumerical UCS tests by using complex-shaped (elongated) grains that are cemented to one another alongtheir adjoining sides. A micro-parameter calibration procedure is established for 3D Voronoi models foraccurate replication of the mechanical behaviour of isotropic and anisotropic (containing a fabric) rocks. 2014 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting byElsevier B.V. All rights reserved.
基金supported by the National Natural Science Foundation of China(Grant Nos.52009126,51939008)Foundation of Hubei Key Laboratory of Blasting Engineering(Grant No.BL202104)First-class Project Special Funding of Yellow River Laboratory(No.YRL22IR08)。
文摘Non aqueous reactive polymer materials produced by the reaction of isocyanate and polyol have been widely used in infrastructure construction,which may be subjected to explosion loads during complex service conditions.The blast response of composite materials is a crucial aspect for applications in engineering structures potentially subjected to extreme loadings.In this work,damage caused to rebar reinforced polymer slabs by surface explosive charges was studied experimentally and numerically.A total of 6 field tests were carried out to investigate the performances of the failure modes of rebar reinforced polymer slabs under contact and near-field explosions.The influence of explosive quantity(10-40 g)and stand-off distances(0-20 cm)at the damage modes were studied.The results show that the failure modes of rebar reinforced polymer slabs under near-field explosion mainly were bending and surface spalling,while under the impact of contact explosion,the failure modes were craters of the top surface,spalling of the bottom surface,and middle perforation.Furthermore,a detailed fully coupled model was developed and validated with the test data.The influences of explosive quantity and slab thickness on rebar reinforced polymer slabs under contact explosion were studied.Based on this,the calculation formula between breach diameter,explosive quantity,and slab thickness is fitted.
文摘The objective of this paper is to develop a methodology for calibration of a discrete element grain-based model(GBM)to replicate the hydro-mechanical properties of a brittle rock measured in the laboratory,and to apply the calibrated model to simulating the formation of excavation damage zone(EDZ)around underground excavations.Firstly,a new cohesive crack model is implemented into the universal distinct element code(UDEC)to control the fracturing behaviour of materials under various loading modes.Next,a methodology for calibration of the components of the UDEC-Voronoi model is discussed.The role of connectivity of induced microcracks on increasing the permeability of laboratory-scale samples is investigated.The calibrated samples are used to investigate the influence of pore fluid pressure on weakening the drained strength of the laboratory-scale rock.The validity of the Terzaghi’s effective stress law for the drained peak strength of low-porosity rock is tested by performing a series of biaxial compression test simulations.Finally,the evolution of damage and pore pressure around two unsupported circular tunnels in crystalline granitic rock is studied.
基金supported by the Opening Project of Key Laboratory of Coastal Disaster and Defence of Ministry of Education, Hohai Universitythe Natural Science Fund of Hohai University (No. 2008432111).
文摘A nonlinear finite element model (FEM) of the corrosion of steel reinforcement in concrete has been successfully developed on the basis of mathematical analysis of the electrochemical process of steel corrosion in concrete. The influences of the area ratio and the Tafel constants of the anode and cathode on the potential and corrosion current density have been examined with the model. It has been found that the finite element calculation is more suitable for assessing the corrosion condition of steel reinforcement than ordinary electrochemical techniques due to the fact that FEM can obtain the distributions of potential and corrosion current density on the steel surface. In addition, the local corrosion of steel reinforcement in concrete is strengthened with the decrease of both the area ratio and the Tafel constants. These results provide valuable information to the researchers who investigate steel corrosion.
基金funded by the National Natural Science Foundation of China(Nos.51408346,51978389)the Systematic Project of Guangxi Key Laboratory of Disaster Prevention and Structural Safety(2019ZDK035)the Opening Foundation of Shandong Key Laboratory of Civil Engineering Disaster Prevention and Mitigation(No.CDPM2019KF12).
文摘This study investigates the bond between seawater scoria aggregate concrete(SSAC)and stainless reinforcement(SR)through a series of pull-out tests.A total of 39 specimens,considering five experimental parameters—con-crete type(SSAC,ordinary concrete(OC)and seawater coral aggregate concrete(SCAC)),reinforcement type(SR,ordinary reinforcement(OR)),bond length(3,5 and 8 times bar diameter),concrete strength(C25 and C30)and concrete cover thickness(42 and 67 mm)—were prepared.The typical bond properties(failure pattern,bond strength,bond-slip curves and bond stress distribution,etc.)of seawater scoria aggregate concrete-stainless rein-forcement(SSAC-SR)specimen were systematically studied.Generally,the failure pattern changed with the con-crete type used,and the failure surface of SSAC specimen was different from that of OC specimen.SSAC enhanced the bond strength of specimen,while its effect on the deformation of SSAC-SR was negative.On aver-age,the peak slip of SSAC specimens was 20%lower while the bond strength was 6.7%higher compared to OC specimens under the similar conditions.The effects of variables on the bond strength of SSAC–SR in increasing order are concrete type,bond length,concrete strength and cover thickness.The bond-slip curve of SSAC-SR specimen consisted of micro-slipping,slipping and declining stages.It can be obtained that SSAC reduced the curve curvature of bond-slip,and the decline of curve became steep after adopting SR.The typical distribution of bond stress along bond length changed with the types of concrete and reinforcement used.Finally,a specific expression of the bond stress-slip curve considering the effects of various variables was established,which could provide a basis for the practical application of reinforced SSAC.
基金His work is supported by Scientific research planning project of Jilin Provincial Department of education in 2020:Analysis of the impact of industrial upgrading on employment of college students in Jilin Province(No.JJKH20200505JY).
文摘With the continuous development of artificial intelligence technology,its application field has gradually expanded.To further apply the deep reinforcement learning technology to the field of dynamic pricing,we build an intelligent dynamic pricing system,introduce the reinforcement learning technology related to dynamic pricing,and introduce existing research on the number of suppliers(single supplier and multiple suppliers),environmental models,and selection algorithms.A two-period dynamic pricing game model is designed to assess the optimal pricing strategy for e-commerce platforms under two market conditions and two consumer participation conditions.The first step is to analyze the pricing strategies of e-commerce platforms in mature markets,analyze the optimal pricing and profits of various enterprises under different strategy combinations,compare different market equilibriums and solve the Nash equilibrium.Then,assuming that all consumers are naive in the market,the pricing strategy of the duopoly e-commerce platform in emerging markets is analyzed.By comparing and analyzing the optimal pricing and total profit of each enterprise under different strategy combinations,the subgame refined Nash equilibrium is solved.Finally,assuming that the market includes all experienced consumers,the pricing strategy of the duopoly e-commerce platform in emerging markets is analyzed.
文摘The dynamic responses and generated voltage in a curved sandwich beam with glass reinforced laminate(GRL)layers and a pliable core in the presence of a piezoelectric layer under low-velocity impact(LVI)are investigated.The current study aims to carry out a dynamic analysis on the sandwich beam when the impactor hits the top face sheet with an initial velocity.For the layer analysis,the high-order shear deformation theory(HSDT)and Frostig's second model for the displacement fields of the core layer are used.The classical non-adhesive elastic contact theory and Hunter's principle are used to calculate the dynamic responses in terms of time.In order to validate the analytical method,the outcomes of the current investigation are compared with those gained by the experimental tests carried out by other researchers for a rectangular composite plate subject to the LVI.Finite element(FE)simulations are conducted by means of the ABAQUS software.The effects of the parameters such as foam modulus,layer material,fiber angle,impactor mass,and its velocity on the generated voltage are reviewed.
基金the National Natural Science Foundation of China(Grant Nos.12072262 and 11672231).
文摘It is shown that we can control spatiotemporal chaos in the Frenkel-Kontorova(FK)model by a model-free control method based on reinforcement learning.The method uses Q-learning to find optimal control strategies based on the reward feedback from the environment that maximizes its performance.The optimal control strategies are recorded in a Q-table and then employed to implement controllers.The advantage of the method is that it does not require an explicit knowledge of the system,target states,and unstable periodic orbits.All that we need is the parameters that we are trying to control and an unknown simulation model that represents the interactive environment.To control the FK model,we employ the perturbation policy on two different kinds of parameters,i.e.,the pendulum lengths and the phase angles.We show that both of the two perturbation techniques,i.e.,changing the lengths and changing their phase angles,can suppress chaos in the system and make it create the periodic patterns.The form of patterns depends on the initial values of the angular displacements and velocities.In particular,we show that the pinning control strategy,which only changes a small number of lengths or phase angles,can be put into effect.
基金Foundation item: Supported by the National Natural Science Foundation of China under Grant No.61100005.
文摘Conducting hydrodynamic and physical motion simulation tests using a large-scale self-propelled model under actual wave conditions is an important means for researching environmental adaptability of ships. During the navigation test of the self-propelled model, the complex environment including various port facilities, navigation facilities, and the ships nearby must be considered carefully, because in this dense environment the impact of sea waves and winds on the model is particularly significant. In order to improve the security of the self-propelled model, this paper introduces the Q learning based on reinforcement learning combined with chaotic ideas for the model's collision avoidance, in order to improve the reliability of the local path planning. Simulation and sea test results show that this algorithm is a better solution for collision avoidance of the self navigation model under the interference of sea winds and waves with good adaptability.
文摘Despite the salience of misinformation and its consequences, there still lies a tremendous gap in research on the broader tendencies in collective cognition that compels individuals to spread misinformation so excessively. This study examined social learning as an antecedent of engaging with misinformation online. Using data released by Twitter for academic research in 2018, Tweets that included URL news links of both known misinformation and reliable domains were analyzed. Lindström’s computational reinforcement learning model was adapted as an expression of social learning, where a Twitter user’s posting frequency of news links is dependent on the relative engagement they receive in consequence. The research found that those who shared misinformation were highly sensitive to social reward. Inflation of positive social feedback was associated with a decrease in posting latency, indicating that users that posted misinformation were strongly influenced by social learning. However, the posting frequency of authentic news sharers remained fixed, even after receiving an increase in relative and absolute engagement. The results identified social learning is a contributor to the spread of misinformation online. In addition, behavior driven by social validation suggests a positive correlation between posting frequency, gratification received from posting, and a growing mental health dependency on social media. Developing interventions for spreading misinformation online may profit by assessing which online environments amplify social learning, particularly the conditions under which misinformation proliferates.
基金supported by national natural science foundation of China (61620106002 and 5170520). The authors acknowledge the help of Renzong Lian, who has helped us to perform traffic simulation using SUMO. The parameters of Roewe E50 is provided by SAIC Motor.
文摘Car following (CF) models are an appealing research area because they fundamentally describe longitudinal interactions of vehicles on the road, and contribute significantly to an understanding of traffic flow. There is an emerging trend to use data-driven method to build CF models. One challenge to the data-driven CF models is their capability to achieve optimal longitudinal driven behavior because a lot of bad driving behaviors will be learnt from human drivers by the supervised learning manner. In this study, by utilizing the deep reinforcement learning (DRL) techniques trust region policy optimization (TRPO), a DRL based CF model for electric vehicle (EV) is built. The proposed CF model can learn optimal driving behavior by itself in simulation. The experiments on following standard driving cycle show that the DRL model outperforms the traditional CF model in terms of electricity consumption.
基金study is financed by the Academy of Finland(Grant number 324023)Dr.Esko Sistonen provided the experimental data.
文摘This paper focuses on methodological issues relevant to corrosion risk prediction models.A model was developed for the prediction of corrosion rates associated with hot-dip galvanised reinforcement bar material in concrete exposed to carbonation and chlorides in outdoor environment.One-year follow-up experiments,over five years,were conducted at various carbonation depths and chloride contents.The observed dependence of corrosion rate on the depth of carbonation and chloride content is complex indicating that the interaction between the carbonation and chloride influencing the corrosion.A non-linear corrosion model was proposed with statistical analysis to model the relationship between the corrosion rate and the test parameters.The main methodological contributions are(i)the proposed modeling approach able to take into account the uncertain measurement errors including unobserved systematic and random heterogeneity over different measured specimens and correlation for the same specimen across different measuring times,which best suits the measurement data;(ii)the developed model in which an interaction parameter is introduced especially to account for the contribution and the degree of the unobserved carbonation-chloride interaction.The proposed model offers greater flexibility for the modelling of measurement data than traditional models.
文摘Under the smart grid paradigm, in the near future all consumers will be exposed to variable pricing schemes introduced by utilities. Hence, there is a need to develop algorithms which could be used by the consumers to schedule their loads. In this paper, load scheduling problem is formulated as a LCP (load commitment problem). The load model is general and can model atomic and non-atomic loads. Furthermore, it can also take into consideration the relative discomfort caused by delay in scheduling any load. For this purpose, a single parameter "uric" is introduced in the load model which captures the relative discomfort caused by delay in scheduling a particular load. Guidelines for choosing this parameter are given. All the other parameters of the proposed load model can be easily specified by the consumer. The paper shows that the general LCP can be viewed as multi-stage decision making problem or a MDP (Markov decision problem). RL (reinforcement learning) based algorithm is developed to solve this problem. The efficacy of the algorithm is investigated when the price of electricity is available in advance as well as for the case when it is random. The scalability of the approach is also investigated.