Reinforcement learning(RL) has roots in dynamic programming and it is called adaptive/approximate dynamic programming(ADP) within the control community. This paper reviews recent developments in ADP along with RL and ...Reinforcement learning(RL) has roots in dynamic programming and it is called adaptive/approximate dynamic programming(ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are displayed, where the main results towards discrete-time systems and continuous-time systems are surveyed, respectively.Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environment is discussed, respectively, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environment attract enormous attention. The ADP architecture is revisited under the perspective of data-driven and RL frameworks,showing how they promote ADP formulation significantly.Finally, several typical control applications with respect to RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has d emonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.展开更多
In order to address the output feedback issue for linear discrete-time systems, this work suggests a brand-new adaptive dynamic programming(ADP) technique based on the internal model principle(IMP). The proposed metho...In order to address the output feedback issue for linear discrete-time systems, this work suggests a brand-new adaptive dynamic programming(ADP) technique based on the internal model principle(IMP). The proposed method, termed as IMP-ADP, does not require complete state feedback-merely the measurement of input and output data. More specifically, based on the IMP, the output control problem can first be converted into a stabilization problem. We then design an observer to reproduce the full state of the system by measuring the inputs and outputs. Moreover, this technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. It is important that with this concept one does not need to solve the regulator equation. Finally, this control method was tested on an inverter system of grid-connected LCLs to demonstrate that the proposed method provides the desired performance in terms of both tracking and disturbance rejection.展开更多
An optimal tracking control problem for a class of nonlinear systems with guaranteed performance and asymmetric input constraints is discussed in this paper.The control policy is implemented by adaptive dynamic progra...An optimal tracking control problem for a class of nonlinear systems with guaranteed performance and asymmetric input constraints is discussed in this paper.The control policy is implemented by adaptive dynamic programming(ADP)algorithm under two event-based triggering mechanisms.It is often challenging to design an optimal control law due to the system deviation caused by asymmetric input constraints.First,a prescribed performance control technique is employed to guarantee the tracking errors within predetermined boundaries.Subsequently,considering the asymmetric input constraints,a discounted non-quadratic cost function is introduced.Moreover,in order to reduce controller updates,an event-triggered control law is developed for ADP algorithm.After that,to further simplify the complexity of controller design,this work is extended to a self-triggered case for relaxing the need for continuous signal monitoring by hardware devices.By employing the Lyapunov method,the uniform ultimate boundedness of all signals is proved to be guaranteed.Finally,a simulation example on a mass–spring–damper system subject to asymmetric input constraints is provided to validate the effectiveness of the proposed control scheme.展开更多
Aircraft designers strive to achieve optimal weight-reliability tradeoffs while designing an aircraft. Since aircraft wing skins account for more than fifty percent of their structural weight, aircraft wings must be d...Aircraft designers strive to achieve optimal weight-reliability tradeoffs while designing an aircraft. Since aircraft wing skins account for more than fifty percent of their structural weight, aircraft wings must be designed with utmost care and attention in terms of material types and thickness configurations. In particular, the selection of thickness at each location of the aircraft wing skin is the most consequential task for aircraft designers. To accomplish this, we present discrete mathematical programming models to obtain optimal thicknesses either to minimize weight or to maximize reliability. We present theoretical results for the decomposition of these discrete mathematical programming models to reduce computer memory requirements and facilitate the use of dynamic programming for design purposes. In particular, a decomposed version of the weight minimization problem is solved for an aircraft wing with thirty locations (or panels) and fourteen thickness choices for each location to yield an optimal minimum weight design.展开更多
This paper presents a novel cooperative value iteration(VI)-based adaptive dynamic programming method for multi-player differential game models with a convergence proof.The players are divided into two groups in the l...This paper presents a novel cooperative value iteration(VI)-based adaptive dynamic programming method for multi-player differential game models with a convergence proof.The players are divided into two groups in the learning process and adapt their policies sequentially.Our method removes the dependence of admissible initial policies,which is one of the main drawbacks of the PI-based frameworks.Furthermore,this algorithm enables the players to adapt their control policies without full knowledge of others’ system parameters or control laws.The efficacy of our method is illustrated by three examples.展开更多
Due to the overwhelming characteristics of the Internet of Things(IoT)and its adoption in approximately every aspect of our lives,the concept of individual devices’privacy has gained prominent attention from both cus...Due to the overwhelming characteristics of the Internet of Things(IoT)and its adoption in approximately every aspect of our lives,the concept of individual devices’privacy has gained prominent attention from both customers,i.e.,people,and industries as wearable devices collect sensitive information about patients(both admitted and outdoor)in smart healthcare infrastructures.In addition to privacy,outliers or noise are among the crucial issues,which are directly correlated with IoT infrastructures,as most member devices are resource-limited and could generate or transmit false data that is required to be refined before processing,i.e.,transmitting.Therefore,the development of privacy-preserving information fusion techniques is highly encouraged,especially those designed for smart IoT-enabled domains.In this paper,we are going to present an effective hybrid approach that can refine raw data values captured by the respectivemember device before transmission while preserving its privacy through the utilization of the differential privacy technique in IoT infrastructures.Sliding window,i.e.,δi based dynamic programming methodology,is implemented at the device level to ensure precise and accurate detection of outliers or noisy data,and refine it prior to activation of the respective transmission activity.Additionally,an appropriate privacy budget has been selected,which is enough to ensure the privacy of every individualmodule,i.e.,a wearable device such as a smartwatch attached to the patient’s body.In contrast,the end module,i.e.,the server in this case,can extract important information with approximately the maximum level of accuracy.Moreover,refined data has been processed by adding an appropriate nose through the Laplace mechanism to make it useless or meaningless for the adversary modules in the IoT.The proposed hybrid approach is trusted from both the device’s privacy and the integrity of the transmitted information perspectives.Simulation and analytical results have proved that the proposed privacy-preserving information fusion technique for wearable devices is an ideal solution for resource-constrained infrastructures such as IoT and the Internet ofMedical Things,where both device privacy and information integrity are important.Finally,the proposed hybrid approach is proven against well-known intruder attacks,especially those related to the privacy of the respective device in IoT infrastructures.展开更多
The plug-in hybrid vehicles(PHEV)technology can effectively address the issues of poor dynamics and higher energy consumption commonly found in traditional mining dump trucks.Meanwhile,plug-in hybrid electric trucks c...The plug-in hybrid vehicles(PHEV)technology can effectively address the issues of poor dynamics and higher energy consumption commonly found in traditional mining dump trucks.Meanwhile,plug-in hybrid electric trucks can achieve excellent fuel economy through efficient energy management strategies(EMS).Therefore,a series hybrid system is constructed based on a 100-ton mining dump truck in this paper.And inspired by the dynamic programming(DP)algorithm,a predictive equivalent consumption minimization strategy(P-ECMS)based on the DP optimization result is proposed.Based on the optimal control manifold and the SOC reference trajectory obtained by the DP algorithm,the P-ECMS strategy performs real-time stage parameter optimization to obtain the optimal equivalent factor(EF).Finally,applying the equivalent consumption minimization strategy(ECMS)realizes real-time control.The simulation results show that the equivalent fuel consumption of the P-ECMS strategy under the experimentally collected mining cycle conditions is 150.8 L/100 km,which is 10.9%less than that of the common CDCS strategy(169.3 L/100 km),and achieves 99.47%of the fuel saving effect of the DP strategy(150 L/100 km).展开更多
Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields like IoT, autonomous vehicles, and exascale computing. Despite these adv...Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields like IoT, autonomous vehicles, and exascale computing. Despite these advancements, efficiently programming GPUs remains a daunting challenge, often relying on trial-and-error optimization methods. This paper introduces an optimization technique for CUDA programs through a novel Data Layout strategy, aimed at restructuring memory data arrangement to significantly enhance data access locality. Focusing on the dynamic programming algorithm for chained matrix multiplication—a critical operation across various domains including artificial intelligence (AI), high-performance computing (HPC), and the Internet of Things (IoT)—this technique facilitates more localized access. We specifically illustrate the importance of efficient matrix multiplication in these areas, underscoring the technique’s broader applicability and its potential to address some of the most pressing computational challenges in GPU-accelerated applications. Our findings reveal a remarkable reduction in memory consumption and a substantial 50% decrease in execution time for CUDA programs utilizing this technique, thereby setting a new benchmark for optimization in GPU computing.展开更多
This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems.Unlike existing optimal state feedback control,the control input of the optimal parallel control is int...This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems.Unlike existing optimal state feedback control,the control input of the optimal parallel control is introduced into the feedback system.However,due to the introduction of control input into the feedback system,the optimal state feedback control methods can not be applied directly.To address this problem,an augmented system and an augmented performance index function are proposed firstly.Thus,the general nonlinear system is transformed into an affine nonlinear system.The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically.It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function.Moreover,an adaptive dynamic programming(ADP)technique is utilized to implement the optimal parallel tracking control using a critic neural network(NN)to approximate the value function online.The stability analysis of the closed-loop system is performed using the Lyapunov theory,and the tracking error and NN weights errors are uniformly ultimately bounded(UUB).Also,the optimal parallel controller guarantees the continuity of the control input under the circumstance that there are finite jump discontinuities in the reference signals.Finally,the effectiveness of the developed optimal parallel control method is verified in two cases.展开更多
The residential energy scheduling of solar energy is an important research area of smart grid. On the demand side, factors such as household loads, storage batteries, the outside public utility grid and renewable ener...The residential energy scheduling of solar energy is an important research area of smart grid. On the demand side, factors such as household loads, storage batteries, the outside public utility grid and renewable energy resources, are combined together as a nonlinear, time-varying, indefinite and complex system, which is difficult to manage or optimize. Many nations have already applied the residential real-time pricing to balance the burden on their grid. In order to enhance electricity efficiency of the residential micro grid, this paper presents an action dependent heuristic dynamic programming(ADHDP) method to solve the residential energy scheduling problem. The highlights of this paper are listed below. First,the weather-type classification is adopted to establish three types of programming models based on the features of the solar energy. In addition, the priorities of different energy resources are set to reduce the loss of electrical energy transmissions.Second, three ADHDP-based neural networks, which can update themselves during applications, are designed to manage the flows of electricity. Third, simulation results show that the proposed scheduling method has effectively reduced the total electricity cost and improved load balancing process. The comparison with the particle swarm optimization algorithm further proves that the present method has a promising effect on energy management to save cost.展开更多
Based on service-oriented architecture(SOA),a Bellman-dynamic-programming-based approach of service recovery decision-making is proposed to make valid recovery decisions.Both the attribute and the process of service...Based on service-oriented architecture(SOA),a Bellman-dynamic-programming-based approach of service recovery decision-making is proposed to make valid recovery decisions.Both the attribute and the process of services in the controllable distributed information system are analyzed as the preparatory work.Using the idea of service composition as a reference,the approach translates the recovery decision-making into a planning problem regarding artificial intelligence (AI) through two steps.The first is the self-organization based on a logical view of the network,and the second is the definition of evaluation standards.Applying Bellman dynamic programming to solve the planning problem,the approach offers timely emergency response and optimal recovery source selection,meeting multiple QoS (quality of service)requirements.Experimental results demonstrate the rationality and optimality of the approach,and the theoretical analysis of its computational complexity and the comparison with conventional methods exhibit its high efficiency.展开更多
This paper researches the adaptive scheduling problem of multiple electronic support measures(multi-ESM) in a ground moving radar targets tracking application. It is a sequential decision-making problem in uncertain e...This paper researches the adaptive scheduling problem of multiple electronic support measures(multi-ESM) in a ground moving radar targets tracking application. It is a sequential decision-making problem in uncertain environment. For adaptive selection of appropriate ESMs, we generalize an approximate dynamic programming(ADP) framework to the dynamic case. We define the environment model and agent model, respectively. To handle the partially observable challenge, we apply the unsented Kalman filter(UKF) algorithm for belief state estimation. To reduce the computational burden, a simulation-based approach rollout with a redesigned base policy is proposed to approximate the long-term cumulative reward. Meanwhile, Monte Carlo sampling is combined into the rollout to estimate the expectation of the rewards. The experiments indicate that our method outperforms other strategies due to its better performance in larger-scale problems.展开更多
Unmanned aerial vehicles(UAVs) may play an important role in data collection and offloading in vast areas deploying wireless sensor networks, and the UAV’s action strategy has a vital influence on achieving applicabi...Unmanned aerial vehicles(UAVs) may play an important role in data collection and offloading in vast areas deploying wireless sensor networks, and the UAV’s action strategy has a vital influence on achieving applicability and computational complexity. Dynamic programming(DP) has a good application in the path planning of UAV, but there are problems in the applicability of special terrain environment and the complexity of the algorithm.Based on the analysis of DP, this paper proposes a hierarchical directional DP(DDP) algorithm based on direction determination and hierarchical model. We compare our methods with Q-learning and DP algorithm by experiments, and the results show that our method can improve the terrain applicability, meanwhile greatly reduce the computational complexity.展开更多
A stochastic resource allocation model, based on the principles of Markov decision processes(MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource...A stochastic resource allocation model, based on the principles of Markov decision processes(MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations(i.e., specified and unspecified time interval reservation requests), and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which is solved by means of stochastic dynamic programming(DP) based algorithms. In this regard, Bellman’s backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, as the inevitable issue of the DP both for instant resource requests and future resource reservations,occurs. In particular, an approximate dynamic programming(ADP) technique based on linear function approximations is applied to solve such scalability issues. Several examples are provided to show the effectiveness of the proposed approach.展开更多
A method of minimizing rankings inconsistency is proposed for a decision-making problem with rankings of alternatives given by multiple decision makers according to multiple criteria. For each criteria, at first, the ...A method of minimizing rankings inconsistency is proposed for a decision-making problem with rankings of alternatives given by multiple decision makers according to multiple criteria. For each criteria, at first, the total inconsistency between the rankings of all alternatives for the group and the ones for every decision maker is defined after the decision maker weights in respect to the criteria are considered. Similarly, the total inconsistency between their final rankings for the group and the ones under every criteria is determined after the criteria weights are taken into account. Then two nonlinear integer programming models minimizing respectively the two total inconsistencies above are developed and then transformed to two dynamic programming models to obtain separately the rankings of all alternatives for the group with respect to each criteria and their final rankings. A supplier selection case illustrated the proposed method, and some discussions on the results verified its effectiveness. This work develops a new measurement of ordinal preferences’ inconsistency in multi-criteria group decision-making (MCGDM) and extends the cook-seiford social selection function to MCGDM considering weights of criteria and decision makers and can obtain unique ranking result.展开更多
Color inconsistency between views is an important problem to be solved in multi-view video systems. A multi-view video color correction method using dynamic programming is proposed. Three-dimensional histograms are co...Color inconsistency between views is an important problem to be solved in multi-view video systems. A multi-view video color correction method using dynamic programming is proposed. Three-dimensional histograms are constructed with sequential conditional probability in HSI color space. Then, dynamic programming is used to seek the best color mapping relation with the minimum cost path between target image histogram and source image histogram. Finally, video tracking technique is performed to correct multi-view video. Experimental results show that the proposed method can obtain better subjective and objective performance in color correction.展开更多
In short-term operation of natural gas network,the impact of demand uncertainty is not negligible.To address this issue we propose a two-stage robust model for power cost minimization problem in gunbarrel natural gas ...In short-term operation of natural gas network,the impact of demand uncertainty is not negligible.To address this issue we propose a two-stage robust model for power cost minimization problem in gunbarrel natural gas networks.The demands between pipelines and compressor stations are uncertain with a budget parameter,since it is unlikely that all the uncertain demands reach the maximal deviation simultaneously.During solving the two-stage robust model we encounter a bilevel problem which is challenging to solve.We formulate it as a multi-dimensional dynamic programming problem and propose approximate dynamic programming methods to accelerate the calculation.Numerical results based on real network in China show that we obtain a speed gain of 7 times faster in average without compromising optimality compared with original dynamic programming algorithm.Numerical results also verify the advantage of robust model compared with deterministic model when facing uncertainties.These findings offer short-term operation methods for gunbarrel natural gas network management to handle with uncertainties.展开更多
An adaptive weighted stereo matching algorithm with multilevel and bidirectional dynamic programming based on ground control points (GCPs) is presented. To decrease time complexity without losing matching precision,...An adaptive weighted stereo matching algorithm with multilevel and bidirectional dynamic programming based on ground control points (GCPs) is presented. To decrease time complexity without losing matching precision, using a multilevel search scheme, the coarse matching is processed in typical disparity space image, while the fine matching is processed in disparity-offset space image. In the upper level, GCPs are obtained by enhanced volumetric iterative algorithm enforcing the mutual constraint and the threshold constraint. Under the supervision of the highly reliable GCPs, bidirectional dynamic programming framework is employed to solve the inconsistency in the optimization path. In the lower level, to reduce running time, disparity-offset space is proposed to efficiently achieve the dense disparity image. In addition, an adaptive dual support-weight strategy is presented to aggregate matching cost, which considers photometric and geometric information. Further, post-processing algorithm can ameliorate disparity results in areas with depth discontinuities and related by occlusions using dual threshold algorithm, where missing stereo information is substituted from surrounding regions. To demonstrate the effectiveness of the algorithm, we present the two groups of experimental results for four widely used standard stereo data sets, including discussion on performance and comparison with other methods, which show that the algorithm has not only a fast speed, but also significantly improves the efficiency of holistic optimization.展开更多
A policy iteration algorithm of adaptive dynamic programming(ADP) is developed to solve the optimal tracking control for a class of discrete-time chaotic systems. By system transformations, the optimal tracking prob...A policy iteration algorithm of adaptive dynamic programming(ADP) is developed to solve the optimal tracking control for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation one. The policy iteration algorithm for discrete-time chaotic systems is first described. Then,the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks,the developed optimal tracking control scheme for chaotic systems is verified by a simulation.展开更多
A certain number of considerations should be taken into account in the dynamic control of robot manipulators as highly complex non-linear systems.In this article,we provide a detailed presentation of the mechanical an...A certain number of considerations should be taken into account in the dynamic control of robot manipulators as highly complex non-linear systems.In this article,we provide a detailed presentation of the mechanical and electrical impli- cations of robots equipped with DC motor actuators.This model takes into account all non-linear aspects of the system.Then,we develop computational algorithms for optimal control based on dynamic programming.The robot's trajectory must be predefined,but performance criteria and constraints applying to the system are not limited and we may adapt them freely to the robot and the task being studied.As an example,a manipulator arm with 3 degrees of freedom is analyzed.展开更多
基金supported in part by the National Natural Science Foundation of China(62222301, 62073085, 62073158, 61890930-5, 62021003)the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5)Beijing Natural Science Foundation (JQ19013)。
文摘Reinforcement learning(RL) has roots in dynamic programming and it is called adaptive/approximate dynamic programming(ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are displayed, where the main results towards discrete-time systems and continuous-time systems are surveyed, respectively.Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environment is discussed, respectively, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environment attract enormous attention. The ADP architecture is revisited under the perspective of data-driven and RL frameworks,showing how they promote ADP formulation significantly.Finally, several typical control applications with respect to RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has d emonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.
基金supported by the National Science Fund for Distinguished Young Scholars (62225303)the Fundamental Research Funds for the Central Universities (buctrc202201)+1 种基金China Scholarship Council,and High Performance Computing PlatformCollege of Information Science and Technology,Beijing University of Chemical Technology。
文摘In order to address the output feedback issue for linear discrete-time systems, this work suggests a brand-new adaptive dynamic programming(ADP) technique based on the internal model principle(IMP). The proposed method, termed as IMP-ADP, does not require complete state feedback-merely the measurement of input and output data. More specifically, based on the IMP, the output control problem can first be converted into a stabilization problem. We then design an observer to reproduce the full state of the system by measuring the inputs and outputs. Moreover, this technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. It is important that with this concept one does not need to solve the regulator equation. Finally, this control method was tested on an inverter system of grid-connected LCLs to demonstrate that the proposed method provides the desired performance in terms of both tracking and disturbance rejection.
基金supported in part by the National Natural Science Foundation of China(62033003,62003093,62373113,U23A20341,U21A20522)the Natural Science Foundation of Guangdong Province,China(2023A1515011527,2022A1515011506).
文摘An optimal tracking control problem for a class of nonlinear systems with guaranteed performance and asymmetric input constraints is discussed in this paper.The control policy is implemented by adaptive dynamic programming(ADP)algorithm under two event-based triggering mechanisms.It is often challenging to design an optimal control law due to the system deviation caused by asymmetric input constraints.First,a prescribed performance control technique is employed to guarantee the tracking errors within predetermined boundaries.Subsequently,considering the asymmetric input constraints,a discounted non-quadratic cost function is introduced.Moreover,in order to reduce controller updates,an event-triggered control law is developed for ADP algorithm.After that,to further simplify the complexity of controller design,this work is extended to a self-triggered case for relaxing the need for continuous signal monitoring by hardware devices.By employing the Lyapunov method,the uniform ultimate boundedness of all signals is proved to be guaranteed.Finally,a simulation example on a mass–spring–damper system subject to asymmetric input constraints is provided to validate the effectiveness of the proposed control scheme.
文摘Aircraft designers strive to achieve optimal weight-reliability tradeoffs while designing an aircraft. Since aircraft wing skins account for more than fifty percent of their structural weight, aircraft wings must be designed with utmost care and attention in terms of material types and thickness configurations. In particular, the selection of thickness at each location of the aircraft wing skin is the most consequential task for aircraft designers. To accomplish this, we present discrete mathematical programming models to obtain optimal thicknesses either to minimize weight or to maximize reliability. We present theoretical results for the decomposition of these discrete mathematical programming models to reduce computer memory requirements and facilitate the use of dynamic programming for design purposes. In particular, a decomposed version of the weight minimization problem is solved for an aircraft wing with thirty locations (or panels) and fourteen thickness choices for each location to yield an optimal minimum weight design.
基金supported by the Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (USCAST2022-11)Aeronautical Science Foundation of China (20220001057001)。
文摘This paper presents a novel cooperative value iteration(VI)-based adaptive dynamic programming method for multi-player differential game models with a convergence proof.The players are divided into two groups in the learning process and adapt their policies sequentially.Our method removes the dependence of admissible initial policies,which is one of the main drawbacks of the PI-based frameworks.Furthermore,this algorithm enables the players to adapt their control policies without full knowledge of others’ system parameters or control laws.The efficacy of our method is illustrated by three examples.
基金Ministry of Higher Education of Malaysia under theResearch GrantLRGS/1/2019/UKM-UKM/5/2 and Princess Nourah bint Abdulrahman University for financing this researcher through Supporting Project Number(PNURSP2024R235),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Due to the overwhelming characteristics of the Internet of Things(IoT)and its adoption in approximately every aspect of our lives,the concept of individual devices’privacy has gained prominent attention from both customers,i.e.,people,and industries as wearable devices collect sensitive information about patients(both admitted and outdoor)in smart healthcare infrastructures.In addition to privacy,outliers or noise are among the crucial issues,which are directly correlated with IoT infrastructures,as most member devices are resource-limited and could generate or transmit false data that is required to be refined before processing,i.e.,transmitting.Therefore,the development of privacy-preserving information fusion techniques is highly encouraged,especially those designed for smart IoT-enabled domains.In this paper,we are going to present an effective hybrid approach that can refine raw data values captured by the respectivemember device before transmission while preserving its privacy through the utilization of the differential privacy technique in IoT infrastructures.Sliding window,i.e.,δi based dynamic programming methodology,is implemented at the device level to ensure precise and accurate detection of outliers or noisy data,and refine it prior to activation of the respective transmission activity.Additionally,an appropriate privacy budget has been selected,which is enough to ensure the privacy of every individualmodule,i.e.,a wearable device such as a smartwatch attached to the patient’s body.In contrast,the end module,i.e.,the server in this case,can extract important information with approximately the maximum level of accuracy.Moreover,refined data has been processed by adding an appropriate nose through the Laplace mechanism to make it useless or meaningless for the adversary modules in the IoT.The proposed hybrid approach is trusted from both the device’s privacy and the integrity of the transmitted information perspectives.Simulation and analytical results have proved that the proposed privacy-preserving information fusion technique for wearable devices is an ideal solution for resource-constrained infrastructures such as IoT and the Internet ofMedical Things,where both device privacy and information integrity are important.Finally,the proposed hybrid approach is proven against well-known intruder attacks,especially those related to the privacy of the respective device in IoT infrastructures.
文摘The plug-in hybrid vehicles(PHEV)technology can effectively address the issues of poor dynamics and higher energy consumption commonly found in traditional mining dump trucks.Meanwhile,plug-in hybrid electric trucks can achieve excellent fuel economy through efficient energy management strategies(EMS).Therefore,a series hybrid system is constructed based on a 100-ton mining dump truck in this paper.And inspired by the dynamic programming(DP)algorithm,a predictive equivalent consumption minimization strategy(P-ECMS)based on the DP optimization result is proposed.Based on the optimal control manifold and the SOC reference trajectory obtained by the DP algorithm,the P-ECMS strategy performs real-time stage parameter optimization to obtain the optimal equivalent factor(EF).Finally,applying the equivalent consumption minimization strategy(ECMS)realizes real-time control.The simulation results show that the equivalent fuel consumption of the P-ECMS strategy under the experimentally collected mining cycle conditions is 150.8 L/100 km,which is 10.9%less than that of the common CDCS strategy(169.3 L/100 km),and achieves 99.47%of the fuel saving effect of the DP strategy(150 L/100 km).
文摘Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields like IoT, autonomous vehicles, and exascale computing. Despite these advancements, efficiently programming GPUs remains a daunting challenge, often relying on trial-and-error optimization methods. This paper introduces an optimization technique for CUDA programs through a novel Data Layout strategy, aimed at restructuring memory data arrangement to significantly enhance data access locality. Focusing on the dynamic programming algorithm for chained matrix multiplication—a critical operation across various domains including artificial intelligence (AI), high-performance computing (HPC), and the Internet of Things (IoT)—this technique facilitates more localized access. We specifically illustrate the importance of efficient matrix multiplication in these areas, underscoring the technique’s broader applicability and its potential to address some of the most pressing computational challenges in GPU-accelerated applications. Our findings reveal a remarkable reduction in memory consumption and a substantial 50% decrease in execution time for CUDA programs utilizing this technique, thereby setting a new benchmark for optimization in GPU computing.
基金supported in part by the National Key Reseanch and Development Program of China(2018AAA0101502,2018YFB1702300)in part by the National Natural Science Foundation of China(61722312,61533019,U1811463,61533017)in part by the Intel Collaborative Research Institute for Intelligent and Automated Connected Vehicles。
文摘This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems.Unlike existing optimal state feedback control,the control input of the optimal parallel control is introduced into the feedback system.However,due to the introduction of control input into the feedback system,the optimal state feedback control methods can not be applied directly.To address this problem,an augmented system and an augmented performance index function are proposed firstly.Thus,the general nonlinear system is transformed into an affine nonlinear system.The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically.It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function.Moreover,an adaptive dynamic programming(ADP)technique is utilized to implement the optimal parallel tracking control using a critic neural network(NN)to approximate the value function online.The stability analysis of the closed-loop system is performed using the Lyapunov theory,and the tracking error and NN weights errors are uniformly ultimately bounded(UUB).Also,the optimal parallel controller guarantees the continuity of the control input under the circumstance that there are finite jump discontinuities in the reference signals.Finally,the effectiveness of the developed optimal parallel control method is verified in two cases.
基金supported in part by the National Natural Science Foundation of China(61533017,U1501251,61374105,61722312)
文摘The residential energy scheduling of solar energy is an important research area of smart grid. On the demand side, factors such as household loads, storage batteries, the outside public utility grid and renewable energy resources, are combined together as a nonlinear, time-varying, indefinite and complex system, which is difficult to manage or optimize. Many nations have already applied the residential real-time pricing to balance the burden on their grid. In order to enhance electricity efficiency of the residential micro grid, this paper presents an action dependent heuristic dynamic programming(ADHDP) method to solve the residential energy scheduling problem. The highlights of this paper are listed below. First,the weather-type classification is adopted to establish three types of programming models based on the features of the solar energy. In addition, the priorities of different energy resources are set to reduce the loss of electrical energy transmissions.Second, three ADHDP-based neural networks, which can update themselves during applications, are designed to manage the flows of electricity. Third, simulation results show that the proposed scheduling method has effectively reduced the total electricity cost and improved load balancing process. The comparison with the particle swarm optimization algorithm further proves that the present method has a promising effect on energy management to save cost.
文摘Based on service-oriented architecture(SOA),a Bellman-dynamic-programming-based approach of service recovery decision-making is proposed to make valid recovery decisions.Both the attribute and the process of services in the controllable distributed information system are analyzed as the preparatory work.Using the idea of service composition as a reference,the approach translates the recovery decision-making into a planning problem regarding artificial intelligence (AI) through two steps.The first is the self-organization based on a logical view of the network,and the second is the definition of evaluation standards.Applying Bellman dynamic programming to solve the planning problem,the approach offers timely emergency response and optimal recovery source selection,meeting multiple QoS (quality of service)requirements.Experimental results demonstrate the rationality and optimality of the approach,and the theoretical analysis of its computational complexity and the comparison with conventional methods exhibit its high efficiency.
基金supported by the National Natural Science Foundation of China(6157328561305133)
文摘This paper researches the adaptive scheduling problem of multiple electronic support measures(multi-ESM) in a ground moving radar targets tracking application. It is a sequential decision-making problem in uncertain environment. For adaptive selection of appropriate ESMs, we generalize an approximate dynamic programming(ADP) framework to the dynamic case. We define the environment model and agent model, respectively. To handle the partially observable challenge, we apply the unsented Kalman filter(UKF) algorithm for belief state estimation. To reduce the computational burden, a simulation-based approach rollout with a redesigned base policy is proposed to approximate the long-term cumulative reward. Meanwhile, Monte Carlo sampling is combined into the rollout to estimate the expectation of the rewards. The experiments indicate that our method outperforms other strategies due to its better performance in larger-scale problems.
基金supported by the National Natural Science Foundation of China(91648204 61601486)+1 种基金State Key Laboratory of High Performance Computing Project Fund(1502-02)Research Programs of National University of Defense Technology(ZDYYJCYJ140601)
文摘Unmanned aerial vehicles(UAVs) may play an important role in data collection and offloading in vast areas deploying wireless sensor networks, and the UAV’s action strategy has a vital influence on achieving applicability and computational complexity. Dynamic programming(DP) has a good application in the path planning of UAV, but there are problems in the applicability of special terrain environment and the complexity of the algorithm.Based on the analysis of DP, this paper proposes a hierarchical directional DP(DDP) algorithm based on direction determination and hierarchical model. We compare our methods with Q-learning and DP algorithm by experiments, and the results show that our method can improve the terrain applicability, meanwhile greatly reduce the computational complexity.
文摘A stochastic resource allocation model, based on the principles of Markov decision processes(MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations(i.e., specified and unspecified time interval reservation requests), and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which is solved by means of stochastic dynamic programming(DP) based algorithms. In this regard, Bellman’s backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, as the inevitable issue of the DP both for instant resource requests and future resource reservations,occurs. In particular, an approximate dynamic programming(ADP) technique based on linear function approximations is applied to solve such scalability issues. Several examples are provided to show the effectiveness of the proposed approach.
基金supported by the National Natural Science Foundation of China (60904059 60975049)+1 种基金the Philosophy and Social Science Foundation of Hunan Province (2010YBA104)the National High Technology Research and Development Program of China (863 Program)(2009AA04Z107)
文摘A method of minimizing rankings inconsistency is proposed for a decision-making problem with rankings of alternatives given by multiple decision makers according to multiple criteria. For each criteria, at first, the total inconsistency between the rankings of all alternatives for the group and the ones for every decision maker is defined after the decision maker weights in respect to the criteria are considered. Similarly, the total inconsistency between their final rankings for the group and the ones under every criteria is determined after the criteria weights are taken into account. Then two nonlinear integer programming models minimizing respectively the two total inconsistencies above are developed and then transformed to two dynamic programming models to obtain separately the rankings of all alternatives for the group with respect to each criteria and their final rankings. A supplier selection case illustrated the proposed method, and some discussions on the results verified its effectiveness. This work develops a new measurement of ordinal preferences’ inconsistency in multi-criteria group decision-making (MCGDM) and extends the cook-seiford social selection function to MCGDM considering weights of criteria and decision makers and can obtain unique ranking result.
基金supported by the National Natural Science Foundation of China (60672073)the Program for New Century Excellent Talents in University (NCET-06-0537)+1 种基金the Natural Science Foundation of Ningbo (2008A610016)the K.C.Wong Magna Fund in Ningbo University.
文摘Color inconsistency between views is an important problem to be solved in multi-view video systems. A multi-view video color correction method using dynamic programming is proposed. Three-dimensional histograms are constructed with sequential conditional probability in HSI color space. Then, dynamic programming is used to seek the best color mapping relation with the minimum cost path between target image histogram and source image histogram. Finally, video tracking technique is performed to correct multi-view video. Experimental results show that the proposed method can obtain better subjective and objective performance in color correction.
基金partially supported by the National Science Foundation of China(Grants 71822105 and 91746210)。
文摘In short-term operation of natural gas network,the impact of demand uncertainty is not negligible.To address this issue we propose a two-stage robust model for power cost minimization problem in gunbarrel natural gas networks.The demands between pipelines and compressor stations are uncertain with a budget parameter,since it is unlikely that all the uncertain demands reach the maximal deviation simultaneously.During solving the two-stage robust model we encounter a bilevel problem which is challenging to solve.We formulate it as a multi-dimensional dynamic programming problem and propose approximate dynamic programming methods to accelerate the calculation.Numerical results based on real network in China show that we obtain a speed gain of 7 times faster in average without compromising optimality compared with original dynamic programming algorithm.Numerical results also verify the advantage of robust model compared with deterministic model when facing uncertainties.These findings offer short-term operation methods for gunbarrel natural gas network management to handle with uncertainties.
基金supported by the National Natural Science Foundation of China (No.60605023,60775048)Specialized Research Fund for the Doctoral Program of Higher Education (No.20060141006)
文摘An adaptive weighted stereo matching algorithm with multilevel and bidirectional dynamic programming based on ground control points (GCPs) is presented. To decrease time complexity without losing matching precision, using a multilevel search scheme, the coarse matching is processed in typical disparity space image, while the fine matching is processed in disparity-offset space image. In the upper level, GCPs are obtained by enhanced volumetric iterative algorithm enforcing the mutual constraint and the threshold constraint. Under the supervision of the highly reliable GCPs, bidirectional dynamic programming framework is employed to solve the inconsistency in the optimization path. In the lower level, to reduce running time, disparity-offset space is proposed to efficiently achieve the dense disparity image. In addition, an adaptive dual support-weight strategy is presented to aggregate matching cost, which considers photometric and geometric information. Further, post-processing algorithm can ameliorate disparity results in areas with depth discontinuities and related by occlusions using dual threshold algorithm, where missing stereo information is substituted from surrounding regions. To demonstrate the effectiveness of the algorithm, we present the two groups of experimental results for four widely used standard stereo data sets, including discussion on performance and comparison with other methods, which show that the algorithm has not only a fast speed, but also significantly improves the efficiency of holistic optimization.
基金supported by the National Natural Science Foundation of China(Grant Nos.61034002,61233001,61273140,61304086,and 61374105)the Beijing Natural Science Foundation,China(Grant No.4132078)
文摘A policy iteration algorithm of adaptive dynamic programming(ADP) is developed to solve the optimal tracking control for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation one. The policy iteration algorithm for discrete-time chaotic systems is first described. Then,the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks,the developed optimal tracking control scheme for chaotic systems is verified by a simulation.
文摘A certain number of considerations should be taken into account in the dynamic control of robot manipulators as highly complex non-linear systems.In this article,we provide a detailed presentation of the mechanical and electrical impli- cations of robots equipped with DC motor actuators.This model takes into account all non-linear aspects of the system.Then,we develop computational algorithms for optimal control based on dynamic programming.The robot's trajectory must be predefined,but performance criteria and constraints applying to the system are not limited and we may adapt them freely to the robot and the task being studied.As an example,a manipulator arm with 3 degrees of freedom is analyzed.