Reinforcement learning (RL) has roots in dynamic programming and is known as adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly promote ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era. In addition, they play a vital role in promoting environmental protection and industrial intelligence.
Heat integration is important for energy saving in the process industry. It is linked to the persistently challenging task of optimal design of heat exchanger networks (HEN). Due to the inherently highly nonconvex, nonlinear, and combinatorial nature of the HEN problem, it is not easy to find high-quality solutions for large-scale problems. The reinforcement learning (RL) method, which learns strategies through ongoing exploration and exploitation, shows advantages in this area. However, due to the complexity of the HEN design problem, the RL method needs a dedicated design for HEN. A hybrid strategy combining RL with mathematical programming is proposed to take better advantage of both methods. An insightful state representation of the HEN structure as well as a customized reward function is introduced. A Q-learning algorithm is applied to update the HEN structure using the ε-greedy strategy. Better results are obtained for three literature cases of different scales.
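The Q-learning update and ε-greedy action selection named in the abstract above can be sketched in their generic tabular form. This is only an illustration under stated assumptions: the two-state table, the reward, and the values of `alpha`, `gamma`, and `epsilon` are hypothetical, and nothing here models an HEN structure or the paper's state representation and reward function.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon explore (random action), else exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update, applied in place:
    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s_next, a') - Q(s, a))
    """
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Hypothetical toy table: 2 states x 2 actions (not an HEN model).
q_table = [[0.0, 0.0], [1.0, 2.0]]
rng = random.Random(0)

action = epsilon_greedy(q_table[1], epsilon=0.1, rng=rng)
q_update(q_table, s=0, a=1, reward=1.0, s_next=1)
print(action, q_table[0][1])
```

In a structure-search setting, the state would presumably encode the current network and each action a structural modification, but that mapping is specific to the paper and is not reproduced here.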
This study makes significant progress in addressing the challenges of short-term slope displacement prediction in the Universal Landslide Monitoring Program, an unprecedented disaster mitigation program in China, where many newly established monitoring slopes lack sufficient historical deformation data. This makes it difficult to extract deformation patterns and provide effective predictions, which play a crucial role in the early warning and forecasting of landslide hazards. A slope displacement prediction method based on transfer learning is therefore proposed. Initially, the method uses a model pre-trained on a multi-slope integrated dataset to transfer the deformation patterns learned from slopes with relatively rich deformation data to newly established monitoring slopes with limited or even no useful data, thus enabling rapid and efficient predictions for these slopes. Subsequently, as time goes on and monitoring data accumulate, fine-tuning the pre-trained model for individual slopes can further improve prediction accuracy, enabling continuous optimization of prediction results. A case study indicates that, after being trained on a multi-slope integrated dataset, the TCN-Transformer model can efficiently serve as a pre-trained model for displacement prediction at newly established monitoring slopes. The three-day average RMSE is reduced by 34.6% compared to models trained only on individual slope data, and the model also successfully predicts the majority of deformation peaks. The model fine-tuned on accumulated data from the target newly established monitoring slope further reduces the three-day RMSE by 37.2%, demonstrating considerable predictive accuracy. In conclusion, by taking advantage of transfer learning, the proposed slope displacement prediction method effectively utilizes the available data, enabling the rapid deployment and continual refinement of displacement predictions on newly established monitoring slopes.
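The RMSE reductions quoted in the abstract above are relative changes in root-mean-square error. The arithmetic can be sketched as follows, assuming the usual RMSE definition and a reduction measured against the baseline error; the sample numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, improved_rmse):
    """Percentage by which the improved RMSE is lower than the baseline RMSE."""
    return 100.0 * (baseline_rmse - improved_rmse) / baseline_rmse


# Hypothetical displacement values (illustrative only).
observed = [1.0, 2.0, 5.0]
baseline_prediction = [1.0, 2.0, 1.0]
transfer_prediction = [1.0, 2.0, 3.0]

baseline = rmse(baseline_prediction, observed)
improved = rmse(transfer_prediction, observed)
print(round(relative_reduction(baseline, improved), 1))
```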
In the context of internationalization, China-UK Joint Education Programs are receiving increasing attention from universities. Based on the difficulties faced in the China-UK Joint Education Program, this paper adopts a questionnaire survey method to study the learning effectiveness of students majoring in digital media technology in the China-UK Joint Education Program at Guangxi University of Finance and Economics, focusing on four aspects: learning materials, learning content, teacher conditions, and student learning outcomes. The analysis in this paper not only provides strong support for the construction of the China-UK Joint Education Program but also offers a reference for other China-UK Joint Education Programs.
This paper introduces a self-learning control approach based on approximate dynamic programming. Dynamic programming was introduced by Bellman in the 1950s for solving optimal control problems of nonlinear dynamical systems. Due to its high computational complexity, the applications of dynamic programming have been limited to simple and small problems. The key step in finding approximate solutions to dynamic programming is to estimate the performance index in dynamic programming. The optimal control signal can then be determined by minimizing (or maximizing) the performance index. Artificial neural networks are very efficient tools for representing the performance index in dynamic programming. This paper assumes the use of neural networks for estimating the performance index in dynamic programming and for generating optimal control signals, thereby achieving optimal control through self-learning.
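The idea of estimating the performance index and then choosing the control that minimizes it can be illustrated with classical tabular value iteration. This is a sketch under stated assumptions, not the paper's method: the paper uses neural networks to approximate the performance index, whereas here it is stored exactly in a list, and the five-state system, unit stage cost, and function names are hypothetical.

```python
def value_iteration(n_states, actions, step, cost, gamma=1.0, sweeps=50):
    """Repeat the Bellman backup J(x) <- min_u [cost(x, u) + gamma * J(step(x, u))]."""
    J = [0.0] * n_states  # start from the zero cost function
    for _ in range(sweeps):
        J = [min(cost(x, u) + gamma * J[step(x, u)] for u in actions)
             for x in range(n_states)]
    return J


def greedy_policy(J, n_states, actions, step, cost, gamma=1.0):
    """For each state, pick the control that minimizes the estimated performance index."""
    return [min(actions, key=lambda u: cost(x, u) + gamma * J[step(x, u)])
            for x in range(n_states)]


# Hypothetical toy system: states 0..4 on a line, state 0 is the target.
N = 5
ACTIONS = (-1, 1)


def step(x, u):
    """Deterministic dynamics: move one cell left or right, clipped to the grid."""
    return min(max(x + u, 0), N - 1)


def cost(x, u):
    """Unit stage cost everywhere except at the target state."""
    return 0.0 if x == 0 else 1.0


J = value_iteration(N, ACTIONS, step, cost)
policy = greedy_policy(J, N, ACTIONS, step, cost)
print(J, policy)
```

The table `J` grows with the number of states, which is the computational burden the abstract refers to; replacing the table with a trained function approximator is what makes the approach "approximate".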
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP can converge to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed using control policies generated by MsHDP. Also, a general stability criterion is designed to determine the admissibility of the current control policy. That is, the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to greatly accelerate learning. Besides, an actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks are used as the parametric architecture to evaluate and improve the iterative policy. Finally, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
Programming difficulties are one of the common problems faced by software engineering students, and they can lead to a rapid decline in motivation and even to dropping out. Probing students' programming difficulties is a crucial step in understanding their current programming situation and implementing appropriate instructional interventions. However, detecting students' programming difficulties accurately without the students' awareness remains a big challenge. To address these issues, this paper adopts a sensor-free difficulty detection method based on a deep neural network, which employs a recurrent neural network (RNN) model and uses sequential timing data from programming behaviour. The method can detect students' programming difficulties in real time with 93% accuracy without interfering in the programming process. In the long term, this method is the first step toward establishing an automated intelligent programming environment. At the same time, it can help teachers notice the difficulties that students encounter, so that they can adjust their teaching plans and provide manual tutoring intervention more quickly.
Pair programming has been widely acclaimed as the best approach in computer programming. Recently, collaboration involving more subjects has been shown to produce better results in programming environments. However, the optimum group size needed for the collaboration has not been adequately addressed. This paper seeks to acquaint the students involved in the study with the spirit of teamwork in software projects and to empirically determine the effective (optimum) team size that may be desirable in real-life programming/learning environments. Two different experiments were organized and conducted. Parameters for determining the optimal team size were formulated. In the first experiment, volunteer participants of different genders were randomly grouped into five parallel teams of different sizes ranging from 1 to 5. Each team size was replicated six times. The second experiment involved teams of the same gender composition (males or females) in different sizes. The times (efforts) for problem analysis and coding as well as compile-time errors (bugs) were recorded for each team size. Finally, the effectiveness of the teams was analyzed. The study shows that collaboration is highly beneficial to new learners of computer programming: they grasp programming concepts easily when learning in the company of others. The study also demonstrates that the optimum team size that may be adopted in collaborative learning of computer programming is four.
Students often face difficulties while taking basic programming courses due to several factors. In response, research has presented subjective assessments for diagnosing learning problems to improve the teaching of programming in higher education. In this paper, the authors propose an Object Oriented conceptual map model and organize this approach into three levels: constructing a Concept Effect Propagation Table, constructing Test Item-Concept Relationships, and diagnosing Student Learning Problems with Matrix Composition. The work is a modification of the approaches of Chert and Bai as well as Chu et al., as the authors use statistical methods, rather than fuzzy sets, for their analysis. This paper includes a statistical summary, based on tests with a small sample of students at King Abdulaziz University, Jeddah, Saudi Arabia, illustrating the learning problems in an Object Oriented course. The experimental results demonstrate that this approach might aid learning and teaching in an effective way.
Approximate dynamic programming (ADP) is a general and effective approach for solving optimal control and estimation problems by adapting to uncertain and nonconvex environments over time.
Students face difficulties in programming languages learning (PLL), which has encouraged many scholars to investigate the factors behind them. Although a number of positive and negative factors have been found to affect the PLL process, using online tools in PLL has been recognized as a positive, recommended means. This has motivated many researchers to provide solutions and proposals, which has resulted in a number of choices and options. However, categorising those efforts and showing what has been done would provide a better and clearer picture for future studies. Therefore, this paper aims to conduct a systematic literature review to show what studies have been done and then categorise them based on the type of online tools and the aims of the research. The study follows the Kitchenham and Charters guidelines for writing an SLR (Systematic Literature Review). The search returned 1390 publications between 2013 and 09/2018. After filtering through the selected criteria, 160 publications were found to be adequate to answer the review questions. The main results of this systematic review are a categorization of the aims of studies on online PLL tools, a classification of the tools, and the current trends in online PLL tools.
In view of the current situation in which offline teaching is the main mode of teaching Java Programming in higher vocational schools, this paper introduces the online and offline hybrid teaching method and explains it in terms of blended learning design, teaching organization, and implementation. At the same time, in light of the characteristics of blended learning, this paper proposes that under the new mode teachers should actively change the form of teaching and research, the teaching mode, and the role of teachers, take students as the center, and build an independent and effective classroom.
Funding (survey of ADP and RL for advanced control): Supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Funding (RL-based HEN design): Financial support provided by the National Natural Science Foundation of China (U22A20415, 21978256, 22308314) and the "Pioneer" and "Leading Goose" Research & Development Program of Zhejiang (2022C01SA442617).
Funding (slope displacement prediction with transfer learning): Funded by the project of the China Geological Survey (DD20211364) and the Science and Technology Talent Program of the Ministry of Natural Resources of China (grant number 121106000000180039–2201).
Funding (China-UK Joint Education Program study): Guangxi Key Laboratory of Financial Big Data Fund Project (Guikejizi [2021] No. 5); Research on the Innovation of Teaching Models for Foreign Professional Courses in China-UK Joint Education Under the Background of Internationalization: Taking Guangxi University of Finance and Economics as an Example (2023XJJG26); Exploration and Practice of Digital Media Technology Talent Training Models in the Context of New Productive Forces (XGK202423).
Funding (self-learning control based on approximate dynamic programming): Supported by the National Science Foundation (U.S.A.) under Grant ECS-0355364.
Funding (integrated MsHDP algorithm): Supported by the National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013).
Funding (programming difficulty detection): Supported by the 2018-2020 Higher Education Talent Training Quality and Teaching Reform Project of Sichuan Province (Grant No. JG2018-46), the Science and Technology Planning Program of Sichuan University and Luzhou (Grant No. 2017CDLZG30), and the Postdoctoral Science Fund of Sichuan University (Grant No. 2019SCU12058).
Funding (blended teaching of Java Programming): This study was supported by the General Project of Ganzhou Social Science Research in 2021, "Research on the Transformation of Teaching and Research Form of Professional Teachers in the Blending Learning Mode of Colleges and Universities: Taking the Course 'Java Programming' as an Example" (Project Number: 2021-028-0323).