Abstract: Approximate dynamic programming (ADP) is a general and effective approach for solving optimal control and estimation problems by adapting to uncertain and nonconvex environments over time.
Funding: Supported by the 2018-2020 Higher Education Talent Training Quality and Teaching Reform Project of Sichuan Province (Grant No. JG2018-46), the Science and Technology Planning Program of Sichuan University and Luzhou (Grant No. 2017CDLZG30), and the Postdoctoral Science Fund of Sichuan University (Grant No. 2019SCU12058).
Abstract: In online programming education, teaching outcomes would improve significantly if teachers could identify the difficulties their students are experiencing and provide support. This paper describes an attempt to build a time prediction model for the demand for personalized affective support, based on a modified version of the Synthetic Minority Over-sampling Technique (SMOTE). We designed and conducted a data collection experiment based on the specific features of affective support. The modified oversampling algorithm can ascertain when to provide such support to learners, and it addresses the problem of class imbalance. In addition, we obtained a sorting algorithm for time prediction of the demand for personalized affective support in programming learning, and constructed a time prediction model for that demand. We conducted experiments on both public data and our own collected data to verify the effectiveness of the constructed model. The results show that the model is able to judge whether learners need affective support while they are writing code.
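For readers unfamiliar with the oversampling step mentioned in the abstract, the following is a minimal sketch of standard SMOTE, not the modified version the paper proposes (the abstract does not specify the modification). The function name `smote_oversample` and its parameters are assumptions made for illustration.

```python
import numpy as np

def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (plain SMOTE, illustrative only).

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority-class neighbours.
    """
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances within the minority class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)          # starting sample per new point
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # position along the segment
    return minority[base] + gap * (minority[picked] - minority[base])
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies, which is what lets the classifier see a more balanced training set without duplicating rows verbatim.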
Funding: Supported by the National Natural Science Foundation of China (Nos. 61034002, 61233001, and 61273140).
Abstract: A tremendous amount of data is generated and saved every day in many complex engineering and social systems. It is both significant and feasible to use this big data to make better decisions through machine learning techniques. In this paper, we focus on batch reinforcement learning (RL) algorithms for discounted Markov decision processes (MDPs) with large discrete or continuous state spaces, aiming to learn the best possible policy given a fixed amount of training data. Batch RL algorithms with handcrafted feature representations work well for low-dimensional MDPs. However, for many real-world RL tasks, which often involve high-dimensional state spaces, it is difficult or even infeasible to design features for value function approximation by feature engineering. To cope with high-dimensional RL problems, the desire for data-driven features has led to a large body of work on incorporating feature selection and feature learning into traditional batch RL algorithms. In this paper, we provide a comprehensive survey of automatic feature selection and unsupervised feature learning for high-dimensional batch RL. Moreover, we present recent theoretical developments on applying statistical learning to establish finite-sample error bounds for batch RL algorithms based on weighted L_p norms. Finally, we outline some future directions for research on RL algorithms, theories, and applications.
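As a rough illustration of the batch RL setting described in the abstract (a fixed set of transitions, a discounted MDP, and a value function approximated with handcrafted features), here is a minimal sketch of fitted Q-iteration with a linear model. It is one common batch RL scheme, not necessarily one the survey analyses; the names `fitted_q_iteration` and `phi`, the ridge term, and the absence of terminal-state handling are all assumptions for the example.

```python
import numpy as np

def fitted_q_iteration(batch, phi, n_actions, gamma=0.9, n_iters=100, ridge=1e-6):
    """Fit Q(s, a) ~ phi(s, a) @ w from a fixed batch of (s, a, r, s_next) tuples.

    phi is a handcrafted feature map (hypothetical, supplied by the caller).
    No new data is collected: every sweep regresses onto bootstrapped targets
    computed from the same batch.
    """
    X = np.array([phi(s, a) for s, a, _, _ in batch], dtype=float)
    rewards = np.array([r for _, _, r, _ in batch], dtype=float)
    # Features of every action in every next state, shape (N, n_actions, d).
    X_next = np.array(
        [[phi(s2, b) for b in range(n_actions)] for _, _, _, s2 in batch],
        dtype=float,
    )
    A = X.T @ X + ridge * np.eye(X.shape[1])  # regularised normal equations
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Bellman optimality target: r + gamma * max_b Q(s_next, b).
        targets = rewards + gamma * (X_next @ w).max(axis=1)
        w = np.linalg.solve(A, X.T @ targets)
    return w
```

With one-hot features this reduces to tabular value iteration on the observed transitions; with coarser handcrafted features the quality of the result depends heavily on how well those features can represent the value function, which is the limitation that motivates the feature selection and feature learning methods the survey covers.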