Abstract
To address the portfolio management problem, a portfolio framework based on a value distributional reinforcement learning algorithm (VD-MEAC) is proposed. First, a reinforcement learning framework is built with the goal of maximizing portfolio return, in which the agent's action is the change in portfolio weights. Then, stock factors are selected as the state information observed by the agent. The algorithm balances risk and return through two novel techniques. For risk control, the Critic network learns the entire distribution of future returns and discards overconfident estimates, thereby avoiding the risk introduced by overestimation. For higher returns, an entropy regularization term is added, which encourages the agent to explore the action space and avoid converging prematurely to a local optimum. In the numerical experiments, real stock data serve as the financial environment, and repeated tests are run to verify the stability of the strategy. The results show that the VD-MEAC strategy achieves a mean return of 2.490 and a mean Sharpe ratio of 2.978, and that it clearly outperforms the baselines (equal weight, CSI 300, DDPG, TD3, SAC) in return, maximum drawdown, and Sharpe ratio, demonstrating the effectiveness of the strategy.
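The abstract states only that the agent's action is a change in portfolio weights and that the objective is portfolio return. The sketch below shows one plausible environment step under explicit assumptions that are not taken from the paper: a long-only portfolio (negative weights clipped and renormalized), no transaction costs, and a log-return reward. The function names `apply_weight_change` and `step` are hypothetical.

```python
import math
from typing import List, Tuple


def apply_weight_change(weights: List[float], delta: List[float]) -> List[float]:
    """Add the action (a change in weights) to the current weights.

    Assumption (not from the paper): long-only portfolio, so negative
    weights are clipped to 0 and the result is renormalized to sum to 1.
    """
    raw = [max(w + d, 0.0) for w, d in zip(weights, delta)]
    total = sum(raw)
    if total == 0.0:
        # Degenerate action: fall back to equal weights (an arbitrary choice).
        return [1.0 / len(weights)] * len(weights)
    return [x / total for x in raw]


def step(weights: List[float], delta: List[float],
         prices_t: List[float], prices_t1: List[float]
         ) -> Tuple[List[float], float, List[float]]:
    """One rebalancing step: returns (target weights, reward, drifted weights)."""
    target = apply_weight_change(weights, delta)
    # Price relatives p_{t+1} / p_t for each asset.
    relatives = [p1 / p0 for p0, p1 in zip(prices_t, prices_t1)]
    # Gross growth of the portfolio over the period (assumes no transaction costs).
    growth = sum(w * rel for w, rel in zip(target, relatives))
    reward = math.log(growth)  # assumed reward: portfolio log return
    # Weights drift with prices; these become the starting weights next step.
    drifted = [w * rel / growth for w, rel in zip(target, relatives)]
    return target, reward, drifted
```

For example, moving from weights `[0.5, 0.5]` with action `[0.1, -0.1]` gives target weights of about `[0.6, 0.4]`; if the first asset rises 10% and the second is flat, the reward is approximately `log(1.06)`. Returning the drifted weights keeps the next state consistent with what the portfolio actually holds after prices move.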
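The abstract does not say how the Critic "discards overconfident estimates". One well-known way to do this with a distributional critic is the truncation used in TQC (Truncated Quantile Critics): pool the quantile atoms predicted by several critics, sort them, and drop the largest ones before forming the target. Combined with the soft (entropy-regularized) backup from SAC, this also covers the entropy term mentioned above. The sketch assumes this TQC-style scheme, which may differ from VD-MEAC's actual design; `truncated_soft_target` and its parameters are hypothetical.

```python
from typing import List


def truncated_soft_target(next_quantiles: List[List[float]], drop_per_critic: int,
                          reward: float, next_log_prob: float,
                          gamma: float = 0.99, alpha: float = 0.2,
                          done: bool = False) -> List[float]:
    """Build distributional TD-target atoms from several critics' quantiles.

    next_quantiles: one list of quantile estimates per critic, evaluated at
    (s', a') with a' sampled from the current policy (TQC-style assumption).
    next_log_prob: log pi(a' | s') for that sampled action.
    """
    # Pool every critic's atoms into one sorted mixture.
    pooled = sorted(q for critic in next_quantiles for q in critic)
    keep = len(pooled) - drop_per_critic * len(next_quantiles)
    if keep <= 0:
        raise ValueError("drop_per_critic removes every atom")
    # Discard the largest (most optimistic) atoms to curb overestimation.
    kept = pooled[:keep]
    # Soft backup: subtracting alpha * log pi rewards high-entropy policies.
    discount = 0.0 if done else gamma
    return [reward + discount * (z - alpha * next_log_prob) for z in kept]
```

Dropping atoms from the pooled mixture rather than from each critic separately lets the truncation target the overall upper tail; `drop_per_critic` then controls how conservative the target is.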
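A critic that "learns the entire distribution of future returns" is commonly trained with the quantile Huber loss from quantile-regression distributional RL (QR-DQN, also used by TQC). The abstract does not name the loss, so treat this as an assumed, standard choice rather than the paper's. Each predicted quantile $\theta_i$ at fraction $\tau_i = \frac{2i+1}{2N}$ is penalized by $|\tau_i - \mathbb{1}\{u<0\}|\,L_\kappa(u)/\kappa$ for every target atom $y_j$, where $u = y_j - \theta_i$ and $L_\kappa$ is the Huber loss. The function name `quantile_huber_loss` is hypothetical.

```python
from typing import List


def quantile_huber_loss(pred: List[float], target: List[float],
                        kappa: float = 1.0) -> float:
    """Quantile Huber loss averaged over predicted quantiles and target atoms."""
    n = len(pred)
    total = 0.0
    for i, theta in enumerate(pred):
        tau = (2 * i + 1) / (2 * n)  # midpoint quantile fraction for atom i
        for y in target:
            u = y - theta
            # Huber loss: quadratic near zero, linear in the tails.
            if abs(u) <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs(u) - 0.5 * kappa)
            # Asymmetric weight pushes theta toward the tau-quantile of target.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    return total / (n * len(target))
```

With a single predicted quantile (`tau = 0.5`) and one target atom at distance 1, the loss is `0.5 * 0.5 = 0.25`; at distance 3 the linear branch applies and the loss is `0.5 * 2.5 = 1.25`.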
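The strategies are compared on return, maximum drawdown, and Sharpe ratio. The abstract does not give the exact definitions (for example, the risk-free rate or the annualization factor), so the sketch below uses common textbook forms under stated assumptions: simple per-period returns, a zero risk-free rate by default, and 252 trading periods per year. The function names are hypothetical.

```python
import math
from typing import List


def cumulative_return(returns: List[float]) -> float:
    """Compounded return over the whole period from simple per-period returns."""
    wealth = 1.0
    for r in returns:
        wealth *= 1.0 + r
    return wealth - 1.0


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough fall of the wealth curve, as a fraction of the peak."""
    wealth, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        mdd = max(mdd, (peak - wealth) / peak)
    return mdd


def sharpe_ratio(returns: List[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in excess) / (n - 1))
    if std == 0.0:
        raise ValueError("zero volatility")
    return mean / std * math.sqrt(periods_per_year)
```

For the return sequence `[0.1, -0.5, 0.2]`, wealth goes 1.1, 0.55, 0.66, so the cumulative return is about -0.34 and the maximum drawdown is about 0.5 (from the 1.1 peak to 0.55).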
Authors
LIU Lei (刘磊); CHEN Hao (陈浩) (College of Science, Hohai University, Nanjing 210000, China)
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报(自然科学版)》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2023, No. 5, pp. 26-32 (7 pages)
Funding
General Program of the National Natural Science Foundation of China (61773152).
Keywords
value distributional reinforcement learning
portfolio management
quantitative investment
factor model
deep learning