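Abstract: With the integration of large numbers of DC sources and loads, hybrid AC/DC distribution networks have become the development trend for future distribution networks. However, source-load uncertainty and the diversity of dispatchable devices pose great challenges to distribution network dispatch. This paper proposes a dispatch method for AC/DC distribution networks based on two-agent deep reinforcement learning, using a branching dueling Q-network (BDQ) and soft actor-critic (SAC). The method first combines the economic dispatch problem with the actions, rewards, and states of the two agents to establish a Markov decision process for economic dispatch, and sets up one agent based on BDQ and one based on SAC: the BDQ agent controls the discrete-action devices in the distribution network, and the SAC agent controls the continuous-action devices. Then, under centralized training with decentralized execution, the two agents interact with the environment for offline training. Finally, the agents' parameters are fixed and online dispatch is carried out. The advantage of the method is that the two agents can simultaneously control the discrete-action devices (capacitor banks and on-load tap-changing transformers) and the continuous-action devices (converters and energy storage), while centralized training of the two agents lets them adapt to source-load uncertainty. Tests on a modified IEEE 33-node AC/DC distribution network verify the effectiveness of the proposed method.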
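Abstract: The stock market changes quickly, is subject to many disturbing factors, and offers insufficient data across market cycles. Stock trading is a game played under incomplete information, and single-objective supervised learning models struggle with this kind of sequential decision problem. Reinforcement learning is one effective way to address it. This paper proposes ISTG (Intelligent Stock Trader and Gym), an intelligent stock trader model based on deep reinforcement learning. ISTG fuses multiple data types, including historical market data, technical indicators, and macroeconomic indicators; analyzes evaluation criteria and good control strategies; processes long-period data; implements a market replay model that can be extended incrementally with different types of data; computes reward labels automatically; and trains the intelligent trader. The paper also proposes a method for computing single-step deterministic action values directly from market data. In several comparative experiments on more than 1,400 Chinese stocks with over 10 years of data each, ISTG achieves an overall return of 13%, outperforming the overall buy-and-hold return of −7%.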
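The abstract does not give the formula for the single-step deterministic action values, so the following is only a minimal sketch under stated assumptions: two hypothetical actions (`hold_stock`, `hold_cash`), a value equal to the next-day return of the resulting position, and an optional hypothetical `cost` term. It also shows the buy-and-hold baseline that the reported figures are compared against. It is not the ISTG implementation.

```python
def one_step_action_values(prices, t, cost=0.0):
    """Value of each action at day t, read directly from the price series.

    Assumption (not from the paper): the value of an action is the
    next-day return of the position it leaves us in.
    """
    next_day_return = prices[t + 1] / prices[t] - 1.0
    return {
        "hold_stock": next_day_return - cost,  # exposed to the price move
        "hold_cash": 0.0,                      # no exposure, no return
    }


def buy_and_hold_return(prices):
    """Baseline: buy on the first day and sell on the last."""
    return prices[-1] / prices[0] - 1.0


def greedy_return(prices):
    """Return of always taking the action with the highest one-step value.

    Because the values use the next day's price, this is a hindsight
    label generator for training, not a tradable strategy.
    """
    wealth = 1.0
    for t in range(len(prices) - 1):
        values = one_step_action_values(prices, t)
        best = max(values, key=values.get)
        wealth *= 1.0 + values[best]
    return wealth - 1.0


prices = [10.0, 11.0, 9.9, 10.89]  # made-up closing prices
day0_values = one_step_action_values(prices, 0)
baseline = buy_and_hold_return(prices)
hindsight = greedy_return(prices)
```

Since the action values are deterministic given the recorded prices, they can serve as automatically computed labels, which is one plausible reading of the "automatic reward label" step described in the abstract.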
Funding: the National Natural Science Foundation of China (No. 60632030); the National High Technology Research and Development Program of China (No. 2006AA01Z276)
Abstract: This paper presents the multi-step Q-learning (MQL) algorithm as an autonomic approach to joint radio resource management (JRRM) among heterogeneous radio access technologies (RATs) in the B3G environment. Through a trial-and-error online learning process, the JRRM controller can converge to an optimized admission control policy. The controller learns to give the best allocation for each session in terms of both the access RAT and the service bandwidth. Simulation results show that the proposed algorithm realizes the autonomy of JRRM and achieves a good trade-off between spectrum utility and blocking probability compared with the load-balancing algorithm and the utility-maximizing algorithm. In addition, the proposed algorithm has better online performance and faster convergence than the one-step Q-learning (QL) algorithm. The degree of user satisfaction can therefore also be improved.
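The difference between one-step and multi-step Q-learning can be illustrated with a generic tabular n-step update. This is a minimal sketch, not the paper's MQL variant, whose details the abstract does not give: the state names, action names, rewards, and the `n_step_q_update` helper are all hypothetical, and the update applies no off-policy correction.

```python
from collections import defaultdict


def n_step_q_update(Q, transitions, bootstrap_state, actions,
                    alpha=0.1, gamma=0.9):
    """Update Q for the first (state, action) pair using an n-step return.

    transitions: list of (state, action, reward), oldest first; n = len(list).
    bootstrap_state: state reached after the last transition,
                     or None if the episode ended there.
    """
    # Discounted sum of the n observed rewards.
    g = sum((gamma ** k) * reward
            for k, (_, _, reward) in enumerate(transitions))
    # Bootstrap from the greedy value of the state reached after n steps.
    if bootstrap_state is not None:
        n = len(transitions)
        g += (gamma ** n) * max(Q[(bootstrap_state, a)] for a in actions)
    s0, a0, _ = transitions[0]
    Q[(s0, a0)] += alpha * (g - Q[(s0, a0)])
    return Q[(s0, a0)]


# Hypothetical admission-control example: pick a RAT or reject the session.
actions = ["RAT_A", "RAT_B", "reject"]
Q = defaultdict(float)  # all action values start at 0.0
trajectory = [("low_load", "RAT_A", 1.0), ("mid_load", "RAT_B", 0.5)]
new_value = n_step_q_update(Q, trajectory, "high_load", actions,
                            alpha=0.5, gamma=0.9)
```

With a single transition in `transitions`, the function reduces to the ordinary one-step Q-learning update; longer lists propagate observed rewards back faster, which is the usual motivation for the faster convergence the abstract reports.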