1,653 articles found
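Softmax-Based Weighted Double Q-Learning Algorithm
1
Authors: 钟雨昂, 袁伟伟, 关东海. 《计算机科学》, CSCD, PKU Core, 2024, No. S01, pp. 46-50 (5 pages)
Reinforcement learning, a branch of machine learning, describes and solves the problem of an agent learning a policy that maximizes return while interacting with an environment. Q-Learning, a classic model-free reinforcement learning method, suffers from maximization bias caused by overestimation and performs poorly when rewards in the environment are noisy. Double Q-Learning (DQL) resolves the overestimation problem but introduces underestimation. To address both the overestimation and the underestimation of these algorithms, a softmax-based weighted Q-Learning algorithm is proposed and combined with DQL to give a new softmax-based weighted Double Q-Learning algorithm (WDQL-Softmax). Built on a weighted double estimator, the algorithm applies a softmax operation to the sample expected values to obtain weights and uses those weights to estimate action values, which effectively balances overestimation and underestimation of action values and brings the estimates closer to the theoretical values. Experimental results show that, in discrete action spaces, WDQL-Softmax converges faster than Q-Learning, DQL, and WDQL, and the error between its estimates and the theoretical values is smaller.
Keywords: reinforcement learning; Q-learning; Double Q-learning; Softmax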
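The weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's construction: the function names, the temperature parameter, and the choice to take weights from one estimator and values from the other are all assumptions made for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_weighted_value(q_a, q_b, temperature=1.0):
    """Hypothetical weighted double estimate of a state's value.

    Weights come from a softmax over estimator A's action values and are
    applied to estimator B's action values. The result always lies between
    min(q_b) and max(q_b), so it is softer than a hard max.
    """
    weights = softmax(q_a, temperature)
    return sum(w * q for w, q in zip(weights, q_b))
```

With a small temperature the weights concentrate on A's best action, which approaches the standard Double Q-Learning target; with a large temperature the estimate approaches the plain average of B's values.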
Double BP Q-Learning Algorithm for Local Path Planning of Mobile Robot (Cited by: 1)
2
Authors: Guoming Liu, Caihong Li, Tengteng Gao, Yongdi Li, Xiaopei He. 《Journal of Computer and Communications》, 2021, No. 6, pp. 138-157 (20 pages)
To address the curse of dimensionality, poor model generalization, and deadlock in special obstacle environments that arise as state information grows during local path planning for mobile robots, this paper proposes a Double BP Q-learning algorithm that fuses the Double Q-learning algorithm with a BP neural network. To overcome the curse of dimensionality, two BP neural networks with the same structure fit the value functions and replace the two Q-value tables of Double Q-Learning, which cannot store excessive state information. Adding a prioritized experience replay mechanism and using parameter transfer to initialize the model parameters in different environments accelerates convergence and improves learning efficiency and model generalization. A specific action selection strategy designed for special environments avoids deadlock states so that the mobile robot can reach the target point. Finally, the Double BP Q-learning algorithm was verified in simulation, and the probability of the mobile robot reaching the target point during parameter updates was compared with that of the Double Q-learning algorithm under the same planned path length. The results show that the model trained by the improved Double BP Q-learning algorithm has a higher success rate in finding the optimal or sub-optimal path in dense discrete environments, generalizes better, produces fewer redundant path sections, and reaches the target point without entering the deadlock zone in special obstacle environments.
Keywords: Mobile Robot; Local Path Planning; Double BP Q-learning; BP Neural Network; Transfer Learning
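Wireless Communication Node Coverage Optimization Based on Double Deep Q-learning (Cited by: 1)
3
Authors: 李忠涛. 《电子技术与软件工程》, 2021, No. 14, pp. 1-3 (3 pages)
This paper adopts the Double Deep Q-learning model for algorithm design. The algorithm is a value-based reinforcement learning method that uses a neural network in place of the tabular Q-Table, which solves the problem of the Q-Table growing too large when the system has too many states.
Keywords: wireless communication node; optimal path; Double Deep Q-learning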
Supervisory control of the hybrid off-highway vehicle for fuel economy improvement using predictive double Q-learning with backup models
4
Authors: SHUAI Bin, LI Yan-fei, ZHOU Quan, XU Hong-ming, SHUAI Shi-jin. 《Journal of Central South University》, SCIE, EI, CAS, CSCD, 2022, No. 7, pp. 2266-2278 (13 pages)
This paper studies a supervisory control system for a hybrid off-highway electric vehicle under the charge-sustaining (CS) condition. A new predictive double Q-learning with backup models (PDQL) scheme is proposed to optimize engine fuel use in real-world driving and improve energy efficiency with a faster and more robust learning process. Unlike existing “model-free” methods, which follow either on-policy or off-policy learning alone to update their knowledge bases (Q-tables), PDQL merges on-policy and off-policy learning by introducing a backup model (Q-table). Experimental evaluations are conducted on software-in-the-loop (SiL) and hardware-in-the-loop (HiL) test platforms based on real-time modelling of the studied vehicle. Compared with standard double Q-learning (SDQL), PDQL needs only half of the learning iterations to achieve better energy efficiency than SDQL reaches at the end of the learning process. In SiL with 35 rounds of learning, PDQL improves vehicle energy efficiency by 1.75% over SDQL. Implemented in HiL under four predefined real-world conditions, PDQL robustly saves more than 5.03% energy compared with the SDQL scheme.
Keywords: supervisory charge-sustaining control; hybrid electric vehicle; reinforcement learning; predictive double Q-learning
CNOP-P-based parameter sensitivity for double-gyre variation in ROMS with simulated annealing algorithm (Cited by: 3)
5
Authors: YUAN Shijin, ZHANG Huazhen, LI Mi, MU Bin. 《Journal of Oceanology and Limnology》, SCIE, CAS, CSCD, 2019, No. 3, pp. 957-967 (11 pages)
Studying parameter sensitivity and reducing the error of sensitive parameters can reduce model uncertainty when simulating double-gyre variation in the Regional Ocean Modeling System (ROMS). Conditional Nonlinear Optimal Perturbation related to Parameter (CNOP-P) is an effective method for studying parameter sensitivity; it represents a type of parameter error with maximum nonlinear development at the prediction time. Intelligent algorithms have been widely applied to solving Conditional Nonlinear Optimal Perturbation (CNOP). In this paper, we propose an improved simulated annealing (SA) algorithm to solve CNOP-P and obtain the optimal parameter error, study the sensitivity of single parameters and of combinations of multiple parameters, and verify the effect of reducing the error of sensitive parameters on reducing the uncertainty of model simulation. Specifically, we first found the non-periodic oscillation of the kinetic energy time series of double-gyre variation, then extracted two transition periods, one from high energy to low energy and one from low energy to high energy. For each transition period, three parameters, wind amplitude (WD), viscosity coefficient (VC), and linear bottom drag coefficient (RDRG), were studied by CNOP-P solved with the SA algorithm. Finally, the effect of the sensitive parameters on model simulation was verified. Experimental results show that the sensitivity order is WD > VC >> RDRG, that the effect of a combination of multiple sensitive parameters is greater than the superposition of single-parameter effects, and that reducing the error of sensitive parameters effectively reduces model prediction error, which confirms the importance of sensitive-parameter analysis.
Keywords: parameter sensitivity; double gyre; Regional Ocean Modeling System (ROMS); Conditional Nonlinear Optimal Perturbation (CNOP-P); simulated annealing (SA) algorithm
Constrained Optimization Algorithm Based on Double Populations (Cited by: 1)
6
Authors: Xiaojun B, Lei Zhang, Yan Cang. 《Journal of Harbin Institute of Technology(New Series)》, EI, CAS, 2016, No. 2, pp. 66-71 (6 pages)
To improve the distribution and convergence of constrained optimization algorithms, this paper proposes a constrained optimization algorithm based on double populations. First, feasible and infeasible solutions are stored separately in two populations, which avoids direct comparison between them. Using the effective information carried by infeasible solutions enlarges the exploitation scope and strengthens population diversity. At the same time, adopting the presented concept of constraint domination to update the infeasible set maintains population variety while taking convergence into account. An improved mutation operation is also employed to further raise diversity and convergence. The proposed algorithm is compared with 3 state-of-the-art constrained optimization algorithms on standard test problems g01-g13. Simulation results show that the presented algorithm has certain advantages over the other algorithms because it ensures good convergence accuracy while remaining robust.
Keywords: constrained optimization problems; constraint handling; evolution algorithms; double populations; constraint domination
Relocation of the 1998 Zhangbei-Shangyi earthquake sequence using the double difference earthquake location algorithm (Cited by: 1)
7
Authors: YANG Zhi-xian (杨智娴), CHEN Yun-tai (陈运泰). 《Acta Seismologica Sinica(English Edition)》, CSCD, 2004, No. 2, pp. 125-130 (6 pages)
On January 10, 1998, at 11h50min Beijing Time (03h50min UTC), an earthquake of ML=6.2 occurred in the border region between Zhangbei County and Shangyi County of Hebei Province. This earthquake is the most significant event to have occurred in northern China in recent years. Its earthquake-generating structure was unclear: no active fault capable of generating a moderate earthquake was found in the epicentral area, no surface ruptures with a predominant orientation were observed, and routine earthquake location showed no distinct orientation in the aftershock distribution. To study the seismogenic structure of the Zhangbei-Shangyi earthquake, the authors relocated the main shock and its aftershocks with ML3.0 in 2002 using the master event relative relocation technique. The relocated epicenter of the main shock was at 41.145°N, 114.462°E, 4 km to the NE of the macro-epicenter of the event. The relocated focal depth of the main shock was 15 km. The aftershock hypocenters were distributed in and around a nearly vertical plane striking 180°~200°. These results clearly indicated that the seismogenic structure of the event was a NNE-SSW-striking fault with right-lateral and reverse slip. In this paper, the Zhangbei-Shangyi earthquake sequence is relocated using the double difference earthquake location algorithm (DD algorithm), and the results are consistent with those obtained by the master event technique.
The relocated hypocenter of the main shock is at 41.131°N, 114.456°E, 2.5 km to the NE of the macro-epicenter of the main shock. The relocated focal depth of the main shock is 12.8 km. The aftershock hypocenters are likewise distributed in and around a nearly vertical N10°E-striking plane. The DD relocation results again clearly indicate that the seismogenic structure of this event was a NNE-striking fault.
Keywords: Zhangbei-Shangyi earthquake; double difference earthquake location algorithm; earthquake relocation; seismogenic structure; source process
A blockchain bee colony double inhibition labor division algorithm for spatio-temporal coupling task with application to UAV swarm task allocation (Cited by: 4)
8
Authors: WU Husheng, LI Hao, XIAO Renbin. 《Journal of Systems Engineering and Electronics》, SCIE, EI, CSCD, 2021, No. 5, pp. 1180-1199 (20 pages)
The bee colony double suppression division algorithm has difficulty handling tasks that are spatio-temporally coupled or have higher-dimensional attributes, and in taking on sudden tasks. Using the idea of clustering, tasks are clustered according to their spatio-temporal attributes, and the clustered groups are linked into task sub-chains according to similarity. Then, based on the correlation between clusters, the sub-chains are connected to form a task chain. This removes the limitation that the task chain in the bee colony algorithm can only be connected along one dimension. When a sudden task occurs, either a method of inserting a small number of tasks into the original task chain or a task chain reconstruction method is applied, depending on the number of sudden tasks relative to the number of remaining tasks. With these improvements, the algorithm can process spatio-temporally coupled tasks and burst tasks. To demonstrate the efficiency and applicability of the algorithm, a task allocation model for an unmanned aerial vehicle (UAV) group is constructed, and a one-to-one correspondence between the improved bee colony double suppression division algorithm and each attribute of the UAV group is proposed. The study uses the self-adjusting characteristics of the bee colony to achieve task allocation. Simulation verification and algorithm comparison show that the algorithm has stronger planning advantages and better performance.
Keywords: bee colony double inhibition labor division algorithm; high dimensional attribute; sudden task; reforming the task chain; task allocation model
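Research on UAV Air Combat Maneuver Decision-Making Based on the Q-Learning Algorithm
9
Authors: 姚培源, 魏潇龙, 俞利新, 李胜厚. 《电光与控制》, CSCD, PKU Core, 2023, No. 5, pp. 16-22 (7 pages)
For the problem of autonomous maneuver decision-making in UAV air combat, a lateral maneuver decision algorithm is designed. By adding a heuristic factor and a mechanism in which two Q-tables learn alternately, it compensates for the slow learning and frequent ineffective learning of the traditional Q-Learning algorithm. Path-planning simulations and data comparisons verify that the improved Q-Learning algorithm has better stability and solving ability. A dynamic grid planning environment is designed that lets the UAV adaptively adjust the grid size according to the changing air combat situation without affecting solution speed. Based on the Q-Learning algorithm, a lateral maneuver decision model for UAV air combat is built, and swapping weapon platforms verifies that the improved Q-Learning algorithm significantly raises the UAV's win-loss ratio in air combat.
Keywords: UAV; air combat; maneuver decision-making; dynamic grid environment; path planning; Q-learning table algorithm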
Hydraulic Optimization of a Double-channel Pump's Impeller Based on Multi-objective Genetic Algorithm (Cited by: 11)
10
Authors: ZHAO Binjuan, WANG Yu, CHEN Huilong, QIU Jing, HOU Duohua. 《Chinese Journal of Mechanical Engineering》, SCIE, EI, CAS, CSCD, 2015, No. 3, pp. 634-640 (7 pages)
Computational fluid dynamics (CFD) can give a lot of potentially very useful information for the hydraulic optimization design of pumps; however, it cannot directly state what kind of modification should be made to improve hydrodynamic performance. In this paper, a more convenient and effective approach is proposed that combines CFD, a multi-objective genetic algorithm (MOGA), and artificial neural networks (ANN) for a double-channel pump's impeller, with maximum head and efficiency set as optimization objectives and four key geometrical parameters (inlet diameter, outlet diameter, exit width, and midline wrap angle) chosen as optimization parameters. First, a multi-fidelity fitness assignment system is established in which the fitness of impellers serving as training and comparison samples for the ANN is evaluated by CFD, while the fitness of impellers generated by MOGA is evaluated by the ANN; this dramatically reduces the computational expense. Then, a modified MOGA optimization process is developed, in which selection is performed independently in two sub-populations according to the two optimization objectives and crossover and mutation are performed afterward in the merged population, to ensure that the global optimal solution is found. Finally, the Pareto optimal frontier is found after 500 iteration steps, and two optimal design schemes are chosen according to the design requirements.
The preliminary and optimal design schemes are compared, and the results show that the hydraulic performance of both pump 1 and pump 2 is improved: the head and efficiency of pump 1 increase by 5.7% and 5.2%, respectively, at the design working condition, while shaft power decreases in all working conditions; the head and efficiency of pump 2 increase by 11.7% and 5.9%, respectively, while shaft power increases by 5.5%. Inner flow field analyses also show that the backflow phenomenon diminishes significantly at the entrance of optimal impellers 1 and 2, and both the area and the intensity of the vortex decrease in the whole flow channel. This paper provides a promising tool for solving the hydraulic optimization problem of pump impellers.
Keywords: double-channel pump's impeller; multi-objective genetic algorithm; artificial neural network; computational fluid dynamics (CFD); UNI
Double Optimal Regularization Algorithms for Solving Ill-Posed Linear Problems under Large Noise
11
Authors: Chein-Shan Liu, Satya N. Atluri. 《Computer Modeling in Engineering & Sciences》, SCIE, EI, 2015, No. 1, pp. 1-39 (39 pages)
A double optimal solution of an n-dimensional system of linear equations Ax=b has been derived in an affine m-dimensional Krylov subspace with m << n. We further develop a double optimal iterative algorithm (DOIA), with the descent direction z solved from the residual equation Az=r0 by using its double optimal solution, to solve ill-posed linear problems under large noise. The DOIA is proven to be absolutely convergent step by step, with the square residual error ||r||^2=||b-Ax||^2 reduced by a positive quantity ||Azk||^2 at each iteration step, which is found to be better than algorithms based on minimizing the square residual error in an m-dimensional Krylov subspace. To tackle ill-posed linear problems under large noise, we also propose a novel double optimal regularization algorithm (DORA), which is an improvement of the Tikhonov regularization method. Numerical tests reveal the high performance of DOIA and DORA against large noise. These methods are of use in the ill-posed problems of structural health monitoring.
Keywords: ill-posed linear equations system; double optimal solution; affine Krylov subspace; double optimal iterative algorithm; double optimal regularization algorithm
Assembly Line Balancing Based on Double Chromosome Genetic Algorithm
12
Authors: 刘俨后, 左敦稳, 张丹. 《Transactions of Nanjing University of Aeronautics and Astronautics》, EI, 2014, No. 6, pp. 622-628 (7 pages)
For the assembly line balancing problem, a double chromosome genetic algorithm (DCGA) is proposed to avoid becoming trapped in local optima, a disadvantage of the standard genetic algorithm (SGA). In this algorithm, each individual has two chromosomes, and the better one, regarded as the dominant chromosome, determines the fitness. The dominant chromosome keeps excellent gene segments to speed up convergence, and the recessive chromosome maintains population diversity, giving better global search ability and avoiding locally optimal solutions. When the numbers of chromosomes are equal, the population size of DCGA is half that of SGA, which significantly reduces evolutionary time. Finally, the effectiveness is verified by experiments.
Keywords: double chromosome genetic algorithm; assembly line balancing; mathematical model; global optimum
A Routing Algorithm for Distributed Optimal Double Loop Computer Networks
13
Authors: Li Layuan (Department of Electrical Engineering and Computer Science, Wuhan University of Water Transportation, Wuhan 430063, P. R. China). 《Journal of Systems Engineering and Electronics》, SCIE, EI, CSCD, 1994, No. 1, pp. 37-43 (7 pages)
A routing algorithm for distributed optimal double loop computer networks is proposed and analyzed. In this paper, the routing rule is described and the procedures realizing the algorithm are given. The proposed algorithm is shown to be optimal and robust for optimal double loop networks. In the absence of failures, the algorithm sends a packet along the shortest path to its destination; when there are failures, the packet can bypass failed nodes and links.
Keywords: computer networks; double loop; routing algorithm
Dynamic Self-Adaptive Double Population Particle Swarm Optimization Algorithm Based on Lorenz Equation
14
Authors: Yan Wu, Genqin Sun, Keming Su, Liang Liu, Huaijin Zhang, Bingsheng Chen, Mengshan Li. 《Journal of Computer and Communications》, 2017, No. 13, pp. 9-20 (12 pages)
To overcome shortcomings of the standard particle swarm optimization algorithm, such as premature convergence and slow local search, a double population particle swarm optimization algorithm based on the Lorenz equation and a dynamic self-adaptive strategy is proposed. Chaotic sequences produced by the Lorenz equation are used to tune the acceleration coefficients to balance exploration and exploitation, a dynamic self-adaptive inertia weight factor is used to accelerate convergence, and the double population is intended to enhance convergence accuracy. Experiments were carried out on four multi-objective test functions, comparing the algorithm with two classical multi-objective algorithms: the non-dominated sorting genetic algorithm and the multi-objective particle swarm optimization algorithm. The results show that the proposed algorithm performs well, with a faster convergence rate and a strong ability to jump out of local optima, and could be used to solve many optimization problems.
Keywords: improved particle swarm optimization algorithm; double populations; multi-objective; adaptive strategy; chaotic sequence
Relocation of the M_S≥2.0 Earthquakes in the Northern Tianshan Region, Xinjiang, Using the Double-Difference Earthquake Relocation Algorithm
15
Authors: Wang Haitao, Li Zhihai, Zhao Cuiping, Qu Yanjun. 《Earthquake Research in China》, 2007, No. 4, pp. 388-396 (9 pages)
We applied the double-difference earthquake relocation algorithm to 1348 earthquakes with Ms ≥ 2.0 that occurred in the northern Tianshan region, Xinjiang, from April 1988 to June 2003, using a total of 28701 P- and S-wave arrival times recorded by 32 seismic stations in Xinjiang. To relocate as many of these Ms ≥ 2.0 earthquakes as possible, and considering the requirements of the DD method and the condition of the data, we added the travel time data of another 437 earthquakes with 1.5 ≤ Ms < 2.0. Finally, we obtained relocation results for 1253 earthquakes with Ms ≥ 2.0, which account for 93% of the 1348 earthquakes with Ms ≥ 2.0 and include all the Ms ≥ 3.0 earthquakes. The reason for not relocating the 95 earthquakes with 2.0 ≤ Ms < 3.0 is analyzed in the paper. After relocation, the RMS residual decreased from 0.83 s to 0.14 s; the average error is 0.993 km in the E-W direction, 1.10 km in the N-S direction, and 1.33 km in the vertical direction. The hypocenter depths are more convergent than before and distributed from 5 km to 35 km, with 94% being from 5 km to 35 km and 68.2% from 10 km to 25 km. The average hypocenter depth is 19 km.
Keywords: double difference earthquake relocation algorithm; hypocenter parameter; Northern Tianshan region
Research on WSN Double-Radius Localization Algorithm Based on Partition Judgment Mechanism
16
Authors: Jijun Zhao, Hua Li, Zhiyuan Tang, Xiang Sun. 《Wireless Sensor Network》, 2010, No. 8, pp. 639-644 (6 pages)
Localization technology is an important supporting technology for wireless sensor networks (WSN). The centroid algorithm is a typical range-free localization algorithm with the advantages of a simple localization principle and easy implementation. However, because it is sensitive to the density of anchor nodes and the uniformity of their deployment, its localization accuracy is not high. We study the localization principle and error sources of the centroid algorithm. To resolve the problem of low localization accuracy, we propose a new double-radius localization algorithm in which each WSN node periodically transmits over two circular communication areas with different radii, partitioning the localization region a second time; the resulting small overlapping regions effectively narrow the localization range of an unknown node. In addition, a partition judgment mechanism is proposed to determine the area in which the unknown node lies, and localization within the small regions is then carried out by the centroid algorithm. Simulation results show that, without additional hardware or anchor nodes, the algorithm effectively increases localization accuracy and reduces the dependence on anchor nodes.
Keywords: Wireless Sensor Networks; Localization Technology; Centroid Algorithm; Double-Radius
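As a rough illustration of the centroid baseline described in the abstract above, the sketch below averages the positions of the anchors a node can hear, and simulates which anchors fall inside a small or a large communication radius. The function names and the two-radius partition helper are hypothetical; the paper's partition judgment mechanism is not reproduced here.

```python
import math


def centroid(points):
    """Centroid localization: the mean of the heard anchors' (x, y) positions."""
    if not points:
        raise ValueError("at least one anchor is required")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def partition_by_radius(node, anchors, r_small, r_large):
    """Simulate two transmission radii (hypothetical helper).

    Returns (inner, outer): anchors within r_small of the node, and anchors
    beyond r_small but within r_large. Anchors farther away are not heard.
    """
    inner, outer = [], []
    for anchor in anchors:
        d = math.dist(node, anchor)
        if d <= r_small:
            inner.append(anchor)
        elif d <= r_large:
            outer.append(anchor)
    return inner, outer
```

Knowing which ring each anchor falls in constrains the node to a smaller region than a single radius would, which is the intuition behind using two radii before applying the centroid step.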
An Improved Apriori Algorithm Based on Matrix and Double Correlation Profit Constraint
17
Authors: Yuan Liu, Ya Li, Jian Yang, Yan Ren, Guoqiang Sun, Quansheng Li. 《国际计算机前沿大会会议论文集》, 2018, No. 1, p. 27 (1 page)
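Improved Speedy Q-learning Algorithm Based on a Double Estimator (Cited by: 6)
18
Authors: 郑帅, 罗飞, 顾春华, 丁炜超, 卢海峰. 《计算机科学》, CSCD, PKU Core, 2020, No. 7, pp. 179-185 (7 pages)
Q-learning is a classic reinforcement learning algorithm, but its conservative update strategy and overestimation make it converge slowly. Speedy Q-learning and Double Q-learning are two variants of Q-learning that address its slow convergence and its overestimation, respectively. Starting from the Q-value update rule of Speedy Q-learning and the update strategy of Monte Carlo reinforcement learning, this paper derives an equivalent form of the rule through theoretical analysis and mathematical proof. The equivalent form shows that, because Speedy Q-learning uses the current Q-value estimate as an estimate of historical Q-values, it improves the agent's overall convergence speed but still suffers from overestimation, which slows convergence in the early iterations. To address this, and drawing on the property that the double estimator in Double Q-learning can improve the agent's convergence speed, an improved algorithm, Double Speedy Q-learning, is proposed. Its double estimator separates the selection of the optimal action from the selection of the maximum Q-value, improving the learning strategy of Speedy Q-learning in the early iterations and raising its overall convergence speed. Experiments in grid worlds of different sizes, using linear and polynomial learning rates, compare the early-iteration and overall convergence speeds of Q-learning and its improved variants. The results show that Double Speedy Q-learning converges faster than Speedy Q-learning in the early iterations, that its overall convergence is clearly faster than that of the compared algorithms, and that the gap between its actual average reward and the expected reward is the smallest.
Keywords: Q-learning; Double Q-learning; Speedy Q-learning; reinforcement learning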
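The double-estimator idea mentioned in the abstract above, separating action selection from value evaluation, can be sketched as one tabular Double Q-learning step. This shows only the standard Double Q-learning update, not the paper's Double Speedy Q-learning; the table layout, parameter names, and default values are assumptions.

```python
import random
from collections import defaultdict


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """One tabular Double Q-learning step on two tables keyed by (state, action).

    A coin flip picks which table is updated. That table selects the greedy
    next action; the other table evaluates it. Decoupling selection from
    evaluation is what counters the overestimation of plain Q-learning.
    """
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    best = max(actions, key=lambda a: select[(next_state, a)])
    target = reward + gamma * evaluate[(next_state, best)]
    select[(state, action)] += alpha * (target - select[(state, action)])


# Usage sketch: both tables start at zero for every (state, action) pair.
q_a, q_b = defaultdict(float), defaultdict(float)
double_q_update(q_a, q_b, "s0", "left", 1.0, "s1", ["left", "right"])
```

Passing `rng` explicitly keeps the coin flip injectable, which makes the step easy to test deterministically.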
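An Exploration of the JMM and Synchronization Methods for double/long Variables
19
Authors: 俞松, 郑骏, 杨云. 《微处理机》, 2010, No. 1, pp. 79-82, 85 (5 pages)
The Java memory model is a core part of research on the Java language and the Java virtual machine. The Java Language Specification requires that all operations on primitive types, except those on double/long, be atomic. The volatile keyword that Java provides gives double/long variables variable-level synchronization but still cannot guarantee synchronization between threads. To address these shortcomings, the paper analyzes the Java memory model and synchronized methods through examples, introduces the hardware primitive compare-and-swap (CAS), and gives a synchronization strategy based on non-blocking algorithms.
Keywords: Java memory model; double/long variables; volatile keyword; synchronized method; non-blocking algorithm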
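Research on Gene Microarray Data Based on Double-Bagging Decision Trees (Cited by: 1)
20
Authors: 袁科. 《湖北汽车工业学院学报》, 2009, No. 2, pp. 40-43 (4 pages)
By combining unstable classifiers, Bagging greatly reduces the classification error of "weak" learning algorithms. Building on the Double-Bagging algorithm proposed by Torsten et al., this paper modifies it and applies it to gene microarray data. Several classifiers are tested and compared on given training and test sets; the results show that the Double-Bagging decision tree classifies more accurately than the Bagging decision tree and the C4.5 algorithm.
Keywords: Double-Bagging algorithm; Double-Bagging decision tree; gene microarray data; classifier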