34 articles found
Decoding topological XYZ^(2) codes with reinforcement learning based on attention mechanisms
1
Authors: 陈庆辉, 姬宇欣, 王柯涵, 马鸿洋, 纪乃华 《Chinese Physics B》 SCIE EI CAS CSCD, 2024, Issue 6, pp. 262-270 (9 pages)
Quantum error correction, a technique that relies on redundancy to encode logical information into additional qubits and so protect the system from noise, is necessary to design a viable quantum computer. The XYZ^(2) code, a new topological stabilizer code, is implemented on a hexagonal lattice of qubits and encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However, topological stabilizer codes on such lattices suffer from the detrimental effects of noise due to interaction with the environment. Several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ^(2) codes, which enables the decoder to focus more accurately on the information related to the current decoding position. Under the optimised conditions, the error correction accuracy of our reinforcement learning decoder reaches 83.27% under the depolarizing noise model, and we measure thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code distances of 3–7 and 7–11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning and attention mechanisms to other topological quantum error-correcting codes.
Keywords: quantum error correction; topological quantum stabilizer code; reinforcement learning; attention mechanism
Approximate error correction scheme for three-dimensional surface codes based reinforcement learning
2
Authors: 曲英杰, 陈钊, 王伟杰, 马鸿洋 《Chinese Physics B》 SCIE EI CAS CSCD, 2023, Issue 10, pp. 229-240 (12 pages)
Quantum error correction is an important method for eliminating errors during the operation of quantum computers. To address the influence of errors on physical qubits, we propose an approximate error correction scheme that performs dimension mapping operations on surface codes. The scheme uses the topological properties of error correction codes to map the surface code to three dimensions. Compared with previous error correction schemes, the three-dimensional surface code exhibits good scalability owing to its higher redundancy and more efficient error correction. By reducing the number of ancilla qubits required for error correction, the approach saves measurement space and reduces resource consumption. To improve decoding efficiency and handle the correlation between the surface code stabilizers and the 3D space after dimension mapping, we employ a reinforcement learning (RL) decoder based on deep Q-learning, which identifies the optimal syndrome faster and achieves better thresholds through conditional optimization. Compared with minimum weight perfect matching decoding, the threshold of the RL-trained model reaches 0.78%, which is 56% higher, and enables large-scale fault-tolerant quantum computation.
Keywords: fault-tolerant quantum computing; surface code; approximate error correction; reinforcement learning
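Polar code optimization method against strong interference based on multi-objective reinforcement learning (cited by: 1)
3
Authors: 梁豪, 叶淦华, 陆锐敏, 王恒, 魏鹏 《电子与信息学报》 EI CSCD PKU Core, 2023, Issue 11, pp. 4092-4100 (9 pages)
To improve the reliability and anti-interference capability of information transmission in frequency-hopping (FH) communication systems, this paper proposes a Polar code construction optimization method suited to strong-interference environments, based on a slow frequency-hopping anti-interference communication system model with novel Polar coding. First, a multi-objective reinforcement learning algorithm is designed for a hybrid channel comprising normal and interfered states. Then the information-bit channel sequence used in encoding is optimized to improve the error-correction performance of the codewords, and the execution complexity of the algorithm is reduced through initialization preprocessing and theoretically computed reward values. Simulation results show that, under hybrid channel conditions with strong interference, the global bit-error performance of the proposed method is better than that of conventional code construction methods; compared with the 3rd Generation Partnership Project (3GPP) standard scheme for fifth-generation (5G) mobile communication systems, the global coding gain reaches 0.5 dB, which effectively improves the highly reliable anti-interference transmission performance of Polar-coded frequency-hopping communication.
Keywords: channel coding; anti-interference; Polar code; reinforcement learning; reliability performance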
A SPEECH RECOGNITION METHOD USING COMPETITIVE AND SELECTIVE LEARNING NEURAL NETWORKS
4
Authors: 徐雄, 胡光锐, 严永红 《Journal of Shanghai Jiaotong University (Science)》 EI, 2000, Issue 2, pp. 10-13 (4 pages)
On the basis of the asymptotic theory of Gersho, the isodistortion principle of vector clustering was discussed, and a competitive and selective learning (CSL) method was proposed that can avoid local optima and gives excellent results when applied to clustering in HMM models. When combined with parallel, self-organizational hierarchical neural networks (PSHNN) to reclassify the scores of every form output by the HMM, the CSL speech recognition rate is clearly improved.
Keywords: speech recognition; competitive learning; classification; neural networks. Document code: A
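Code comment generation model based on type-assisted guidance
5
Authors: 刘利, 吕韦岑, 汪洋 《无线电通信技术》 PKU Core, 2024, Issue 4, pp. 807-814 (8 pages)
Code comment generation methods are usually based on the Structure-Sequence (Struct2Seq) framework but ignore the type information of code comments, such as operators and strings. Because the levels of type information depend on one another, introducing type information directly into the existing Struct2Seq framework is not suitable. To solve this problem, a Code Comment Generation based on Type-assisted Guidance (CCG-TG) model is proposed, which treats source code as an n-ary tree with type information. The model comprises a type-associated encoder and a type-restricted decoder and can summarize source code adaptively. In addition, a Multi-level Reinforcement Learning (MRL) method is proposed to optimize the training process of the model. Experiments on multiple datasets and comparisons with several baseline models show that the proposed CCG-TG model performs best on all evaluation metrics.
Keywords: code comment generation; type information; structure-sequence framework; type-assisted guidance; reinforcement learning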
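Reinforcement-learning-based adversarial attack method against vulnerability detection models
6
Authors: 陈思然, 吴敬征, 凌祥, 罗天悦, 刘镓煜, 武延军 《软件学报》 EI CSCD PKU Core, 2024, Issue 8, pp. 3647-3667 (21 pages)
Deep-learning-based code vulnerability detection models have gradually become an important means of detecting software vulnerabilities thanks to their high detection efficiency and accuracy, and they play an important role in the code auditing service of the code hosting platform GitHub. However, deep neural networks have been shown to be susceptible to adversarial attacks, which exposes deep-learning-based vulnerability detection models to the risk of being attacked and having their detection accuracy reduced. Constructing adversarial attacks against vulnerability detection models can therefore not only uncover security flaws in such models but also help evaluate their robustness, and in turn improve model performance through corresponding methods. Existing adversarial attack methods for vulnerability detection models, however, rely on general-purpose code transformation tools and do not propose targeted code perturbation operations or decision algorithms, so they struggle to generate effective adversarial samples, and the validity of those samples depends on manual inspection. To address these problems, a reinforcement-learning-based adversarial attack method against vulnerability detection models is proposed. The method first designs a series of semantically constrained, vulnerability-preserving code perturbation operations as the perturbation set; second, it takes vulnerable code samples as input and uses a reinforcement learning model to select a specific sequence of perturbation operations; finally, it locates potential perturbation positions according to the node types of the sample's syntax tree and performs code transformation to generate adversarial samples. Two experimental datasets totalling 14,278 code samples were built from SARD and NVD and used to train four vulnerability detection models with different characteristics as attack targets. For each target model, a reinforcement learning network was trained to carry out the adversarial attack. The results show that the attack reduces model recall by 74.34% and achieves an attack success rate of 96.71%, an average improvement of 68.76% in attack success rate over baseline methods. The experiments demonstrate that current vulnerability detection models are at risk of being attacked and that further research is needed to improve their robustness.
Keywords: adversarial attack; vulnerability detection; reinforcement learning; code transformation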
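Survey of research on code comment quality issues in software (cited by: 1)
7
Authors: 王潮, 徐卫伟, 周明辉 《软件学报》 EI CSCD PKU Core, 2024, Issue 2, pp. 513-531 (19 pages)
Code comments, a key mechanism for supporting collaboration among software developers, are widely used to improve development efficiency. However, because comments do not directly affect how software runs, developers often neglect them, which leads to comment quality problems that in turn harm development efficiency. Quality problems in code comments hinder developers' understanding of the related code and may even cause misunderstandings that introduce defects, so the issue has attracted wide attention from researchers. Using a systematic literature review, this paper systematically analyzes recent work by researchers in China and abroad on code comment quality. It summarizes the state of research from three perspectives (evaluation dimensions, measurement metrics, and improvement strategies for comment quality) and sets out the shortcomings of current research, the open challenges, and recommendations.
Keywords: code comments; software documentation; natural language processing; machine learning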
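Improved genetic algorithm using reinforcement learning for the flexible job-shop scheduling problem
8
Authors: 陈祉烨, 胡毅, 刘俊, 王军, 张曦阳 《科学技术与工程》 PKU Core, 2024, Issue 25, pp. 10848-10856 (9 pages)
To address the tendency of traditional genetic algorithms to become trapped in local optima, their inability to adjust parameters intelligently, and their weak local search when solving the flexible job-shop scheduling problem, a flexible job-shop scheduling model that minimizes the maximum completion time (makespan) is established, and a reinforcement learning improved genetic algorithm (RLIGA) is proposed to solve it. First, reinforcement learning is used to adjust key parameters dynamically during the iterations of the genetic algorithm. Second, a discrete Lévy flight mechanism based on operation-encoding distance is introduced to improve the solution space. Finally, a variable neighborhood search mechanism is introduced to strengthen the algorithm's local exploitation ability. The Brandimarte benchmark instances were run in PyCharm to verify the algorithm's performance; the experiments show that the proposed algorithm is more efficient, better at escaping local optima, and produces better solutions.
Keywords: reinforcement learning; genetic algorithm; discrete Lévy flight; operation-encoding distance; variable neighborhood search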
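Binary code fuzzing method based on deep reinforcement learning
9
Authors: 王栓奇, 赵健鑫, 刘驰, 武伟, 刘钊 《计算机科学》 CSCD PKU Core, 2024, Issue S01, pp. 852-858 (7 pages)
Vulnerability discovery is a major research direction in software security, and fuzzing is an important dynamic discovery method. To address the problems that the large volume of assembly code makes detection in binary vulnerability discovery both difficult and time-consuming and that fuzzing is inefficient, a binary code fuzzing method based on deep reinforcement learning is proposed. First, the fuzzing process is modelled as a multi-step Markov decision process suited to reinforcement learning, and a deep reinforcement learning model is built to assist the selection of fuzzing mutation strategies, achieving dynamic optimization of the mutation strategy. Then a deep-reinforcement-learning-based binary code fuzzing platform is designed and built, using AFL for the fuzzing environment and the Keras-RL2 library and the OpenAI Gym framework for the deep reinforcement learning algorithm and the reinforcement learning environment. Finally, experiments verify the effectiveness and applicability of the proposed method and test platform: the results show that the deep reinforcement learning model helps the fuzzing process cover more paths quickly and expose more vulnerabilities, significantly improving the efficiency of discovering and locating vulnerabilities in binary code.
Keywords: binary code; vulnerability discovery; fuzzing; deep reinforcement learning; test platform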
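Actor-Critic algorithm based on Tile Coding and model learning (cited by: 3)
10
Authors: 金玉净, 朱文文, 伏玉琛, 刘全 《计算机科学》 CSCD PKU Core, 2014, Issue 6, pp. 239-242, 249 (5 pages)
Actor-Critic is a class of reinforcement learning methods with good performance and convergence guarantees. However, the agent does not learn the dynamics of the environment while learning and improving its policy, which limits the performance of Actor-Critic methods. In addition, Actor-Critic methods need to represent the policy and the value function approximately, and the encoding of states and actions, together with its parameters, has an important effect on them. Tile Coding is simple to use and has low computational time complexity; it is therefore combined with a model-based Actor-Critic method, and the resulting algorithm is applied to reinforcement learning simulation experiments. The experimental results show that the resulting algorithm performs well.
Keywords: reinforcement learning; Tile Coding; Actor-Critic; model learning; function approximation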
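As a reader aid for item 10, here is a minimal sketch of what Tile Coding does for a one-dimensional state: several overlapping, shifted grids ("tilings") each contribute one active binary feature, and a linear value estimate is the sum of the weights of those features. The function names, the uniform offset scheme, and the parameter values are assumptions made for illustration; the abstract does not describe the authors' actual encoding or its parameters.

```python
import math

def tile_indices(x, low, high, tiles_per_tiling=8, num_tilings=4):
    """Return one active feature index per tiling for a scalar state x in [low, high]."""
    width = (high - low) / tiles_per_tiling          # width of a single tile
    active = []
    for t in range(num_tilings):
        offset = t * width / num_tilings             # shift each tiling by a fraction of a tile
        idx = int(math.floor((x - low + offset) / width))
        idx = min(max(idx, 0), tiles_per_tiling)     # shifted tilings need one extra tile
        active.append(t * (tiles_per_tiling + 1) + idx)
    return active

def linear_value(weights, active):
    """Linear function approximation: sum the weights of the active tiles."""
    return sum(weights[i] for i in active)

# Illustrative use: 4 tilings x 9 tiles = 36 features (hypothetical sizes).
weights = [0.5] * 36
features = tile_indices(0.1, 0.0, 1.0)
estimate = linear_value(weights, features)
```

Nearby states share most of their active tiles, which is what gives the encoding its generalization while keeping each lookup cheap.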
Rich-text document styling restoration via reinforcement learning (cited by: 1)
11
Authors: Hongwei LI, Yingpeng HU, Yixuan CAO, Ganbin ZHOU, Ping LUO 《Frontiers of Computer Science》 SCIE EI CSCD, 2021, Issue 4, pp. 93-103 (11 pages)
Richly formatted documents, such as financial disclosures, scientific articles, and government regulations, are widespread on the Web. However, since most of these documents are intended only for public reading, the styling information inside them is usually missing, which makes them awkward or even burdensome to display and edit in different formats and on different platforms. In this study we formulate document styling restoration as an optimization problem that aims to identify the styling settings of the document elements (e.g., lines, table cells, text) such that rendering with the output styling settings yields a document in which each element holds (closely) the same position as in the original document. Since each styling setting is a decision, the problem can be transformed into a multi-step decision-making task over all the document elements and then solved by reinforcement learning. Specifically, Monte-Carlo Tree Search (MCTS) is used to explore the different styling settings, and the policy function is learnt under the supervision of delayed rewards. As a case study, we restore the styling information inside tables, where structural and functional data in documents are usually presented. Experiments show that our best reinforcement method successfully restores the styling in 87.65% of the tables, a 25.75% absolute improvement over the greedy method. We also discuss the tradeoff between inference time and restoration success rate, and argue that although the reinforcement methods cannot be used in real-time scenarios, they are suitable for offline tasks with high quality requirements. Finally, the model has been applied in a PDF parser to support cross-format display.
Keywords: styling restoration; Monte-Carlo tree search; reinforcement learning; richly formatted documents; tables
Cooperative Caching for Scalable Video Coding Using Value-Decomposed Dimensional Networks (cited by: 1)
12
Authors: Youjia Chen, Yuekai Cai, Haifeng Zheng, Jinsong Hu, Jun Li 《China Communications》 SCIE CSCD, 2022, Issue 9, pp. 146-161 (16 pages)
Scalable video coding (SVC) has been widely used in video-on-demand (VOD) services to efficiently satisfy users' different video quality requirements and to adjust the video stream dynamically to time-variant wireless channels. Under the 5G network structure, we consider a cooperative caching scheme with SVC inside each cluster to make economical use of the limited caching storage. A novel multi-agent deep reinforcement learning (MADRL) framework is proposed to jointly optimize the video access delay and users' satisfaction, in which an aggregation node is introduced to help individual agents obtain global observations and overall system rewards. Moreover, to cope with the large action space caused by the large number of videos and users, a dimension decomposition method is embedded into the neural network of each agent, which greatly reduces the computational complexity and memory cost of the reinforcement learning. Experimental results show that: 1) the proposed value-decomposed dimensional network (VDDN) algorithm achieves an obvious performance gain over traditional MADRL; 2) the proposed VDDN algorithm can handle an extremely large action space and converge quickly with low computational complexity.
Keywords: cooperative caching; multi-agent deep reinforcement learning; scalable video coding; value-decomposition network
RSOFCPN: CONTROL SYSTEM STRUCTURE AND ALGORITHM DESIGN
13
Authors: 马勇, 杨煜普, 张卫东, 许晓鸣 《Journal of Shanghai Jiaotong University (Science)》 EI, 2000, Issue 2, pp. 57-61 (5 pages)
A stable control scheme for a class of unknown nonlinear systems was presented. The control architecture is composed of two parts: a fuzzy sliding mode controller (FSMC), which drives the state to a designed switching hyperplane, and a reinforcement self-organizing fuzzy CPN (RSOFCPN), which serves as a feedforward compensator to reduce the influence of system uncertainties. The simulation results demonstrate the effectiveness of the proposed control scheme.
Keywords: nonlinear systems; fuzzy sliding mode control; self-organized CPN; reinforcement learning. Document code: A
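Collaborative optimal dispatch method for active distribution networks considering the dispatchable capacity of 5G base station energy storage (cited by: 8)
14
Authors: 陈实, 郭正伟, 周步祥, 刘艺洪, 臧天磊, 罗欢 《电网技术》 EI CSCD PKU Core, 2023, Issue 12, pp. 5225-5237 (13 pages)
As mobile communications rapidly move to 5G, the scale of 5G base station construction is growing quickly. The idle energy storage in the vast number of 5G base stations can be treated as a flexibility resource that participates in power system dispatch, mitigating the adverse effects of the randomness and volatility of renewable generation on the system. For the economic optimal dispatch of base station energy storage in active distribution networks with distributed wind power, a base station reliability evaluation model is first established that accounts for two factors, potential power interruptions in the distribution network and outage restoration time, so that the real-time dispatchable capacity of each base station's energy storage is evaluated systematically. Then, with the objective of minimizing system operating cost, an improved twin delayed deep deterministic policy gradient (TD3) algorithm based on a variational auto-encoder (VAE) model is used to solve for the optimal charging and discharging strategy of the 5G base station energy storage. The algorithm represents the storage states of multiple base stations as latent variables to uncover correlations hidden in the data, which reduces the complexity of solving the model and improves algorithm performance. Iterating to convergence yields real-time control of the multi-base station energy storage (MBSES) system and an individualized charging and discharging strategy for each base station that matches its actual operating conditions. Finally, case studies verify the effectiveness of the proposed method.
Keywords: 5G base station; backup energy storage; renewable energy; dispatchable capacity; feature encoding; deep reinforcement learning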
Arc-length technique for nonlinear finite element analysis (cited by: 9)
15
Authors: MEMON Bashir-Ahmed, SU Xiao-zu (苏小卒) 《Journal of Zhejiang University Science》 EI CSCD, 2004, Issue 5, pp. 618-628 (11 pages)
Nonlinear solution of reinforced concrete structures, particularly the complete load-deflection response, requires tracing the equilibrium path and properly treating the limit and bifurcation points. Ordinary solution techniques become unstable near the limit points and also have problems with snap-through and snap-back, so they fail to predict the complete load-displacement response. The arc-length method serves the purpose well in principle, has received wide acceptance in finite element analysis, and has been used extensively. However, modifications to the basic idea are vital to meet the particular needs of the analysis. This paper reviews some of the developments of the method in the last two decades, with particular emphasis on nonlinear finite element analysis of reinforced concrete structures.
Keywords: arc-length method; nonlinear analysis; finite element method; reinforced concrete; load-deflection path. Document code: A. CLC number: TU31
Affiliation: Department of Structural Engineering, Tongji University, Shanghai 200092, China. E-mail: bashirmemon@sohu.com, xiaozub@online.sh.cn. Received July 30, 2003; revision accepted Sept. 11, 2003.
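For readers unfamiliar with the method named in item 15, the following sketches the arc-length idea in its common textbook (spherical, Crisfield-type) form. The abstract does not say which variants the review covers, so every symbol here is an assumption introduced for illustration: u is the displacement vector, λ the load factor, q the reference load vector, f_int the internal force vector, ψ a scaling parameter, and Δl the prescribed arc length.

```latex
\mathbf{r}(\mathbf{u},\lambda) = \mathbf{f}_{\mathrm{int}}(\mathbf{u}) - \lambda\,\mathbf{q} = \mathbf{0},
\qquad
\Delta\mathbf{u}^{\mathsf T}\Delta\mathbf{u} + \psi^{2}\,\Delta\lambda^{2}\,\mathbf{q}^{\mathsf T}\mathbf{q} = \Delta l^{2}
```

Because λ is treated as an unknown alongside u, the increment is bounded by the constraint rather than by a fixed load step, which is why the path can be followed through limit points where load-controlled Newton iteration breaks down.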
Movement and behavior analysis using neural spike signals in CA1 of rat hippocampus
16
Authors: Hyejin An, Kyungjin You, Minwhan Jung, Hyunchool Shin 《Journal of Measurement Science and Instrumentation》 CAS, 2013, Issue 4, pp. 392-396 (5 pages)
The hippocampus, which lies in the temporal lobe, plays an important role in spatial navigation, learning and memory. Several studies have examined place cell activity, spatial memory, prediction of future locations and various learning paradigms. However, no study has focused on whether there are neurons that contribute substantially to both spatial memory and reward learning. This paper proposes that there are neurons in the CA1 area of the rat hippocampus that engage simultaneously in forming place memory and in reward learning. With a trained rat, a reward experiment was conducted in a modified 8-shaped maze with five stages, and firing information was obtained from a CA1 neuron. The firing rate, which is the count of spikes per unit time, was calculated. Decoding was conducted with log-maximum likelihood estimation (Log-MLE) using a Gaussian distribution model. Our outcomes provide evidence of neurons that play a part in both spatial memory and reward learning.
Keywords: hippocampus; CA1; place cell; reward learning; spatial memory; Gaussian distribution; maximum likelihood estimation (MLE). Document code: A. Article ID: 1674-8042(2013)04-0392-05
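The two computations named in item 16 (firing rate as spikes per unit time, and decoding by maximizing a Gaussian log-likelihood) can be sketched as below. This is only an illustration under stated assumptions: the spike times, the stage labels, and the per-stage mean and standard deviation are made-up values, and the function names are hypothetical; the abstract does not give the authors' data or model parameters.

```python
import math

def firing_rate(spike_times, t_start, t_end):
    """Count of spikes per unit time within the window [t_start, t_end)."""
    count = sum(1 for t in spike_times if t_start <= t < t_end)
    return count / (t_end - t_start)

def gaussian_log_likelihood(rate, mean, std):
    """Log of the Gaussian density N(rate; mean, std**2)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (rate - mean) ** 2 / (2 * std ** 2)

def decode(rate, models):
    """Return the label whose Gaussian model gives the highest log-likelihood."""
    return max(models, key=lambda label: gaussian_log_likelihood(rate, *models[label]))

# Hypothetical models: label -> (mean firing rate, standard deviation) in spikes/s.
models = {"stage_A": (2.0, 1.0), "stage_B": (8.0, 2.0)}
spikes = [0.1, 0.4, 0.9, 1.3, 1.7, 1.9]      # made-up spike times in seconds
rate = firing_rate(spikes, 0.0, 2.0)          # 6 spikes in a 2 s window
decoded = decode(rate, models)
```

Working in the log domain turns the product of densities into a sum and avoids numerical underflow when many windows or neurons are combined.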
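Design of a jamming-detection shared signal based on deep reinforcement learning (cited by: 1)
17
Authors: 肖易寒, 刘禹汐, 于祥祯, 赵忠凯 《天津大学学报(自然科学与工程技术版)》 EI CAS CSCD PKU Core, 2023, Issue 12, pp. 1326-1336 (11 pages)
Radar electronic warfare is becoming increasingly intelligent, and traditional jammers cannot adapt to environmental changes, which greatly reduces operational effectiveness. To address this, the detection signal is hidden inside the jamming signal to form a jamming-detection shared signal, so that the jamming signal transmitted by reconnaissance jammer equipment also serves for detection. To address the low complexity and narrow spectral width of existing jamming-detection shared signals, a shared signal based on multi-carrier phase code (MCPC) is designed. It has good noise-like wide-spectrum characteristics and good range and velocity detection capability, and it can covertly probe the target signal and the surrounding environment while applying suppression jamming to the target radar. To let the shared signal adapt to sensing of, and gaming against, the battlefield environment, a deep reinforcement learning algorithm is further introduced to optimize the MCPC jamming-detection shared signal. First, the Q-value is regularized on the basis of the dueling deep Q-learning network (Du DQN), which resolves the local-optimum problem caused by overestimation that Du DQN is prone to. Second, the state-value function is introduced into the reward to form a composite reward; the resulting network is called the composite reward-dueling deep Q-learning network based on regularization (CR-Du DQNReg). It makes the sensitivity of the MCPC shared signal to the reward adjust with the signal's own state and adaptively optimizes the initial phase-code values, achieving better jamming and covert detection. Simulation results show that, after optimization by the CR-Du DQNReg algorithm, the peak spectrum amplitude of the MCPC shared signal increases by 17.48%, the peak pulse-compression amplitude increases by 17.25%, and the first sidelobe amplitude of the Doppler ambiguity function decreases by 12.69%; compared with traditional deep reinforcement learning algorithms, CR-Du DQNReg optimizes better.
Keywords: jamming-detection shared signal; multi-carrier phase code; deep reinforcement learning; composite reward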
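Item 17 builds on the dueling deep Q-learning network. As background only, the sketch below shows the standard dueling aggregation, in which a state value V(s) and per-action advantages A(s, a) are combined into Q-values using the mean advantage as a baseline. It does not reproduce the regularization or the composite reward described in the abstract, whose details are not given there; the function name and the numbers are assumptions for illustration.

```python
def dueling_q_values(state_value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]

# Illustrative numbers: one state value and three action advantages.
q = dueling_q_values(1.0, [0.5, -0.5, 0.0])
greedy_action = max(range(len(q)), key=q.__getitem__)
```

Subtracting the mean advantage keeps the split between V and A identifiable, since otherwise any constant could be moved between the two terms without changing Q.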
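Deep double Q-network incorporating contrastive predictive coding (cited by: 1)
18
Authors: 刘剑锋, 普杰信, 孙力帆 《计算机工程与应用》 CSCD PKU Core, 2023, Issue 6, pp. 162-170 (9 pages)
In a partially observable Markov decision process (POMDP) with an unknown model, the agent cannot directly observe the true state of the environment, and the uncertainty of perception makes learning an optimal policy challenging. To this end, a deep double Q-network reinforcement learning algorithm that incorporates contrastive predictive coding representations is proposed; it models the belief state explicitly to obtain a compact, efficient encoding of history for use in policy optimization. To improve data efficiency, the concept of a belief replay buffer is proposed, which stores belief transition pairs directly instead of observation and action sequences, reducing memory usage. In addition, a phased training strategy is designed to decouple representation learning from policy learning and improve training stability. POMDP navigation tasks were designed on the Gym-MiniGrid environment; the experimental results show that the proposed algorithm captures state-related semantic information and thereby achieves stable, efficient policy learning under POMDPs.
Keywords: partially observable Markov decision process; representation learning; reinforcement learning; contrastive predictive coding; deep double Q-network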
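Adaptive coding and modulation strategy based on reinforcement learning
19
Authors: 马颖, 王珂, 吴戈男, 邢哲 《电子技术应用》, 2023, Issue 5, pp. 35-40 (6 pages)
The non-terrestrial network (NTN) is an important application scenario for satellite and low-altitude communications and marks the move of 5G technology from terrestrial to space communications; satellite networks can be expected to be an important part of future 6G networks. To meet satellite communication quality requirements and maximize system capacity, adaptive coding and modulation is needed to adjust the modulation order and code rate dynamically according to channel state information in a constantly changing communication environment. Artificial intelligence shows clear potential for handling the problems caused by rapidly changing channel conditions in highly dynamic satellite scenarios. A reinforcement-learning-based adaptive coding and modulation strategy for low-Earth-orbit satellites is adopted, which resolves the mismatch between the threshold table and the actual channel caused by changes in the satellite communication environment and achieves an improvement of more than 20% over the traditional ARIMA (Autoregressive Integrated Moving Average) algorithm.
Keywords: reinforcement learning; 6G; adaptive coding and modulation; NTN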
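Concurrent transmission control method for heterogeneous wireless links based on adaptive network coding (cited by: 11)
20
Authors: 赵夙, 王伟, 朱晓荣, 倪钦崟 《电子与信息学报》 EI CSCD PKU Core, 2022, Issue 8, pp. 2777-2784 (8 pages)
With the rise of high-rate services such as high-definition live video and virtual reality, a single network can hardly meet users' service demands. Concurrent transmission over multiple heterogeneous links can aggregate bandwidth effectively and improve quality of service. In heterogeneous wireless networks, however, link conditions are complex and variable and link quality differs across paths, so existing multipath concurrent transmission algorithms cannot adaptively make optimal decisions for complex network conditions. This paper proposes a multipath concurrent transmission control algorithm with adaptive network coding. It introduces Asynchronous Advantage Actor-Critic (A3C) reinforcement learning and, through adaptive network coding, intelligently selects the coding block size and redundancy size according to current network conditions, thereby solving the packet reordering problem. Simulation results show that the algorithm increases the transmission rate by about 10% and improves user experience.
Keywords: wireless networks; concurrent transmission; network coding; reinforcement learning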