Funding: Supported by the Natural Science Foundation of Shandong Province, China (Grant No. ZR2021MF049) and the Joint Fund of the Natural Science Foundation of Shandong Province (Grant Nos. ZR2022LLZ012 and ZR2021LLZ001).
Abstract: Quantum error correction relies on redundancy. It encodes logical information in additional qubits to protect the system from noise, and it is necessary for building a viable quantum computer. The XYZ^(2) code is a new topological stabilizer code defined on a hexagonal (honeycomb) lattice of qubits. It encodes logical qubits through weight-six and weight-two stabilizer measurements. However, topological stabilizer codes on such lattices suffer from noise caused by interaction with the environment, and several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ^(2) codes. The attention mechanism lets the decoder focus more accurately on the information related to the current decoding position. Under optimized conditions, the decoder reaches an error correction accuracy of 83.27% under the depolarizing noise model. We measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code distances of 3–7 and 7–11, respectively. Our study suggests directions for applying decoding schemes that combine reinforcement learning with attention mechanisms to other topological quantum error-correcting codes.
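The abstract does not specify the state-attention architecture. The sketch below illustrates the general idea of weighting information by its relevance to the current decoding position. It implements generic scaled dot-product attention for a single query in pure Python. The query, key, and value vectors are hypothetical encodings, not the authors' features.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query: hypothetical feature vector for the current decoding position
    keys, values: one hypothetical feature vector per lattice site / syndrome bit
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors: sites whose keys match the query dominate.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, out

weights, context = attend([1.0, 0.0],
                          [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                          [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

The weights always sum to one. The site whose key aligns with the query receives the largest share.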
Funding: Project supported by the Natural Science Foundation of Shandong Province, China (Grant Nos. ZR2021MF049, ZR2022LLZ012, and ZR2021LLZ001).
Abstract: Quantum error correction is an important means of eliminating errors during the operation of quantum computers. To address the effect of errors on physical qubits, we propose an approximate error correction scheme that applies a dimension-mapping operation to surface codes. The scheme exploits the topological properties of error-correcting codes to map the surface code into three dimensions. Compared with previous schemes, the resulting three-dimensional surface code scales well because of its higher redundancy and more efficient error correction. It also reduces the number of ancilla qubits required, which saves measurement space and lowers resource costs. Decoding after the mapping must handle the correlation between the surface code stabilizers and the 3D space. To improve decoding efficiency under this constraint, we employ a reinforcement learning (RL) decoder based on deep Q-learning. The decoder identifies the optimal syndrome faster and achieves better thresholds through conditional optimization. The threshold of the trained RL model reaches 0.78%, which is 56% higher than that of minimum-weight perfect matching decoding and enables large-scale fault-tolerant quantum computation.
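In deep Q-learning, a neural network approximates Q(s, a) instead of storing it in a table, but the one-step Bellman target is the same. The minimal tabular sketch below assumes the state is a hashable syndrome and each action flips one data qubit. The state and action labels are hypothetical and do not describe the authors' 3D encoding.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step; a deep Q-network would approximate Q instead."""
    # One-step Bellman target: reward plus discounted best value in the next state.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]

# Hypothetical usage: states are syndrome labels, actions flip one data qubit.
Q = defaultdict(float)
Q[("syndrome_after", "flip_q3")] = 2.0
v = q_update(Q, "syndrome_before", "flip_q3", 0.0,
             "syndrome_after", ["flip_q3", "flip_q4"])
```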
Funding: National Natural Science Foundation of China (No. 69672007).
Abstract: Based on Gersho's asymptotic theory, the isodistortion principle of vector clustering is discussed. A competitive and selective learning (CSL) method is proposed that can avoid local optima and performs well when applied to clustering HMM models. CSL is then combined with parallel, self-organizing hierarchical neural networks (PSHNN), which reclassify the scores that the HMM outputs for each form. With this combination, CSL markedly improves the speech recognition rate.
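The abstract does not describe CSL's selection mechanism or its isodistortion criterion in enough detail to reproduce them. For orientation only, the sketch below shows plain winner-take-all competitive learning for a vector-quantization codebook, the basic update that methods like CSL refine. It has no selection step, so unlike CSL it can still get stuck in local optima. The data and initial codebook are hypothetical.

```python
def nearest(codebook, x):
    # Index of the codeword with the smallest squared Euclidean distance to x.
    return min(range(len(codebook)),
               key=lambda j: sum((c - xi) ** 2 for c, xi in zip(codebook[j], x)))

def competitive_learning(data, codebook, lr=0.1, epochs=20):
    codebook = [list(c) for c in codebook]  # copy so the caller's codebook is untouched
    for _ in range(epochs):
        for x in data:
            j = nearest(codebook, x)
            # Move only the winning codeword a fraction lr toward the sample.
            codebook[j] = [c + lr * (xi - c) for c, xi in zip(codebook[j], x)]
    return codebook

cb = competitive_learning([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]],
                          [[1.0, 1.0], [4.0, 4.0]])
```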
Funding: This work was supported by the National Key Research and Development Program of China (2017YFB1002104), the National Natural Science Foundation of China (Grant No. U1811461), and the Innovation Program of the Institute of Computing Technology, CAS.
Abstract: Richly formatted documents, such as financial disclosures, scientific articles, and government regulations, are widespread on the Web. However, most of these documents are intended only for public reading, so their styling information is usually missing. This makes them unsuitable or even burdensome to display and edit across formats and platforms. In this study we formulate document styling restoration as an optimization problem: identify styling settings for the document elements (e.g., lines, table cells, text) such that rendering with those settings yields a document in which each element occupies (nearly) exactly the same position as in the original. Because each styling setting is a decision, the problem can be cast as a multi-step decision-making task over all document elements and solved by reinforcement learning. Specifically, Monte Carlo Tree Search (MCTS) explores different styling settings, and the policy function is learned under the supervision of delayed rewards. As a case study, we restore the styling information of tables, the form in which documents usually present structural and functional data. Experiments show that our best reinforcement method restores the styling of 87.65% of the tables, a 25.75% absolute improvement over the greedy method. We also discuss the trade-off between inference time and restoration success rate. Although the reinforcement methods cannot be used in real-time scenarios, we argue that they suit offline tasks with high quality requirements. Finally, the model has been applied in a PDF parser to support cross-format display.
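The abstract does not give the selection rule used inside the MCTS. For orientation, the sketch below shows the classic UCT (UCB1) rule for choosing which child node to descend into. A method with a learned policy function might use a policy-weighted variant instead. The styling-setting labels are hypothetical.

```python
import math

def uct_score(value_sum, visits, parent_visits, c=math.sqrt(2)):
    # Unvisited children get infinite priority so each is tried once.
    if visits == 0:
        return math.inf
    exploit = value_sum / visits                                  # mean reward so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)     # exploration bonus
    return exploit + explore

def select_child(children, parent_visits):
    """children: dict mapping a (hypothetical) styling setting -> (value_sum, visits)."""
    return max(children, key=lambda a: uct_score(*children[a], parent_visits))
```

For example, `select_child({"font_10pt": (3.0, 5), "font_12pt": (0.0, 0)}, 5)` picks the unvisited `"font_12pt"`.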
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61801119.
Abstract: Scalable video coding (SVC) has been widely used in video-on-demand (VOD) services. It efficiently satisfies users' differing video quality requirements and dynamically adapts video streams to time-varying wireless channels. Within the 5G network architecture, we consider a cooperative caching scheme with SVC inside each cluster to make economical use of limited cache storage. We propose a novel multi-agent deep reinforcement learning (MADRL) framework that jointly optimizes video access delay and user satisfaction. In this framework, an aggregation node helps individual agents obtain global observations and overall system rewards. The large number of videos and users also creates a large action space. To cope with it, a dimension decomposition method is embedded into each agent's neural network, which greatly reduces the computational complexity and memory cost of reinforcement learning. Experimental results show that: 1) the proposed value-decomposed dimensional network (VDDN) algorithm achieves a clear performance gain over traditional MADRL; 2) the proposed VDDN algorithm can handle an extremely large action space and converges quickly with low computational complexity.
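The abstract does not detail how VDDN decomposes the action space. The arithmetic below assumes the common idea of giving each action dimension its own output head. It shows why the output count drops from a product to a sum, and how a greedy joint action can then be chosen head by head. The sizes (10 cache slots, 50 candidate videos each) are hypothetical.

```python
def joint_outputs(choices_per_dim):
    # A flat Q-network needs one output per combination of sub-actions (a product).
    n = 1
    for k in choices_per_dim:
        n *= k
    return n

def decomposed_outputs(choices_per_dim):
    # A factored network has one head per dimension, so output sizes add up.
    return sum(choices_per_dim)

def greedy_factored_action(head_q_values):
    # Pick the best choice in each head independently.
    return [max(range(len(q)), key=q.__getitem__) for q in head_q_values]

dims = [50] * 10  # hypothetical: 10 cache slots, 50 candidate videos per slot
```

With these sizes a flat network would need 50**10 outputs, whereas the factored heads need 500.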
Funding: National Natural Science Foundation of China under Grant No. 69674023.
Abstract: A stable control scheme for a class of unknown nonlinear systems is presented. The control architecture consists of two parts. A fuzzy sliding mode controller (FSMC) drives the state to a designed switching hyperplane, and a reinforcement self-organizing fuzzy CPN (RSOFCPN), acting as a feedforward compensator, reduces the influence of system uncertainties. Simulation results demonstrate the effectiveness of the proposed control scheme.
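The sketch below makes the sliding-mode part concrete with a minimal boundary-layer sliding mode controller for a hypothetical second-order plant x1'' = f(x) + u. It omits both the fuzzy tuning of the FSMC and the RSOFCPN compensator. The uncertainty f, the gains c and k, and the boundary-layer width phi are illustrative assumptions. The controller only needs k to exceed the bound on |f|.

```python
import math

def sat(x):
    # Saturation: replaces sign(s) inside a boundary layer to limit chattering.
    return max(-1.0, min(1.0, x))

def simulate_smc(x1=1.0, x2=0.0, c=1.0, k=2.0, phi=0.05, dt=1e-3, t_end=10.0):
    """Regulate x1'' = f(x) + u to the origin; the controller never uses f itself."""
    for _ in range(int(t_end / dt)):
        f = 0.5 * math.sin(x1)          # hypothetical bounded uncertainty, |f| <= 0.5 < k
        s = c * x1 + x2                 # switching surface s = c*e + de/dt (target is 0)
        u = -c * x2 - k * sat(s / phi)  # cancel known dynamics, then push s toward 0
        x1, x2 = x1 + dt * x2, x2 + dt * (f + u)  # explicit Euler step
    return x1, x2

x1_final, x2_final = simulate_smc()
```

Using a hard sign(s) instead of sat(s / phi) would give ideal sliding but a chattering control signal. The boundary layer trades a small residual error for smoothness.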
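Abstract: As mobile communications rapidly move to 5G, 5G base stations are being deployed at a fast-growing scale. The idle energy storage in these base stations can be treated as a flexible resource for power system dispatch, which mitigates the adverse effects of the randomness and volatility of renewable generation. This work addresses the economic optimal dispatch of base station energy storage in an active distribution network with distributed wind power. We first build a base station reliability evaluation model that accounts for two factors: potential power outages in the distribution network and the outage restoration time. The model systematically assesses the real-time dispatchable capacity of each base station's energy storage. Then, with the objective of minimizing system operating cost, we solve for the optimal charging/discharging strategy of 5G base station energy storage. The solver is an improved twin delayed deep deterministic policy gradient (TD3) algorithm based on a variational auto-encoder (VAE) model. The algorithm represents the states of the multiple base station energy storage units as latent variables to mine hidden correlations in the data, which reduces the model's solution complexity and improves algorithm performance. Iterating to convergence achieves real-time control of the multi-base station energy storage (MBSES) system. It also yields a personalized charging/discharging strategy for each base station that matches its actual operating conditions. Finally, case studies verify the effectiveness of the proposed method.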
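The abstract does not specify the VAE encoding or the authors' TD3 modifications. As a hedged reference point, the sketch below computes the standard TD3 critic target, which combines target policy smoothing with clipped double-Q learning. It uses a scalar action, such as a normalized charge/discharge power in [-1, 1]. The actor and critic arguments are stand-in callables, and passing a VAE latent state as next_state is an assumption about the authors' setup.

```python
import random

def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0):
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    a_next = max(a_low, min(a_high, actor_target(next_state) + noise))
    # Clipped double-Q: take the smaller target critic to curb overestimation.
    q_next = min(q1_target(next_state, a_next), q2_target(next_state, a_next))
    # No bootstrapping past a terminal transition (done = 1.0).
    return reward + gamma * (1.0 - done) * q_next
```

With noise_std=0, an actor that always returns 0.3, and critics returning 2.0 and 1.5, the target is 1.0 + 0.99 * 1.5 = 2.485.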
Abstract: Nonlinear analysis of reinforced concrete structures, particularly of the complete load-deflection response, requires tracing the equilibrium path and properly treating limit and bifurcation points. Ordinary solution techniques become unstable near limit points and have difficulty with snap-through and snap-back, so they fail to predict the complete load-displacement response. The arc-length method serves this purpose well in principle. It has gained wide acceptance in finite element analysis and has been used extensively. However, the basic idea must be modified to meet the particular needs of a given analysis. This paper reviews developments of the method over the last two decades, with particular emphasis on nonlinear finite element analysis of reinforced concrete structures.
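For orientation, the block below shows a commonly used textbook form of the arc-length method: Crisfield's spherical constraint, appended to the equilibrium equations. It is not necessarily any of the variants the review covers. Here u is the displacement vector, λ the load factor, q the reference load vector, Δ an increment within the current step, Δl the prescribed arc length, and ψ a load-scaling parameter (ψ = 0 gives the cylindrical form).

```latex
\begin{aligned}
\mathbf{r}(\mathbf{u},\lambda) &= \mathbf{f}_{\mathrm{int}}(\mathbf{u}) - \lambda\,\mathbf{q} = \mathbf{0},\\
\Delta\mathbf{u}^{\mathsf{T}}\Delta\mathbf{u} + \psi^{2}\,\Delta\lambda^{2}\,\mathbf{q}^{\mathsf{T}}\mathbf{q} &= \Delta l^{2}.
\end{aligned}
```

The load factor λ is solved for rather than prescribed, so it can decrease along the path. Each step is controlled by arc length instead of by load or a single displacement. This is what lets the method pass limit points and follow snap-through and snap-back.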
Funding: The MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-2006) supervised by the NIPA (National IT Industry Promotion Agency); and the Brain Research Program through the National Research Foundation of Korea, funded by the Ministry of Science, ICT & Future Planning (2011-0019212).
Abstract: The hippocampus, which lies in the temporal lobe, plays an important role in spatial navigation, learning, and memory. Several studies have examined place cell activity, spatial memory, prediction of future locations, and various learning paradigms. However, no study has asked whether there are neurons that contribute substantially to both spatial memory and reward learning. This paper proposes that neurons in the CA1 area of the rat hippocampus can engage simultaneously in forming place memory and in reward learning. A five-stage reward experiment was conducted with a trained rat in a modified figure-8 maze, and spike (firing) information was recorded from a CA1 neuron. The firing rate, i.e., the number of spikes per unit time, was calculated. Decoding was then performed by log maximum likelihood estimation (Log-MLE) with a Gaussian distribution model. Our results provide evidence for neurons that take part in both spatial memory and reward-related learning.
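The firing-rate computation and Gaussian Log-MLE decoding described above can be sketched in pure Python. This illustrative version fits one Gaussian per behavioral label to a single neuron's training firing rates. The labels and numbers are hypothetical, not the paper's data.

```python
import math

def firing_rate(spike_times, t_start, t_end):
    # Firing rate = number of spikes in [t_start, t_end) divided by window length.
    count = sum(1 for t in spike_times if t_start <= t < t_end)
    return count / (t_end - t_start)

def fit_gaussians(rates_by_label):
    # Mean and variance of the training firing rates for each label.
    params = {}
    for label, rates in rates_by_label.items():
        mu = sum(rates) / len(rates)
        var = sum((r - mu) ** 2 for r in rates) / len(rates)
        params[label] = (mu, max(var, 1e-6))  # floor avoids division by zero
    return params

def log_likelihood(rate, mu, var):
    # Log of the Gaussian density N(rate; mu, var).
    return -0.5 * math.log(2 * math.pi * var) - (rate - mu) ** 2 / (2 * var)

def decode(rate, params):
    # Log-MLE: the label whose Gaussian assigns the highest log-likelihood wins.
    return max(params, key=lambda lbl: log_likelihood(rate, *params[lbl]))

params = fit_gaussians({"reward_arm": [10.0, 12.0, 11.0],   # hypothetical rates (Hz)
                        "start_box": [2.0, 3.0, 2.5]})
```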
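Abstract: Radar electronic warfare is becoming increasingly intelligent, and traditional jammers cannot adapt to environmental changes, which greatly reduces combat effectiveness. To address this, we hide the detection signal within the jamming signal to obtain a jamming-detection shared signal, so that the jamming signal emitted by reconnaissance-jamming equipment also performs detection. Current jamming-detection shared signals suffer from low complexity and narrow spectral width. We therefore design a jamming-detection shared signal based on multi-carrier phase coding (multi-carrier phase code, MCPC). The signal has a good noise-like wideband spectrum, good range-detection capability, and good velocity-detection capability. It can suppressively jam the target radar while covertly detecting the target signal and the surrounding environment. To let the shared signal adapt to sensing of, and competition with, the battlefield environment, we further introduce a deep reinforcement learning algorithm to optimize the MCPC jamming-detection shared signal. First, the Q values of the dueling deep Q-learning network (DuDQN) are regularized, which resolves the local-optimum problem caused by overestimation that often arises in DuDQN. Second, a state value function is introduced into the reward to form a composite reward. We call the resulting network the composite reward-dueling deep Q-learning network based on regularization (CR-DuDQNReg). It makes the MCPC shared signal's sensitivity to the reward adjust with its own state and adaptively optimizes the initial phase-code values, yielding better jamming and covert detection. Simulation results show the following for the MCPC shared signal optimized by CR-DuDQNReg. Its maximum spectral amplitude increases by up to 17.48% and its maximum pulse-compression amplitude by up to 17.25%. The first sidelobe amplitude of its Doppler ambiguity function decreases by 12.69%. Compared with traditional deep reinforcement learning algorithms, CR-DuDQNReg also achieves better optimization.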
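The dueling architecture that CR-DuDQNReg builds on splits Q into a state value and action advantages. Below is a minimal sketch of the standard mean-subtracted aggregation in pure Python with hypothetical numbers. The abstract does not give the form of the Q-value regularization or the composite reward, so the sketch does not reproduce them.

```python
def dueling_q(value, advantages):
    # Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    # Subtracting the mean makes V and A identifiable from Q.
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

q_values = dueling_q(1.0, [1.0, 2.0, 3.0])  # hypothetical V(s) and A(s, a)
```

Because of the mean subtraction, the average of the resulting Q values always equals V(s).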