Abstract: Offline signature verification (OfSV) is essential in preventing the falsification of documents. Deep learning (DL) based OfSV systems require a large number of signature images to attain acceptable performance. However, only a limited number of signature samples are available to train these models in real-world scenarios. Several researchers have proposed models that augment new signature images by applying various transformations; others have used human neuromotor and cognitive-inspired augmentation models to address the demand for more signature samples. Augmenting a sufficient number of signatures with adequate variation therefore remains a challenging task. This study proposes OffSig-SinGAN, a deep-learning-based image augmentation model that addresses the limited-signature problem in offline signature verification. The proposed model can augment higher-quality, diverse signatures from only a single signature image. It is empirically evaluated on a widely used public dataset, GPDSsyntheticSignature. The quality of the augmented signature images is assessed using four metrics: pixel-by-pixel difference, peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and Fréchet inception distance (FID). Furthermore, various experiments were organised to evaluate the proposed image augmentation model's performance on selected DL-based OfSV systems and to determine whether it improves the verification accuracy rate. Experimental results show that the proposed augmentation model performed better on the GPDSsyntheticSignature dataset than other augmentation methods. The improved verification accuracy rate of the selected DL-based OfSV system demonstrates the effectiveness of the proposed augmentation model.
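The image-quality metrics named above are standard and can be reproduced with common Python imaging libraries. Below is a minimal sketch, not the authors' evaluation code, that scores an augmented signature against a reference signature using pixel-by-pixel difference, PSNR, and SSIM; FID is omitted because it requires a pretrained Inception network and a whole set of images. The file paths are placeholders.

```python
# Sketch: quality metrics for an augmented signature vs. its reference image.
# Assumes grayscale images of equal size; file names are placeholders.
import numpy as np
from skimage.io import imread
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def signature_quality(reference_path: str, augmented_path: str) -> dict:
    # as_gray=True yields float images in [0, 1], so data_range is 1.0 below.
    ref = imread(reference_path, as_gray=True).astype(np.float64)
    aug = imread(augmented_path, as_gray=True).astype(np.float64)

    # Pixel-by-pixel difference: mean absolute deviation between the two images.
    pixel_diff = float(np.mean(np.abs(ref - aug)))

    # PSNR and SSIM as implemented in scikit-image.
    psnr = peak_signal_noise_ratio(ref, aug, data_range=1.0)
    ssim = structural_similarity(ref, aug, data_range=1.0)

    return {"pixel_diff": pixel_diff, "psnr": psnr, "ssim": ssim}

if __name__ == "__main__":
    print(signature_quality("reference_signature.png", "augmented_signature.png"))
```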
Funding: Supported by the National Key R&D Program of China under Grant No. 2021ZD0113203 and the National Science Foundation of China under Grant No. 61976115.
Abstract: Offline reinforcement learning (ORL) aims to learn a rational agent purely from behavior data without any online interaction. One of the major challenges in ORL is distribution shift, i.e., the mismatch between the knowledge of the learned policy and the reality of the underlying environment. Recent works usually handle this in an overly pessimistic manner to avoid out-of-distribution (OOD) queries as much as possible, but this can hurt the robustness of the agents at unseen states. In this paper, we propose a simple but effective method to address this issue. The key idea is to enhance the robustness of the new policy learned offline by weakening its confidence in highly uncertain regions. We propose to find those regions by simulating them with modified Generative Adversarial Nets (GAN), such that the generated data follow the same distribution as the old experience yet are very difficult for the behavior policy, or some other reference policy, to handle. We then use this information to regularize the ORL algorithm, penalizing overconfident behavior in these regions. Extensive experiments on several publicly available offline RL benchmarks demonstrate the feasibility and effectiveness of the proposed method.
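As a rough illustration of the generation idea, the sketch below shows one plausible form of a generator objective: an adversarial term keeps the generated states close to the distribution of the offline experience, while a difficulty term prefers states where a reference policy's estimated value is low. The networks, interfaces, and the exact form of the difficulty term are assumptions made for the example, not the paper's objective.

```python
# Sketch (assumed form, not the paper's code): generator loss combining an
# adversarial term with a "difficulty" term measured by a reference value function.
import torch
import torch.nn.functional as F

def generator_loss(generator, discriminator, reference_value_fn, noise,
                   difficulty_weight=1.0):
    fake_states = generator(noise)

    # Adversarial term: fool the discriminator so that generated states match
    # the distribution of the offline experience.
    adv_loss = F.binary_cross_entropy_with_logits(
        discriminator(fake_states),
        torch.ones(fake_states.size(0), 1))

    # Difficulty term: prefer states where the reference policy's estimated
    # value is low, i.e. states it handles poorly.
    difficulty_loss = reference_value_fn(fake_states).mean()

    return adv_loss + difficulty_weight * difficulty_loss
```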
Funding: Anhui University of Finance and Economics Postgraduate Research and Innovation Fund Project (ACYC2020280).
Abstract: Since socialism with Chinese characteristics entered the new era, the "curriculum ideology and politics" concept has become one of the innovative achievements in the reform of ideological and political education courses in colleges and universities. Based on the emphasis on "curriculum ideology and politics" among graduate students and the influence of the "learning to strengthen the country" concept, this article analyzes universities with regard to curriculum settings, faculties, and their graduate students. It also explores the "curriculum ideology and politics" concept in terms of the ontology of teaching, school education, social influence, etc., and proposes practical and extendable countermeasures.
Abstract: The coronavirus has affected many areas of life, especially education. With the beginning of the pandemic, the transition to online learning began, which affected the development of students and teachers in terms of using innovative technologies and programs such as Zoom, Webex, Discord, Google Meet, Moodle, EDX, Coursera, www.examus.network, etc. In this regard, many teachers wonder whether the online method of teaching is as effective as the offline method. In this article, we focus on whether there is a significant difference in student performance between online and offline modes of learning in the study of mathematics. One group of 58 students studied online and another group of 58 studied offline; all were first-year college students of Jambyl Innovative Higher College (JICH) in Taraz, Kazakhstan. A final assessment covering all areas of the topic was administered to both groups at the end of week 18. The average scores of students studying offline were compared with those of students studying online, and the researchers also conducted and analyzed an independent t-test. The results showed that there is a significant difference in the academic performance of students who study online and offline: the offline teaching method proved more effective for improving students' understanding and comprehension of mathematics topics.
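An independent-samples t-test of this kind is straightforward to reproduce in Python. The sketch below uses SciPy on two hypothetical score lists (placeholders, not the study's data) and reports whether the mean difference is significant at the 5% level.

```python
# Sketch: independent-samples t-test on final assessment scores of the two groups.
# The score lists are hypothetical placeholders, not the study's data.
from scipy import stats

online_scores = [72, 65, 80, 58, 70, 61, 67, 74]   # hypothetical values
offline_scores = [78, 82, 69, 88, 75, 80, 73, 85]  # hypothetical values

# Welch's t-test (equal_var=False) does not assume equal group variances.
t_stat, p_value = stats.ttest_ind(online_scores, offline_scores, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Significant difference between online and offline performance.")
```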
Funding: Supported by the National Natural Science Foundation of China under Grant 52077146.
Abstract: With the construction of the power Internet of Things (IoT), communication between smart devices in urban distribution networks has gradually moved towards high speed, high compatibility, and low latency, which provides reliable support for reconfiguration optimization in urban distribution networks. This study therefore proposes a deep reinforcement learning based multi-level dynamic reconfiguration method for urban distribution networks in a cloud-edge collaboration architecture, which yields a real-time optimal multi-level dynamic reconfiguration solution. First, the multi-level dynamic reconfiguration method is discussed, covering the feeder, transformer, and substation levels. Subsequently, a multi-agent system is combined with the cloud-edge collaboration architecture to build a deep reinforcement learning model for multi-level dynamic reconfiguration in an urban distribution network. The cloud-edge collaboration architecture effectively supports the multi-agent system in the "centralized training and decentralized execution" operation mode and improves the learning efficiency of the model. Thereafter, for the multi-agent system, this study adopts a combination of offline and online learning to endow the model with the ability to automatically optimize and update its strategy. In the offline learning phase, a Q-learning-based multi-agent conservative Q-learning (MACQL) algorithm is proposed to stabilize the learning results and reduce the risk of the subsequent online learning phase. In the online learning phase, a multi-agent deep deterministic policy gradient (MADDPG) algorithm based on policy gradients is proposed to explore the action space and update the experience pool. Finally, the effectiveness of the proposed method is verified through a simulation analysis of a real-world 445-node system.
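To make the offline phase more concrete, the sketch below shows a simplified conservative Q-learning style loss of the kind that underlies CQL variants such as MACQL: the usual TD error plus a term that pushes down Q-values of policy actions relative to the actions actually present in the offline experience. The networks, batch layout, and the simplified penalty (a plain gap rather than a log-sum-exp over sampled actions) are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a simplified conservative Q-learning style loss for one agent's critic.
# Interfaces and the penalty form are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def conservative_q_loss(q_net, target_q_net, policy, batch, gamma=0.99, alpha=5.0):
    s, a, r, s_next, done = batch  # tensors sampled from the offline dataset

    # Standard TD target computed with the target network.
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_q_net(s_next, policy(s_next))
    td_loss = F.mse_loss(q_net(s, a), target)

    # Conservative term: push down Q-values of current-policy actions and push up
    # Q-values of actions actually present in the offline experience.
    conservative_gap = q_net(s, policy(s)).mean() - q_net(s, a).mean()

    return td_loss + alpha * conservative_gap
```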
Funding: Supported by the Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project under Grant No. 2021ZD0113303 and the National Natural Science Foundation of China under Grant Nos. 62192783 and 62276128, and in part by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Abstract: At present, the parameters of radar detection rely heavily on manual adjustment and empirical knowledge, resulting in low automation. Traditional manual adjustment methods cannot meet the requirements of modern radars for high efficiency, high precision, and high automation. It is therefore necessary to explore a new intelligent radar control learning framework and technology to improve the capability and automation of radar detection. Reinforcement learning is popular for decision-task learning, but the shortage of samples in radar control tasks makes it difficult to meet the requirements of reinforcement learning. To address these issues, we propose a practical radar operation reinforcement learning framework and integrate offline reinforcement learning and meta-reinforcement learning methods to alleviate the sample requirements of reinforcement learning. Experimental results show that our method can automatically perform on par with human operators in radar detection under real-world settings, thereby promoting the practical application of reinforcement learning in radar operation.
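At a very high level, the "offline RL plus meta-RL" workflow can be pictured as offline pretraining on logged radar-control data followed by a few adaptation steps on a small task-specific batch. The sketch below uses a behavior-cloning-style loss purely as a placeholder; the networks, data loaders, and losses are assumptions, not the paper's algorithm.

```python
# Rough sketch: offline pretraining followed by few-step adaptation, mimicking an
# "offline RL + meta-RL" workflow at a high level. All components are placeholders.
import copy
import torch

def pretrain_offline(policy, offline_loader, optimizer, epochs=10):
    # Offline phase: fit the policy to logged (state, action) pairs.
    for _ in range(epochs):
        for states, actions in offline_loader:
            loss = torch.nn.functional.mse_loss(policy(states), actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy

def adapt_to_task(policy, task_batch, lr=1e-3, steps=5):
    # Adaptation phase: a few gradient steps on a small task-specific batch,
    # leaving the pretrained policy untouched.
    adapted = copy.deepcopy(policy)
    optimizer = torch.optim.Adam(adapted.parameters(), lr=lr)
    states, actions = task_batch
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(adapted(states), actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return adapted
```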
Abstract: Reinforcement Learning (RL) has emerged as a promising data-driven solution for wargaming decision-making. However, two domain challenges still exist: (1) dealing with discrete-continuous hybrid wargaming control and (2) accelerating RL deployment with rich offline data. Existing RL methods fail to handle these two issues simultaneously, so we propose a novel offline RL method targeting the hybrid action space. A new constrained action representation technique is developed to build a bidirectional mapping between the original hybrid action space and a latent space in a semantically consistent way. This allows a continuous latent policy to be learned with offline RL, with better exploration feasibility and scalability, and then reconstructed into the required hybrid policy. Critically, a novel offline RL optimization objective with adaptively adjusted constraints is designed to balance the alleviation and generalization of out-of-distribution actions. Our method demonstrates superior performance and generality across different tasks, particularly in typical realistic wargaming scenarios.
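The bidirectional mapping between the hybrid action space and a latent space can be pictured as an action autoencoder. The sketch below is an illustrative PyTorch module, with dimensions and architecture chosen arbitrarily rather than taken from the paper: the encoder embeds a (discrete one-hot, continuous) action pair into a bounded latent vector, and the decoder reconstructs both parts.

```python
# Sketch of a bidirectional hybrid-action mapping: an autoencoder that embeds a
# (discrete, continuous) action into a latent vector and reconstructs both parts.
# Dimensions and architecture are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class HybridActionCodec(nn.Module):
    def __init__(self, n_discrete=8, cont_dim=4, latent_dim=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_discrete + cont_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim), nn.Tanh(),     # bounded latent action space
        )
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU())
        self.discrete_head = nn.Linear(64, n_discrete)  # logits over discrete choices
        self.continuous_head = nn.Linear(64, cont_dim)  # continuous parameters

    def encode(self, discrete_onehot, continuous):
        return self.encoder(torch.cat([discrete_onehot, continuous], dim=-1))

    def decode(self, latent):
        h = self.decoder(latent)
        return self.discrete_head(h), self.continuous_head(h)
```

A latent policy trained with an off-the-shelf continuous-action offline RL algorithm can then act in the latent space, with decode() mapping each latent action back into the discrete-continuous pair the environment expects.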
Funding: The National Natural Science Foundation of China (Project No. 61273365) and the 111 Project (No. B08004) are gratefully acknowledged.
Abstract: Offline Urdu Nastaleeq text recognition has long been a serious problem due to its highly cursive nature. To avoid character segmentation problems, many researchers are shifting focus towards segmentation-free, ligature-based recognition approaches. The majority of prevalent ligature-based recognition systems rely heavily on hand-engineered feature extraction techniques. However, such techniques are error prone and may lead to a loss of useful information that can hardly be recovered later by any manual features. Moreover, most prevalent Urdu Nastaleeq text recognition systems have been trained and tested on small datasets. This paper proposes the use of stacked denoising autoencoders for automatic feature extraction directly from the raw pixel values of ligature images. Such deep learning networks have not been applied to the recognition of Urdu text thus far. Different stacked denoising autoencoders were trained on 178,573 ligatures with 3,732 classes from the un-degraded (noise-free) UPTI (Urdu Printed Text Image) dataset. Subsequently, the trained networks were validated and tested on degraded versions of the UPTI dataset. The experimental results demonstrate accuracies in the range of 93% to 96%, which are better than those of existing Urdu OCR systems on such a large dataset of ligatures.
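For readers unfamiliar with denoising autoencoders, the sketch below shows one layer of such a network in PyTorch; the input size, hidden size, and corruption level are illustrative assumptions, not the paper's settings. Layers are trained greedily, one at a time, and the stacked encoders then feed a classifier over ligature classes.

```python
# Sketch of one layer of a stacked denoising autoencoder over raw ligature pixels.
# Input dimension and corruption level are illustrative, not the paper's settings.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim=64 * 64, hidden_dim=512, corruption=0.3):
        super().__init__()
        self.corruption = corruption
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        # Corrupt the input by randomly zeroing pixels, then reconstruct the original.
        mask = (torch.rand_like(x) > self.corruption).float()
        hidden = self.encoder(x * mask)
        return self.decoder(hidden), hidden

# Each trained layer's hidden codes become the input to the next layer; once all
# layers are pretrained, the stacked encoders are fine-tuned with a classifier head.
```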
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0116402) and the National Natural Science Foundation of China (No. 62106172).
Abstract: Offline reinforcement learning (RL) is a data-driven learning paradigm for sequential decision making. Mitigating the overestimation of values originating from out-of-distribution (OOD) states, induced by the distribution shift between the learning policy and the previously collected offline dataset, lies at the core of offline RL. To tackle this problem, some methods underestimate the values of states given by learned dynamics models, or of state-action pairs with actions sampled from policies different from the behavior policy. However, since these generated states or state-action pairs are not guaranteed to be OOD, staying conservative on them may adversely affect the in-distribution ones. In this paper, we propose an OOD state-conservative offline RL method (OSCAR), which addresses this limitation by explicitly generating reliable OOD states located near the manifold of the offline dataset, and then designs a conservative policy evaluation approach that combines the vanilla Bellman error with a regularization term that only underestimates the values of these generated OOD states. In this way, we prevent the value errors of OOD states from propagating to in-distribution states through value bootstrapping and policy improvement. We also theoretically prove that the proposed conservative policy evaluation approach is guaranteed to underestimate the values of OOD states. OSCAR, along with several strong baselines, is evaluated on the offline decision-making benchmark D4RL and the autonomous driving benchmark SMARTS. Experimental results show that OSCAR outperforms the baselines on a large portion of the benchmarks and attains the highest average return, substantially outperforming existing offline RL methods.
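The conservative policy evaluation described above combines a vanilla Bellman error with a regularizer applied only to the generated OOD states. The sketch below illustrates this shape of loss with a state-value network in PyTorch; the interfaces, the use of a V-function, and the penalty weight are assumptions made for illustration, not the OSCAR implementation.

```python
# Sketch of a conservative policy-evaluation loss in the spirit described above:
# vanilla Bellman error plus a term that pushes down values of generated OOD states.
# The value-network interface and batch layout are assumptions for illustration.
import torch
import torch.nn.functional as F

def conservative_evaluation_loss(v_net, target_v_net, batch, ood_states,
                                 gamma=0.99, beta=1.0):
    s, r, s_next, done = batch  # transitions from the offline dataset

    # Vanilla Bellman error on in-distribution transitions.
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_v_net(s_next)
    bellman_error = F.mse_loss(v_net(s), target)

    # Regularizer applied only to generated OOD states near the data manifold:
    # minimizing their values keeps the critic pessimistic exactly where data is absent.
    ood_penalty = v_net(ood_states).mean()

    return bellman_error + beta * ood_penalty
```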