Funding: the National Natural Science Foundation of China (62072376 and U1811262), the Guangdong Provincial Basic and Applied Research Fund Project (2022A1515010144), the Innovation Capability Support Program of Shaanxi (2022KJXX-75), and the Fundamental Research Funds for the Central Universities (D5000230056).
Abstract: The potential to identify individuals at high disease risk solely from genotype data has garnered significant interest. Although widely applied, traditional polygenic risk scoring methods fall short, as they are built on additive models that fail to capture the intricate associations among single nucleotide polymorphisms (SNPs). This is a limitation because genetic diseases often arise from complex interactions between multiple SNPs. To address this challenge, we developed DeepRisk, a biological knowledge-driven deep learning method that models these complex, nonlinear associations among SNPs to score the risk of common diseases more effectively from genome-wide genotype data. Evaluations demonstrated that DeepRisk outperforms existing PRS-based methods in identifying individuals at high risk for four common diseases: Alzheimer's disease, inflammatory bowel disease, type 2 diabetes, and breast cancer.
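As an illustration of the gap the abstract describes, the minimal PyTorch sketch below contrasts a traditional additive PRS (a weighted sum of allele dosages) with a small feed-forward network that can, in principle, capture nonlinear SNP-SNP interactions. The layer sizes, SNP count, and names are hypothetical and do not reproduce DeepRisk's biological knowledge-driven architecture.

```python
# Illustrative sketch only: contrasts an additive PRS with a small nonlinear
# model over SNP dosages. Architecture and dimensions are placeholders, not
# DeepRisk's actual design.
import torch
import torch.nn as nn

n_snps = 1000  # hypothetical number of SNPs after QC/filtering

def additive_prs(dosages: torch.Tensor, effect_sizes: torch.Tensor) -> torch.Tensor:
    """Traditional PRS: a weighted sum of allele dosages (0/1/2), purely additive."""
    return dosages @ effect_sizes

class NonlinearRiskNet(nn.Module):
    """A small feed-forward network that can model SNP-SNP interactions
    that an additive score cannot."""
    def __init__(self, n_snps: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_snps, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden // 4),
            nn.ReLU(),
            nn.Linear(hidden // 4, 1),  # risk logit
        )

    def forward(self, dosages: torch.Tensor) -> torch.Tensor:
        return self.net(dosages).squeeze(-1)

# Toy usage: score a batch of 8 genotypes both ways.
dosages = torch.randint(0, 3, (8, n_snps)).float()
effects = torch.randn(n_snps) * 0.01
prs_scores = additive_prs(dosages, effects)
risk_logits = NonlinearRiskNet(n_snps)(dosages)
```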
Funding: supported by the National Key R&D Program of China (No. 2022ZD0116402) and the National Natural Science Foundation of China (No. 62106172).
Abstract: Offline reinforcement learning (RL) is a data-driven learning paradigm for sequential decision making. At the core of offline RL lies mitigating the overestimation of values for out-of-distribution (OOD) states, which is induced by the distribution shift between the learning policy and the previously collected offline dataset. To tackle this problem, some methods underestimate the values of states given by learned dynamics models, or of state-action pairs with actions sampled from policies other than the behavior policy. However, since these generated states or state-action pairs are not guaranteed to be OOD, staying conservative on them may adversely affect the in-distribution ones. In this paper, we propose an OOD state-conservative offline RL method (OSCAR), which addresses this limitation by explicitly generating reliable OOD states located near the manifold of the offline dataset, and then designs a conservative policy evaluation approach that combines the vanilla Bellman error with a regularization term that underestimates the values of only these generated OOD states. In this way, we prevent the value errors of OOD states from propagating to in-distribution states through value bootstrapping and policy improvement. We also theoretically prove that the proposed conservative policy evaluation approach is guaranteed to underestimate the values of OOD states. OSCAR, along with several strong baselines, is evaluated on the offline decision-making benchmark D4RL and the autonomous driving benchmark SMARTS. Experimental results show that OSCAR outperforms the baselines on a large portion of the benchmarks and attains the highest average return, substantially outperforming existing offline RL methods.
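The sketch below illustrates the kind of conservative critic update the abstract describes: a vanilla Bellman error on offline transitions plus a regularizer that pushes down values only at explicitly generated OOD states. How OSCAR generates states near the data manifold and weights the penalty is not specified here, so `ood_states` and `beta` are placeholders rather than the paper's actual components.

```python
# Schematic sketch of a conservative critic update in the spirit of the
# abstract: Bellman error on in-distribution data + a penalty that only
# underestimates values of generated OOD states.
import torch
import torch.nn.functional as F

def conservative_critic_loss(q_net, target_q_net, policy, batch,
                             ood_states, gamma=0.99, beta=1.0):
    s, a, r, s_next, done = batch  # tensors sampled from the offline dataset

    # Vanilla Bellman (TD) error on in-distribution transitions.
    with torch.no_grad():
        a_next = policy(s_next)
        target = r + gamma * (1.0 - done) * target_q_net(s_next, a_next)
    bellman_loss = F.mse_loss(q_net(s, a), target)

    # Regularizer: push down values only at the generated OOD states,
    # so the penalty does not touch in-distribution states.
    ood_actions = policy(ood_states)
    ood_penalty = q_net(ood_states, ood_actions).mean()

    return bellman_loss + beta * ood_penalty
```

In a training loop, `ood_states` would be produced each iteration by whatever generator perturbs dataset states toward the edge of the data manifold; the loss is then backpropagated through `q_net` only.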
Funding: project supported by the National Natural Science Foundation of China (No. 61836011), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (No. 2018497), and the GPU cluster built by the MCC Lab of the Information Science and Technology Institution, USTC, China.
Abstract: Multi-agent reinforcement learning is difficult to apply in practice, partly because of the gap between simulated and real-world scenarios. One reason for this gap is that simulated systems always assume that agents can work normally all the time, whereas in practice one or more agents may unexpectedly "crash" during the coordination process due to inevitable hardware or software failures. Such crashes destroy the cooperation among agents and lead to performance degradation. In this work, we present a formal conceptualization of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework that introduces a virtual coach agent to adjust the crash rate during training. We design three coaching strategies (fixed crash rate, curriculum learning, and adaptive crash rate) and a re-sampling strategy for the coach agent. To our knowledge, this is the first work to study unexpected crashes in a multi-agent system. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of the adaptive strategy compared with the fixed crash rate and curriculum learning strategies. An ablation study further illustrates the effectiveness of our re-sampling strategy.
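A minimal sketch of a virtual coach that controls the crash rate during training is given below. The adaptive rule (raising the crash rate as the team's recent success rate improves) and the curriculum schedule are plausible stand-ins, not the strategies defined in the paper, and the re-sampling strategy is omitted.

```python
# Minimal sketch of a "virtual coach" that decides how often agents crash
# during training. The fixed, curriculum, and adaptive rules below are
# illustrative placeholders for the paper's three coaching strategies.
import random

class CoachAgent:
    def __init__(self, n_agents, strategy="adaptive", fixed_rate=0.2):
        self.n_agents = n_agents
        self.strategy = strategy
        self.crash_rate = fixed_rate

    def update(self, recent_success_rate, episode, total_episodes):
        if self.strategy == "fixed":
            pass  # keep the initial crash rate unchanged
        elif self.strategy == "curriculum":
            # Anneal the crash rate linearly toward a cap over training.
            self.crash_rate = 0.3 * episode / total_episodes
        elif self.strategy == "adaptive":
            # Hypothetical rule: crash more often once the team copes well.
            self.crash_rate = min(0.3, 0.5 * recent_success_rate)

    def sample_crashes(self):
        """Return a boolean mask; True marks an agent as crashed this episode."""
        return [random.random() < self.crash_rate for _ in range(self.n_agents)]

# Toy usage inside a training loop.
coach = CoachAgent(n_agents=5, strategy="adaptive")
coach.update(recent_success_rate=0.6, episode=100, total_episodes=1000)
crashed_mask = coach.sample_crashes()
```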