Artificial intelligence (AI) models usually require large amounts of high-quality training data, which contrasts starkly with the small and biased datasets available in current drug discovery pipelines. Federated learning has been proposed as a way to use distributed data from different sources without leaking sensitive information. This emerging decentralized machine learning paradigm is expected to dramatically improve the success rate of AI-powered drug discovery. Here, we simulated the federated learning process with property and activity datasets from different sources, which share overlapping molecules whose recorded values show either high or low biases across sources. Beyond the benefit of gaining more data, we also demonstrated that federated training has a regularization effect that makes it superior to centralized training on pooled datasets with high biases. Moreover, we compared different client network architectures and coordinator aggregation algorithms in terms of federated learning performance, and personalized federated learning showed promising results. Our work demonstrates the applicability of federated learning to predicting drug-related properties and highlights its promising role in addressing the small and biased data dilemma in drug discovery.
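The federated learning abstract above describes clients that train on private data and a coordinator that aggregates their models. As a rough illustration of that division of labor, here is a minimal sketch assuming the standard FedAvg rule (a dataset-size-weighted average of client weights) on a toy linear model. FedAvg is a swapped-in example of one common aggregation algorithm, not necessarily one the authors used, and the function names (`local_train`, `fedavg`, `federated_training`) and toy data are hypothetical.

```python
"""Minimal FedAvg sketch: clients train locally, a coordinator averages weights.

Assumption: standard FedAvg aggregation on a toy linear model, shown only as an
illustration; it is not necessarily the aggregation algorithm used in the work above.
"""
from typing import List, Tuple

# One client's private records: (feature vector, target value) pairs.
Dataset = List[Tuple[List[float], float]]


def local_train(weights: List[float], data: Dataset,
                lr: float = 0.05, epochs: int = 5) -> List[float]:
    """Run plain SGD on a linear model using only this client's data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of 0.5 * err**2 with respect to w_i is err * x_i.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Coordinator step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_training(clients: List[Dataset], dim: int, rounds: int = 50) -> List[float]:
    """Only model weights are exchanged; raw (x, y) records never leave a client."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_train(global_w, data) for data in clients]
        global_w = fedavg(updates, [len(data) for data in clients])
    return global_w


if __name__ == "__main__":
    # Two hypothetical clients holding disjoint samples of y = 2*x + 1
    # (the constant feature 1.0 acts as a bias term).
    client_a = [([x, 1.0], 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
    client_b = [([x, 1.0], 2.0 * x + 1.0) for x in (0.25, 0.75, 1.25, 1.75)]
    print(federated_training([client_a, client_b], dim=2, rounds=100))
```

With this noise-free toy data, the global weights should end up close to the generating coefficients 2.0 and 1.0. Personalized federated learning, which the abstract reports as promising, would additionally keep some client-specific parameters instead of replacing every client model with the single averaged one; that variant is not shown here.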
Dear Editor, Recent large-scale pre-trained models such as GPT-3 and PanGu-α have demonstrated astounding performance on many downstream natural language processing (NLP) tasks, confirming that AI can be user-oriented even for industrial applications. Deep learning has been recognized as one of the most promising technologies for pharmaceuticals, and a powerful pre-trained molecular model could save researchers a great deal of time. To apply AI capabilities strategically to the drug discovery field, we pre-trained a model called PanGu Drug Model on 1.7 billion small molecules from ZINC20 (Irwin et al., 2020), DrugSpaceX (Yang et al., 2021), and UniChem (Chambers et al., 2013).
Funding: supported by the Shanghai Municipal Science and Technology Major Project, the National Natural Science Foundation of China (81773634), the National Science and Technology Major Project of the Ministry of Science and Technology of China (2018ZX09711002), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA12050201 and XDA12020368).