The acid-base dissociation constant (pKa) is a key physicochemical parameter in chemical science, especially in organic synthesis and drug discovery. Current methodologies for pKa prediction still suffer from a limited applicability domain and a lack of chemical insight. Here we present MF-SuP-pKa (multi-fidelity modeling with subgraph pooling for pKa prediction), a novel pKa prediction model that utilizes subgraph pooling, multi-fidelity learning and data augmentation. In our model, a knowledge-aware subgraph pooling strategy was designed to capture the local and global environments around the ionization sites for micro-pKa prediction. To overcome the scarcity of accurate pKa data, low-fidelity data (computational pKa) were used to fit the high-fidelity data (experimental pKa) through transfer learning. The final MF-SuP-pKa model was constructed by pre-training on the augmented ChEMBL data set and fine-tuning on the DataWarrior data set. Extensive evaluation on the DataWarrior data set and three benchmark data sets shows that MF-SuP-pKa outperforms the state-of-the-art pKa prediction models while requiring much less high-fidelity training data. Compared with Attentive FP, MF-SuP-pKa achieves 23.83% and 20.12% improvement in terms of mean absolute error (MAE) on the acidic and basic sets, respectively.
Funding: This work was financially supported by the National Key Research and Development Program of China (2021YFF1201400), the National Natural Science Foundation of China (22220102001), and the Natural Science Foundation of Zhejiang Province (LZ19H300001, LD22H300001, China).
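The abstract describes pooling over the local and global environments of an ionization site but gives no implementation details. The following is only a minimal sketch of one way such a site-centred readout could look, not the authors' method: it assumes the local environment is all atoms within `k` bonds of the site and that mean pooling is used, and every name in it (`atoms_within_k_bonds`, `mean_pool`, `site_representation`, the toy molecule and its features) is hypothetical.

```python
from collections import deque


def atoms_within_k_bonds(adjacency, site, k):
    """Return sorted indices of atoms at most k bonds from `site` (breadth-first search)."""
    distance = {site: 0}
    queue = deque([site])
    while queue:
        atom = queue.popleft()
        if distance[atom] == k:
            continue  # do not expand beyond the k-bond radius
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return sorted(distance)


def mean_pool(features, atom_ids):
    """Average the feature vectors of the selected atoms, dimension by dimension."""
    dim = len(features[0])
    return [sum(features[a][d] for a in atom_ids) / len(atom_ids) for d in range(dim)]


def site_representation(adjacency, features, site, k=2):
    """Concatenate a local (k-bond subgraph) readout with a global (whole-molecule) readout."""
    local = mean_pool(features, atoms_within_k_bonds(adjacency, site, k))
    whole = mean_pool(features, list(range(len(features))))
    return local + whole


# Toy five-atom chain 0-1-2-3-4 with made-up two-dimensional atom features.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

# Treat atom 0 as the (hypothetical) ionization site with a one-bond radius.
rep = site_representation(adjacency, features, site=0, k=1)
print(rep)
```

In this sketch each ionization site gets its own vector, so a molecule with several sites would yield one representation per site, which is the kind of per-site input a micro-pKa predictor needs. A learned model would typically replace the raw features with graph neural network embeddings and the plain mean with a trainable pooling.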