Abstract
In bioinformatics, artificial intelligence (AI) methods have achieved considerable success in predicting the physicochemical properties and bioactivity of drug molecules, and neural networks in particular are now widely used in drug discovery. However, shallow neural networks (SNNs) give low predictive accuracy, while deep neural networks (DNNs) are prone to overfitting. Model ensembling is expected to improve the predictive performance of weak learners in machine learning. This paper therefore applies model ensembling to the prediction of drug molecule properties, which the authors report as the first such application. The chemical structures of drug molecules are encoded as numerical representations, and shallow neural networks are combined by averaging and by stacking to improve the prediction of drug molecule pKa. Compared with the deep learning method, the model ensembled by stacking gives higher predictive accuracy, with a Pearson correlation coefficient of 0.86 for its predictions. Combining multiple weak-learner neural networks can reach the predictive accuracy of a DNN while retaining better generalization ability. The results show that model ensembling improves the accuracy and reliability of neural network predictions of drug molecule pKa.
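The averaging and stacking strategies named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for encoded molecular structures and pKa values, the names `make_data`, `ShallowNet`, `fit_meta`, `average_predict`, and `stacked_predict` are hypothetical, and the meta-learner is fit on a single hold-out split (a simplified form of stacking, sometimes called blending) rather than on out-of-fold predictions.

```python
import math
import random


def make_data(n, rng):
    """Synthetic stand-in for (encoded molecule, pKa) pairs; not real data."""
    xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    ys = [math.sin(2 * x[0]) + 0.5 * x[1] * x[2] + rng.gauss(0, 0.05) for x in xs]
    return xs, ys


class ShallowNet:
    """One-hidden-layer tanh network trained by plain SGD on squared error."""

    def __init__(self, n_in, n_hidden, seed):
        self.rng = random.Random(seed)
        self.w1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [self.rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def fit(self, xs, ys, epochs=30, lr=0.05):
        idx = list(range(len(xs)))
        for _ in range(epochs):
            self.rng.shuffle(idx)
            for i in idx:
                x, h = xs[i], self._hidden(xs[i])
                err = sum(w * v for w, v in zip(self.w2, h)) + self.b2 - ys[i]
                for j, hj in enumerate(h):
                    # Backpropagate through tanh before updating w2[j].
                    grad_pre = err * self.w2[j] * (1 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_pre
                    for k, xk in enumerate(x):
                        self.w1[j][k] -= lr * grad_pre * xk
                self.b2 -= lr * err
        return self


def average_predict(nets, x):
    """Averaging: unweighted mean of the base-learner predictions."""
    preds = [net.predict(x) for net in nets]
    return sum(preds) / len(preds)


def fit_meta(base_preds, ys, epochs=500, lr=0.05):
    """Linear meta-learner y ~ sum_k a_k * p_k + c, fit by batch gradient descent."""
    k, n = len(base_preds[0]), len(ys)
    a, c = [1.0 / k] * k, 0.0  # start from the averaging weights
    for _ in range(epochs):
        ga, gc = [0.0] * k, 0.0
        for p, y in zip(base_preds, ys):
            err = sum(ai * pi for ai, pi in zip(a, p)) + c - y
            gc += err
            for j in range(k):
                ga[j] += err * p[j]
        a = [ai - lr * g / n for ai, g in zip(a, ga)]
        c -= lr * gc / n
    return a, c


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


rng = random.Random(0)
xs, ys = make_data(400, rng)
x_tr, y_tr = xs[:200], ys[:200]        # trains the base learners
x_ho, y_ho = xs[200:300], ys[200:300]  # trains the meta-learner
x_te, y_te = xs[300:], ys[300:]        # held out for evaluation

# Three weak learners that differ only in their random seed.
nets = [ShallowNet(3, 4, seed).fit(x_tr, y_tr) for seed in (1, 2, 3)]
ho_preds = [[net.predict(x) for net in nets] for x in x_ho]
a, c = fit_meta(ho_preds, y_ho)


def stacked_predict(x):
    """Stacking: learned linear combination of the base-learner predictions."""
    return sum(ai * net.predict(x) for ai, net in zip(a, nets)) + c


avg_te = [average_predict(nets, x) for x in x_te]
stk_te = [stacked_predict(x) for x in x_te]
print("averaging r =", round(pearson(avg_te, y_te), 3))
print("stacking  r =", round(pearson(stk_te, y_te), 3))
```

Averaging weights every base network equally, whereas stacking learns the weights from data the base networks have not seen. The numbers printed here depend on the synthetic data and bear no relation to the 0.86 reported in the abstract.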
Authors
XIE Liang-xu (谢良旭)
LI Feng (李峰)
XIE Jian-ping (谢建平)
XU Xiao-jun (许晓军)
Affiliations: Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China; Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, Jiangsu 213100, China; School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China; School of Science, Huzhou University, Huzhou, Zhejiang 313000, China
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, Issue 9, pp. 251-256 (6 pages)
Funding
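National Natural Science Foundation of China (12074151, 22003020)
Natural Science Foundation of Jiangsu Province (BK20191032)
Changzhou Key Research and Development Project (CJ20200045)
Open Project of the Jiangsu Sino-Israel Industrial Technology Research Institute (JSIITRI202009)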