Funding: Supported by the National Natural Science Foundation of China (Grant No. 21938009).
Abstract: Solubility is widely regarded as a fundamental property of small-molecule drugs and drug candidates because it has a profound impact on the crystallization process. Solubility prediction, as an alternative to experiments, can reduce waste and improve the efficiency of crystallization processes, and has therefore attracted increasing attention. However, many challenges remain. Herein, we used seven descriptors, chosen on the basis of an understanding of dissolution behavior, to establish two solubility prediction models with machine learning (ML) algorithms. Solubility data for 120 active pharmaceutical ingredients (APIs) in ethanol were used to build the models, which were constructed with random decision forests and an artificial neural network, with the data structure and model accuracy optimized. The models were then compared with traditional prediction methods, including the modified solubility equation and a quantitative structure-property relationship (QSPR) model. The ML models achieved the highest accuracy on the testing set, indicating the best solubility prediction ability among the methods compared. Multiple linear regression and stepwise regression were used to further investigate the critical factors determining the solubility value. The results revealed that both the API properties and the solute-solvent interaction make a non-negligible contribution to the solubility value.
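The stepwise-regression step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a forward-selection variant driven by the gain in R², which is only one common choice; the abstract does not state which selection criterion the authors used. The data are synthetic stand-ins (120 samples, seven descriptors) and are not the paper's data, and the names `fit_ols`, `r_squared`, `forward_stepwise`, and `min_gain` are hypothetical.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def r_squared(X, y, beta):
    """Coefficient of determination of the fitted linear model."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    mean = sum(y) / len(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(X, y, min_gain=0.01):
    """Greedily add the descriptor that raises R^2 most; stop when the gain is below min_gain."""
    chosen, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {}
        for j in remaining:
            cols = chosen + [j]
            sub = [[r[c] for c in cols] for r in X]
            scores[j] = r_squared(sub, y, fit_ols(sub, y))
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        chosen.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return chosen, best

# Synthetic stand-in data: 120 samples, 7 descriptors (NOT the paper's data).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(7)] for _ in range(120)]
# Hypothetical ground truth: only descriptors 1 and 4 influence the response.
y = [2.0 * r[1] - 1.5 * r[4] + rng.gauss(0, 0.1) for r in X]

selected, r2 = forward_stepwise(X, y)
print(selected, round(r2, 3))
```

In this toy setup the procedure should retain only the two informative descriptors, which mirrors how stepwise regression can be used to rank the contribution of each descriptor to the predicted solubility.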