Abstract: Co-crystal formation can improve the physicochemical properties of a compound, thus enhancing its druggability. Therefore, artificial intelligence-based co-crystal virtual screening in the early stage of drug development has attracted extensive attention from researchers. However, the complexity of developing and applying the algorithms hinders its wide application. This study presents a data-driven co-crystal prediction method based on the XGBoost machine learning model and the scikit-learn package. Only the simplified molecular input line entry specification (SMILES) strings of the two compounds need to be input to determine whether they can form a co-crystal. The data set includes co-crystal records from the Cambridge Structural Database (CSD) and records of failed co-crystal formation from the existing literature and from experiments. RDKit molecular descriptors are used as the features of each compound in the data set. The developed model shows excellent performance on the proposed co-crystal training and validation sets, with high accuracy, sensitivity, and F1 score, and its prediction success rate exceeds 90%. The model therefore provides a simple and feasible scheme for designing and screening co-crystal drugs efficiently and accurately.
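The abstract reports accuracy, sensitivity, and F1 score but does not define them. The sketch below computes the standard binary-classification definitions from a confusion matrix. It assumes that label 1 means "a co-crystal forms" and label 0 means "no co-crystal forms." The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard binary-classification metrics (hypothetical helper).

    Assumed label convention: 1 = co-crystal forms, 0 = no co-crystal.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn

    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true co-crystal pairs that are found.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Specificity: fraction of non-forming pairs correctly rejected.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1: harmonic mean of precision and sensitivity.
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)

    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for illustration only; these are not results from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
    print({k: round(v, 3) for k, v in binary_metrics(y_true, y_pred).items()})
```

In the workflow the abstract describes, `y_pred` would presumably come from the trained XGBoost classifier applied to the validation set. That is an assumption, since the abstract does not give the evaluation code. Sensitivity matters in screening because a missed co-crystal (a false negative) removes a viable coformer from further consideration.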
Funding: The authors acknowledge the National Natural Science Foundation of China (No. 22278443), the CAMS Innovation Fund for Medical Sciences (No. 2022-I2M-1-015), the Key R&D Program of Shan Dong Province (No. 2019JZZY020909), and the Xinjiang Uygur Autonomous Region Innovation Environment Construction Special Fund and Technology Innovation Base Construction Key Laboratory Open Project (No. 2022D04016) for financial support.