A three-dimensional quantitative structure-activity relationship (3D-QSAR) approach is applied to predict the biological activity and toxicity of chemical products accurately. Quantum chemical techniques are used to construct the molecular descriptors. The molecular quantum descriptors are classified into five principal component factors. Various linear regression equations are obtained using statistical techniques. The study proposes the three best regression equations based on these quantum molecular descriptors. Observed EC50 values are plotted against the EC50 values calculated with the best-fitting quantum-descriptor equations.
Fenton oxidation is a promising water treatment method to degrade organic pollutants. In this study, 30 different organic compounds were selected and their reaction rate constants (k) were determined for the Fenton oxidation process. Gaussian 09 and Materials Studio were used to carry out calculations and obtain values of 10 different molecular descriptors for each studied compound. Ferric-oxyhydroxide coagulation experiments were conducted to determine the coagulation percentage. Based on adsorption capacity, the investigated organic compounds were divided into two groups (Group A and Group B). The percentage adsorption of organic compounds in Group A was less than 15% (wt./wt.) and that in Group B was higher than 15% (wt./wt.). For Group A, removal of the compounds by oxidation was the dominant process, while for Group B, removal took place by both oxidation and coagulation (as a synergistic process). Results showed that the relationship between the rate constants (k values) and the molecular descriptors was more pronounced for Group A than for Group B compounds. For the oxidation-dominated process, E_HOMO and the Fukui indices (f(0)_x, f(-)_x, f(+)_x) were the most significant factors. The influence of bond order was more significant for the synergistic process of oxidation and coagulation than for the oxidation-dominated process. The influences of all other molecular descriptors on the synergistic process were weaker than on the oxidation-dominated process.
Using density functional theory, noncovalent interactions and two mechanisms of covalent functionalization of the drug carmustine with functionalized carbon nanotubes (CNTs) have been investigated. Quantum molecular descriptors of the noncovalent configurations were studied. The binding of carmustine to functionalized CNT was found to be thermodynamically favorable. NTCOOH and NTCOCl can bond to the NH group of carmustine through the OH (COOH mechanism) and Cl (COCl mechanism) groups, respectively. The activation energies, activation enthalpies and activation Gibbs free energies of the two pathways were calculated and compared with each other. The activation parameters of the COOH mechanism are higher than those of the COCl mechanism, and therefore the COCl mechanism is the more suitable route for covalent functionalization. COOH-functionalized CNT (NTCOOH) has a higher binding energy than COCl-functionalized CNT (NTCOCl) and can act as a favorable system for noncovalent carmustine drug delivery within biological and chemical systems. These results could be generalized to other similar drugs.
A new quantitative structure-retention relationship (QSRR) method is reported for predicting gas chromatography (GC) relative retention times (RRTs) of chlorinated phenols (CPs) on a DB-5 column. Chemical descriptors were calculated from the molecular structure of the CPs and related to their gas chromatographic RRTs by multiple linear regression analysis. The proposed model had a squared multiple correlation coefficient R² = 0.970, a standard error SE = 0.0472, and a significance level P = 0.0000. The QSRR model also reveals that the gas chromatographic relative retention times of CPs are associated with physicochemical interactions with the stationary phase and are influenced by the number of chlorine and oxygen atoms in the CP molecules.
Direct application of bio-oil from fast pyrolysis as a fuel has remained a challenge due to its undesirable attributes such as low heating value, high viscosity, high corrosiveness and storage instability. Solvent addition is a simple method for circumventing these disadvantages to allow further processing and storage. In this work, computer-aided molecular design tools were developed to design optimal solvents that upgrade bio-oil while having low environmental impact. Firstly, target solvent requirements were translated into measurable physical properties. As different property prediction models involve different levels of structural information, the molecular signature descriptor was used as a common platform to formulate the design problem. Because of these differences in required structural information, signatures of different heights were needed in formulating the design problem. Due to the combinatorial nature of higher-order signatures, the complexity of a computer-aided molecular design problem increases with the height of the signatures. Thus, a multi-stage framework was built on consistency rules that restrict the number of higher-order signatures. Finally, phase stability analysis was conducted to evaluate the stability of the solvent-oil blend. As a result, optimal solvents that improve the solvent-oil blend properties while displaying low environmental impact were identified.
A molecular vector-type descriptor containing 6 variables is used to describe the structure of aromatic hydrocarbons (AHs) and is related to their normal boiling points (bp). The correlation coefficient (R) between the estimated and experimental bp is 0.9988 and the root mean square error (RMS) is 7.907 °C for 66 AHs. The RMS obtained by cross-validation is 9.131 °C, which indicates that the model has good predictive ability.
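The two fit statistics quoted in this abstract, the correlation coefficient R and the root mean square error, can be computed as in the minimal sketch below. The function names and the boiling-point values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical boiling points in degrees Celsius (illustration only).
experimental_bp = [80.0, 110.0, 140.0, 170.0]
estimated_bp = [82.0, 108.0, 141.0, 169.0]

print("R    =", round(pearson_r(estimated_bp, experimental_bp), 4))
print("RMSE =", round(rmse(estimated_bp, experimental_bp), 3))
```

A cross-validated RMS would use the same `rmse` function, applied to predictions made for compounds that were held out while the model was fitted.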
Structure-based virtual screening (molecular docking) is now one of the most pragmatic techniques to leverage target structure for ligand discovery. Accurate binding pose prediction is critical to molecular docking. Here, we describe a general strategy to improve the accuracy of docking pose prediction by implementing structural descriptor-based filtering and KGS-penalty function-based conformational clustering in an unbiased manner. We assessed our method against 150 high-quality protein–ligand complex structures. Surprisingly, such simple components are sufficient to improve the accuracy of docking pose prediction. The success rate of predicting a near-native docking pose increased from 53% of the targets to 78%. We expect that our strategy may have general usage in improving currently available molecular docking programs.
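A success rate of this kind is typically the fraction of targets whose top-ranked pose lies within some RMSD cutoff of the crystallographic pose. The sketch below assumes a 2.0 Å cutoff, a common convention that the abstract does not state; the function name and the RMSD values are hypothetical.

```python
def success_rate(top_pose_rmsds, cutoff=2.0):
    """Fraction of targets whose top-ranked pose is within `cutoff` (in Å)
    of the native pose. The 2.0 Å default is an assumed convention."""
    hits = sum(1 for rmsd in top_pose_rmsds if rmsd <= cutoff)
    return hits / len(top_pose_rmsds)

# Hypothetical RMSD values in Å for four targets (illustration only).
rmsds = [0.5, 1.9, 2.0, 3.5]
print(success_rate(rmsds))
```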
Co-crystal formation can improve the physicochemical properties of a compound, thus enhancing its druggability. Therefore, artificial intelligence-based co-crystal virtual screening in the early stage of drug development has attracted extensive attention from researchers. However, the complexity of developing and applying algorithms hinders its wide application. This study presents a data-driven co-crystal prediction method based on the XGBoost machine learning model of the scikit-learn package. The simplified molecular input line entry specification (SMILES) strings of two compounds are supplied as input to determine whether a co-crystal can be formed. The data set includes the co-crystal records in the Cambridge Structural Database (CSD) and records of no co-crystal formation from the literature and experiments. RDKit molecular descriptors are adopted as the features of a compound in the data set. The developed model shows excellent performance on the proposed co-crystal training and validation sets, with high accuracy, sensitivity, and F1 score. The prediction success rate of the model exceeds 90%. The model therefore provides a simple and feasible scheme for designing and screening co-crystal drugs efficiently and accurately.
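The three reported metrics follow from the confusion-matrix counts of a binary classifier. The sketch below shows how accuracy, sensitivity and F1 score are derived. It assumes label 1 means "co-crystal forms" and label 0 means "no co-crystal"; the function name and the label vectors are hypothetical, and the code does not reproduce the study's model.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall) and F1 score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}

# Hypothetical labels (illustration only): 1 = co-crystal forms, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```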
Chemical substances are essential in all aspects of human life, and understanding their properties is essential for developing chemical systems. The properties of chemical species can be accurately obtained by experiments or ab initio calculations; however, these are time-consuming and costly. In this work, machine learning (ML) models for estimating entropy, S, and constant-pressure heat capacity, Cp, at 298.15 K are developed for alkanes, alkenes, and alkynes. The training data for entropy and heat capacity are collected from the literature. Molecular descriptors generated using the alvaDesc software are used as input features for the ML models. Support vector regression (SVR), ν-support vector regression (ν-SVR), and random forest regression (RFR) algorithms were trained with K-fold cross-validation on two levels. The first level assessed the models' performance, and the second level generated the final models. Among the three ML models chosen, SVR shows the best performance on the test dataset. The SVR model was then compared against the traditional Benson group additivity method to illustrate the advantages of using the ML model. Finally, a sensitivity analysis is performed to find the most critical descriptors in the property estimations.
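K-fold cross-validation splits the data into K folds and holds out each fold once while the model is fitted on the rest. The sketch below shows a single level of that procedure in plain Python. A baseline that predicts the training-fold mean stands in for SVR, and the function names and entropy values are hypothetical; the study's two-level scheme and its actual models are not reproduced here.

```python
import math

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds; assumes 2 <= k <= n_samples."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cv_rmse_mean_baseline(ys, k):
    """K-fold RMSE of a stand-in model that predicts the mean of the training fold."""
    squared_errors = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        train_mean = sum(ys[i] for i in train_idx) / len(train_idx)
        squared_errors.extend((ys[i] - train_mean) ** 2 for i in test_idx)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical entropy values in J/(mol K), for illustration only.
entropies = [270.0, 310.0, 349.0, 388.0, 428.0, 467.0]
print(round(cv_rmse_mean_baseline(entropies, k=3), 2))
```

Replacing the mean baseline with a fitted regressor changes only the two lines inside the loop that compute and apply `train_mean`.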
Leukotriene A4 (LTA4) plays an important role as a precursor of slow-reacting substances such as LTC4, LTD4, and LTE4. It is an attractive target for molecular modeling and QSAR studies. Our effort is mainly focused on exploring the SAR for inhibitors of LTA4 hydrolase through docking, pharmacophore modeling and molecular descriptor studies. The binding of these small molecules to the LTA4 hydrolase enzyme was described by models developed on 2D molecular descriptors, with good predictive power (39 compounds, 6 descriptors, r² = 0.98, SEE = 0.167, F-value = 268.53, q² = 0.90, adjusted r² = 0.97, P-value < 0.0001, SD of residuals = 0.15). Docking studies were employed to infer the probable binding conformations of these analogues and to explore the SAR of the compounds. The novel pharmacophore represents the ligand features that are involved in interactions with the target protein, as well as the space around the ligand occupied by the protein. These efforts aim to establish the SAR for inhibitors of LTA4 hydrolase through QSAR, docking and pharmacophore techniques.
A quantitative structure-property relationship (QSPR) method is used to study correlation models between the structures of a set of diverse organic compounds and their log P. Molecular descriptors calculated from structure alone are used to describe the molecular structures. A subset of the calculated descriptors, selected using forward stepwise regression, is used to develop the QSPR models. Multiple linear regression (MLR) and radial basis function neural networks (RBFNNs) are used to construct the linear and nonlinear correlation models, respectively. The optimal QSPR model is based on a 7-17-1 RBFNN architecture using seven calculated molecular descriptors. The root mean square errors in predictions for the training, prediction and overall data sets are 0.284, 0.327 and 0.291 log P units, respectively.
Funding (Fenton oxidation study): supported by the National Natural Science Funds of China (No. NSFC21177083) and the Shanghai Municipal Commission of Economy and Informatization Project (No. CXY-2013-52).
Funding (bio-oil solvent design study): The authors would like to express sincere gratitude to the Ministry of Higher Education Malaysia for the realization of this research project under Grant FRGS/1/2019/TK02/UNIM/02/1. However, only the authors are responsible for the opinions expressed in this paper and for any remaining errors.
Funding (co-crystal prediction study): The authors acknowledge the National Natural Science Foundation of China (No. 22278443), the CAMS Innovation Fund for Medical Sciences (No. 2022-I2M-1-015), the Key R&D Program of Shan Dong Province (No. 2019JZZY020909), and the Xinjiang Uygur Autonomous Region Innovation Environment Construction Special Fund and Technology Innovation Base Construction Key Laboratory Open Project (No. 2022D04016) for financial support.
Funding (entropy and heat capacity ML study): This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research under award number OSR-2019-CRG7-4077 and by the KAUST Clean Fuels Consortium (KCFC) and its member companies.
Funding (LTA4 hydrolase QSAR study): Project supported by the Department of Science and Technology, Govt. of India, through a Young Scientist Fellowship (SR/FT/LS-161/2008).