Abstract
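With the large number of molecular descriptors now applied in QSAR/QSPR, selecting a descriptor set that offers good stability and predictive ability has become a pressing bottleneck. After preliminary preselection from 1664 descriptors of 63 organic compounds, the partial least squares (PLS) method was used for variable selection, yielding 42 important descriptors. Forty-three organic compounds were randomly chosen as a training set for permeation through polyethylene film, giving a model with excellent estimation ability and good stability (A = 6, r² = 0.9647, RMSE = 0.213, q² = 0.8364, RMSV = 0.467). Prediction for the 20 external organic compounds showed that the model has good predictive ability (r_p² = 0.9306, RMSP = 0.326). The PLS variable selection method can rapidly and effectively screen the important descriptors closely related to activity, and thereby build QSAR models with good stability and predictive ability.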
With the large number of descriptors now used in QSAR/QSPR, choosing a descriptor set that yields a stable and predictive model has become a bottleneck. In this work, the partial least squares (PLS) method was used to screen the important descriptors. Forty-two molecular descriptors were selected from an original pool of 1664 descriptors of 63 organic compounds. A PLS regression model relating the 42 descriptors to the logarithm of the permeability coefficients of various organic compounds through low-density polyethylene was developed and validated by the variable selection and modeling based on prediction (VSMP) technique. The PLS regression model is of good quality, with r² = 0.9647 and q² = 0.8364 for the training set of 43 samples and r_p² = 0.9306 for the test set of 20 compounds. Using the PLS variable selection procedure, it is possible to rapidly and effectively select the important variables closely related to the activity of compounds and to construct a model with good stability and predictability.
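The abstract reports three kinds of statistics: fit quality on the training set (r², RMSE), cross-validated quality (q², RMSV), and external test-set quality (r_p², RMSP). The minimal sketch below illustrates how such statistics are commonly computed. It rests on several assumptions: the abstract does not give the authors' exact formulas or cross-validation scheme, so leave-one-out and the usual 1 - SS_res/SS_tot definition are assumed; a one-descriptor ordinary least squares line stands in for the six-component PLS model; and the data and all function names are hypothetical, chosen only for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a stand-in for the PLS model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def predict(model, x):
    a, b = model
    return [a + b * xi for xi in x]

def r_squared(y, y_hat):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y (assumed definition)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean square error, dividing by the number of samples (assumed definition)."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y))

def loo_predictions(x, y):
    """Leave-one-out: predict each sample from a model fitted without it."""
    preds = []
    for i in range(len(x)):
        model = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, [x[i]])[0])
    return preds

# Hypothetical toy data, not taken from the paper.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
x_test = [1.5, 2.5, 3.5]
y_test = [1.4, 2.6, 3.6]

model = fit_line(x_train, y_train)

# Fit statistics on the training set (analogous to r2 and RMSE).
fitted = predict(model, x_train)
r2 = r_squared(y_train, fitted)
rmse_fit = rmse(y_train, fitted)

# Cross-validated statistics (analogous to q2 and RMSV).
cv = loo_predictions(x_train, y_train)
q2 = r_squared(y_train, cv)
rmsv = rmse(y_train, cv)

# External test-set statistics (analogous to rp2 and RMSP).
test_pred = predict(model, x_test)
rp2 = r_squared(y_test, test_pred)
rmsp = rmse(y_test, test_pred)

print(f"r2={r2:.4f} RMSE={rmse_fit:.3f} q2={q2:.4f} RMSV={rmsv:.3f}")
print(f"rp2={rp2:.4f} RMSP={rmsp:.3f}")
```

For a least-squares fit, the cross-validated q² cannot exceed the fitted r², which matches the pattern in the reported values (q² = 0.8364 against r² = 0.9647). A large gap between the two is usually read as a sign of overfitting, which is why the abstract cites q² and the external r_p² as evidence of stability and predictive ability.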
Source
《化学学报》
SCIE
CAS
CSCD
Peking University Core Journals (北大核心)
2011, Issue 10, pp. 1232-1238 (7 pages)
Acta Chimica Sinica
Funding
Supported by the National Science and Technology Major Project for Water Pollution Control and Treatment (No. 2008ZX07421-001)