Abstract
To build a predictive mathematical model of calmodulin (CaM) inhibitors and to identify the key descriptors that influence their activity, we assembled a dataset of 30 structurally diverse CaM inhibitors. We then regressed the 194 molecular descriptors of each compound using multiple linear regression (MLR) and principal component regression (PCR), and obtained the best predictive model for each method. A comparison of these models showed that stepwise regression was the optimal modeling method, outperforming both the other MLR variants and PCR. The stepwise model gave good training statistics (R² = 0.952, standard error of estimate SEE = 0.289). It also performed satisfactorily on an independent test set (R² = 0.941, standard error of prediction SEP = 0.295), which indicates good reliability and predictive ability. The key descriptors identified should be useful for further research on, and development of, new CaM inhibitor drugs.
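The abstract does not say how the stepwise procedure or the reported statistics were computed. The Python below is a minimal sketch under several assumptions. It uses greedy forward selection with an R²-gain stopping rule, which is a simplified stand-in for classical F-to-enter/F-to-remove stepwise regression. It also uses the common QSAR conventions SEE = √(SSE/(n − k − 1)) on the training set and SEP = √(SSE/n) on the test set. All function names, the `min_gain` threshold and the toy data are hypothetical and are neither the authors' code nor their data.

```python
import math
from typing import List

Matrix = List[List[float]]

def ols_fit(X: Matrix, y: List[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bk], b0 = intercept.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination, where A is X with a leading column of ones.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        if abs(pivot) < 1e-12:
            raise ValueError("selected descriptors are collinear")
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]

def predict(b: List[float], X: Matrix) -> List[float]:
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]

def sse(y: List[float], yhat: List[float]) -> float:
    return sum((a - p) ** 2 for a, p in zip(y, yhat))

def r2(y: List[float], yhat: List[float]) -> float:
    mean = sum(y) / len(y)
    return 1.0 - sse(y, yhat) / sum((a - mean) ** 2 for a in y)

def see(y: List[float], yhat: List[float], n_desc: int) -> float:
    """Standard error of estimate (training set): sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse(y, yhat) / (len(y) - n_desc - 1))

def sep(y: List[float], yhat: List[float]) -> float:
    """Standard error of prediction (test set): sqrt(SSE / n), an assumed convention."""
    return math.sqrt(sse(y, yhat) / len(y))

def columns(X: Matrix, cols: List[int]) -> Matrix:
    return [[row[c] for c in cols] for row in X]

def forward_select(X: Matrix, y: List[float], max_desc: int,
                   min_gain: float = 0.01) -> List[int]:
    """Hypothetical forward selection: repeatedly add the descriptor that most
    raises training R^2; stop when the gain drops below min_gain."""
    chosen: List[int] = []
    best_r2 = 0.0
    while len(chosen) < max_desc:
        best = None
        for j in range(len(X[0])):
            if j in chosen:
                continue
            sub = columns(X, chosen + [j])
            try:
                score = r2(y, predict(ols_fit(sub, y), sub))
            except ValueError:
                continue  # skip candidates collinear with the chosen set
            if best is None or score > best[1]:
                best = (j, score)
        if best is None or best[1] - best_r2 < min_gain:
            break
        chosen.append(best[0])
        best_r2 = best[1]
    return chosen

# Hypothetical toy data: y = 1 + 2*x0 - x2 exactly; x1 is a distractor.
X_TRAIN = [[1, 3, 2], [2, 1, 7], [3, 4, 1], [4, 1, 8],
           [5, 5, 2], [6, 9, 8], [7, 2, 1], [8, 6, 8]]
Y_TRAIN = [1, -2, 6, 1, 9, 5, 14, 9]
X_TEST = [[9, 0, 3], [10, 4, 5]]
Y_TEST = [16, 16]

if __name__ == "__main__":
    cols = forward_select(X_TRAIN, Y_TRAIN, max_desc=3)
    b = ols_fit(columns(X_TRAIN, cols), Y_TRAIN)
    fit = predict(b, columns(X_TRAIN, cols))
    print("selected columns:", cols)
    print("R2 =", r2(Y_TRAIN, fit), "SEE =", see(Y_TRAIN, fit, len(cols)))
    print("SEP =", sep(Y_TEST, predict(b, columns(X_TEST, cols))))
```

The toy response depends only on columns 0 and 2, so the selection picks those two columns. The fit then reports R² close to 1, with SEE and SEP close to 0. A PCR comparison would instead regress y on the leading principal components of the standardized descriptor matrix, and score that model with the same `r2`, `see` and `sep` helpers.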
Source
Journal of Molecular Science (《分子科学学报》), 2009, No. 3, pp. 168-173 (6 pages)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
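Funding
Young Teachers' Training Fund of Dalian University of Technology (1000-893231)
Doctoral Research Start-up Fund of Dalian University of Technology (1000-893361)
National Natural Science Foundation of China (10801025)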
Keywords
calmodulin inhibitor
molecular indices
multiple linear regression analysis
principal component regression analysis