Abstract
Support vector machines (SVMs) have become a common statistical method for analyzing drug structure-activity relationship data. However, SVMs face a variable subset selection problem, and a large number of redundant variables may degrade the prediction accuracy of the SVM model, so variable screening is needed to reduce dimensionality. This paper proposes a new SVM classification method (AUC-SVM) based on the area under the receiver operating characteristic curve (AUC). First, the AUC value of each variable is calculated. Second, the AUC values and a forward selection algorithm are used to select the most informative variable subset and to eliminate irrelevant and redundant variables from the data. Finally, with AUC as the criterion for variable importance, the performance of AUC-SVM is evaluated on real drug structure-activity relationship data sets and compared with that of the traditional SVM method. The empirical results show that the AUC-SVM algorithm markedly improves classification prediction performance.
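The abstract does not give the exact selection procedure, so the following is only a minimal sketch of the general idea it describes: rank variables by their individual AUC, then add them greedily while a subset score keeps improving. All names (`auc`, `variable_auc`, `forward_select`, `toy_score`), the stopping rule `min_gain`, and the toy data are assumptions for illustration, not the authors' implementation. The scoring function here is a hypothetical stand-in; it is not an SVM.

```python
from typing import Callable, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney statistic: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def variable_auc(X: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """AUC of each column used alone, folded so that direction is ignored."""
    result = []
    for j in range(len(X[0])):
        a = auc([row[j] for row in X], labels)
        result.append(max(a, 1.0 - a))
    return result


def forward_select(
    X: Sequence[Sequence[float]],
    labels: Sequence[int],
    score: Callable[[List[int]], float],
    min_gain: float = 1e-9,  # assumed stopping tolerance
) -> Tuple[List[int], float]:
    """Try variables in decreasing single-variable AUC order and keep a
    variable only if it raises score(subset) by more than min_gain."""
    aucs = variable_auc(X, labels)
    order = sorted(range(len(aucs)), key=lambda j: aucs[j], reverse=True)
    selected: List[int] = []
    best = float("-inf")
    for j in order:
        candidate = selected + [j]
        s = score(candidate)
        if s > best + min_gain:
            selected, best = candidate, s
    return selected, best


# Toy data (made up): column 0 is informative, column 1 is noise,
# column 2 is a redundant rescaling of column 0.
X = [[1, 5, 2], [2, 1, 4], [3, 4, 6], [4, 2, 8], [5, 6, 10], [6, 3, 12]]
y = [0, 0, 0, 1, 1, 1]


def toy_score(cols: List[int]) -> float:
    """Hypothetical stand-in scorer: AUC of the sum of the chosen columns."""
    combined = [sum(row[j] for j in cols) for row in X]
    return auc(combined, y)


selected, best = forward_select(X, y, toy_score)
print(selected, best)
```

In a setting closer to the one the abstract describes, `toy_score` would presumably be replaced by a cross-validated AUC of an SVM trained on the candidate columns, for example with scikit-learn's `SVC` and `cross_val_score(..., scoring="roc_auc")`. On the toy data the redundant and noisy columns add nothing to the score, so only the informative column is kept.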
Authors
刘伟平
黄晨浩
LIU Weiping; HUANG Chenhao (Library, Hunan City University, Yiyang, Hunan 413000, China; School of Mathematics and Computational Science, Xiangtan University, Xiangtan, Hunan 411105, China)
Source
《湖南城市学院学报(自然科学版)》
CAS
2023, No. 6, pp. 69-73 (5 pages)
Journal of Hunan City University: Natural Science
Funding
Scientific Research Project of the Education Department of Hunan Province (20A086).
Keywords
structure-activity relationship
support vector machine
AUC
variable screening