Abstract
Objective: To explore the application of a support vector machine (SVM) based on mean impact value (MIV) to disease classification, prediction, and variable selection in genetic data, and to provide a methodological reference for disease classification and feature extraction in such data. Methods: Using the GAW18 (Genetic Analysis Workshop 18) data as an example, a prediction model was built with the MIV-based SVM and compared with logistic regression, SVM, multilayer perceptron (MLP), and decision tree classification models to evaluate the classification and variable selection performance of the MIV-based SVM. Results: After processing with the MIV-based SVM algorithm, a variable subset combining six SNP loci (13_28567172, 3_127394820, 1_1658093, 9_123969834, 1_174996637, 17_17498492) achieved a classification accuracy of 78.125%, clearly better than the other classification models. Conclusion: The MIV-based SVM can fairly effectively improve classification and prediction ability while performing variable selection on genetic data, and it avoids interactions between variables. It provides clues for exploring the pathogenesis of various diseases and for finding susceptibility SNP loci, and has some research and application value.
Authors
张阳阳
曹红艳
武淑琴
Zhang Yangyang; Cao Hongyan; Wu Shuqin (Department of Health Statistics, Shanxi Medical University, Taiyuan 030001)
Source
《中国卫生统计》
CSCD
Peking University Core Journal
2019, No. 3, pp. 344-347 (4 pages)
Chinese Journal of Health Statistics
Funding
Shanxi Province Research Funding Program for Returned Overseas Scholars (2017-054)