Abstract
A two-sided t-test, combined with a newly introduced variable selection method, genetic algorithm-partial least squares (GAPLS), is used for feature selection on SELDI-TOF MS ovarian cancer data. Four characteristic m/z values are selected from the 15154 original m/z values, and a support vector machine (SVM) classifier built on these four values performs well. Both 3-fold and leave-one-out cross-validation are used to check the stability of the resulting pattern, and the leave-one-out cross-validation result is 95.26%. The results suggest that the four selected m/z values have biological significance and may serve as ovarian cancer biomarkers, and that GAPLS is an effective feature selection method for proteomic data.
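The two-stage selection described in the abstract (a t-test prefilter followed by GAPLS) can be illustrated with the minimal pure-Python sketch below. Several assumptions apply:

- The t-test stage ranks columns by Welch's |t|. The abstract does not state which t-test variant or threshold was used.
- The GAPLS stage uses a hypothetical fitness: the 5-fold cross-validated RMSE of a PLS1 model fitted to 0/1 class labels, plus a small per-variable penalty.
- The genetic algorithm uses truncation selection, uniform crossover and bit-flip mutation.

All function names and parameter values (`keep`, `pop`, `gens`, `pmut`, `penalty`, `ncomp`) are illustrative and are not the authors' settings.

```python
import math
import random

# Illustrative sketch only: the t-test variant, GA operators and fitness are assumptions.


def welch_t(a, b):
    """Two-sample t statistic (unequal variances) for one feature."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)


def t_filter(X, y, keep):
    """Stage 1: rank every m/z column by |t| between classes 0 and 1, keep the top `keep`."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:keep]]


def pls1_fit(X, y, ncomp):
    """NIPALS PLS1 on mean-centred data; returns means and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(m)]
    ym = sum(y) / n
    E = [[r[j] - xm[j] for j in range(m)] for r in X]
    f = [v - ym for v in y]
    comps = []
    for _ in range(min(ncomp, m)):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]  # deflate y
        comps.append((w, p, q))
    return xm, ym, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps of the fitted model."""
    xm, ym, comps = model
    e = [v - c for v, c in zip(x, xm)]
    yhat = ym
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        yhat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return yhat


def cv_rmse(X, y, cols, ncomp=2, folds=5):
    """k-fold cross-validated RMSE of a PLS1 model using only the columns in `cols`."""
    n, sq = len(X), 0.0
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        model = pls1_fit([[X[i][j] for j in cols] for i in train], [y[i] for i in train], ncomp)
        for i in range(k, n, folds):
            sq += (pls1_predict(model, [X[i][j] for j in cols]) - y[i]) ** 2
    return math.sqrt(sq / n)


def ga_pls(X, y, candidates, pop=20, gens=20, pmut=0.05, penalty=0.01, seed=0):
    """Stage 2: GA over bit masks of `candidates`; fitness = CV RMSE + size penalty (lower is better)."""
    rng = random.Random(seed)
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cols = [c for c, bit in zip(candidates, mask) if bit]
            cache[key] = cv_rmse(X, y, cols) + penalty * len(cols) if cols else float("inf")
        return cache[key]

    population = [[rng.random() < 0.3 for _ in candidates] for _ in range(pop)]
    for _ in range(gens):
        elite = sorted(population, key=fitness)[: pop // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [not bit if rng.random() < pmut else bit for bit in child]  # bit-flip mutation
            children.append(child)
        population = elite + children
    best = min(population, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]
```

The t-test stage is cheap and cuts thousands of m/z columns down to a shortlist. The GA then searches subsets of that shortlist, where each fitness evaluation (a cross-validated PLS fit) is affordable. In the abstract, the surviving m/z values are then classified with an SVM and checked by 3-fold and leave-one-out cross-validation. That step is not reproduced here, to keep the sketch dependency-free.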
Source
Journal of Sichuan University (Natural Science Edition) (《四川大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2007, No. 4, pp. 867-872 (6 pages)
Funding
National Natural Science Foundation of China (29877016)
Keywords
feature selection
genetic algorithm-partial least squares (GAPLS)
support vector machines (SVM)
ovarian cancer
proteomics