Gene selection and classification prediction of the weighted SAM method in gene expression data (cited by: 2)
Abstract
Objective: To improve the significance analysis of microarray (SAM) method with a Gaussian kernel function and a Euclidean distance function, yielding the modified significance analysis of microarray-1 (MSAM1) and modified significance analysis of microarray-2 (MSAM2) methods, respectively, and to evaluate their gene selection and classification prediction ability in gene expression data by comparison with the original SAM method, the Relief method, and support vector machine recursive feature elimination (SVM-RFE).
Methods: The leukemia data set was obtained from the golubEsets package in Bioconductor (Golub et al. reported 50 differentially expressed genes for this data set). The five algorithms were implemented in R. Gene selection ability was evaluated by the correct rate, and classification prediction ability by the area under the ROC curve (AUC). The Kruskal-Wallis H test was used to compare the correct rate and the AUC among the five methods, and the SNK-q test was used for the subsequent pairwise comparisons.
Results: For both the correct rate and the AUC, MSAM1 and MSAM2 performed best, followed by SAM and SVM-RFE, with Relief ranked last. The differences among the five methods were statistically significant (H=150.333, P<0.0001; H=293.2579, P<0.0001). Pairwise comparison showed no statistically significant difference between MSAM1 and MSAM2 (P>0.05), but each of these two methods differed significantly from the other three (P<0.05).
Conclusions: The weighted SAM method, modified with a Gaussian kernel function and a Euclidean distance function, improves the gene selection and classification prediction ability of SAM and can give more stable results when applied to real gene expression data.
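For orientation, the baseline that MSAM1 and MSAM2 modify can be sketched as follows. This is a minimal illustration of the standard, unweighted SAM relative-difference statistic, d = (mean_A - mean_B) / (s + s0), and not the authors' R implementation. The abstract does not give the Gaussian-kernel or Euclidean-distance weighting formulas, so they are not reproduced here. The function names, the fixed fudge constant `s0=0.1`, the toy expression values, and the definition of the correct rate as the share of selected genes found in a reference list are all assumptions made for illustration.

```python
import math


def sam_statistic(group_a, group_b, s0=0.1):
    """Unweighted SAM relative difference for one gene.

    s0 is a small positive constant; SAM normally tunes it from the data,
    whereas the fixed default here is an assumption for illustration.
    """
    n1, n2 = len(group_a), len(group_b)
    mean_a = sum(group_a) / n1
    mean_b = sum(group_b) / n2
    # Pooled sum of squared deviations over both classes
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    # Gene-specific standard error of the mean difference
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (mean_a - mean_b) / (s + s0)


def rank_genes(expr_a, expr_b, top_k, s0=0.1):
    """Return the top_k gene names ordered by |d|, largest first."""
    scores = {g: sam_statistic(expr_a[g], expr_b[g], s0) for g in expr_a}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]


def selection_accuracy(selected, reference):
    """Share of selected genes that appear in a reference gene list.

    Assumed definition of the 'correct rate'; the abstract does not spell it out.
    """
    reference = set(reference)
    return sum(1 for g in selected if g in reference) / len(selected)


# Toy data (hypothetical): gene name -> expression values per class
expr_a = {"g1": [5.0, 5.2, 4.8], "g2": [1.0, 1.1, 0.9], "g3": [2.0, 3.0, 1.0]}
expr_b = {"g1": [1.0, 1.2, 0.8], "g2": [1.0, 0.9, 1.1], "g3": [2.1, 2.9, 1.0]}

top = rank_genes(expr_a, expr_b, top_k=1)
print(top, selection_accuracy(top, ["g1"]))
```

A weighted variant would replace the plain means and deviations with sample-weighted ones, but the form of those weights would have to come from the full paper.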
Authors: REN Yu-dong (任雨冬); LU Zhen (陆震); LI Jing-wei (李婧惟); LIU Yan (刘艳). Department of Health Statistics, Harbin Medical University, Harbin, Heilongjiang 150081, China
Source: Practical Preventive Medicine (《实用预防医学》), CAS, 2020, No. 12, pp. 1537-1540 (4 pages)
Funding: Natural Science Foundation of Heilongjiang Province (LH2019H005)
Keywords: SAM; significance analysis of microarray; gene expression data; gene selection; classification prediction