Gene selection algorithm based on K-S test and mRMR

Cited by: 5
Abstract: To address the difficulty of gene selection in gene expression datasets, this paper proposes a gene selection algorithm based on the Kolmogorov-Smirnov (K-S) test and the minimum redundancy-maximum relevance (mRMR) principle. The algorithm first uses the K-S test to select genes with some discriminative power. It then applies the mRMR criterion to those genes, retaining the ones that are highly relevant to the class label but only weakly correlated with one another, and these form the final gene subset. With SVM as the classifier, the selected gene subsets are evaluated by F1_measure, classification accuracy, and AUC, and the algorithm is compared with the K-S test alone, mRMR alone, and the classic RELIEF and FAST algorithms. Average results on five classic gene expression datasets show that the proposed algorithm runs in far less time than mRMR and scores better than the compared algorithms on every evaluation metric. The proposed combination of the K-S test and mRMR can therefore select highly effective gene subsets.
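The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state how the K-S test is thresholded, how expression values are discretized, or which mRMR scoring form is used, so the following are all assumptions made for the example: the `ks_threshold` cutoff on the K-S statistic, the three-level binning at mean ± 0.5·std, the "relevance minus mean redundancy" score, binary class labels, and every function and parameter name.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in a + b)


def discretize(values):
    """Assumed binning: -1 / 0 / +1 around mean -/+ half a standard deviation."""
    mu = sum(values) / len(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [-1 if v < mu - sd / 2 else 1 if v > mu + sd / 2 else 0
            for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def ks_mrmr_select(X, y, ks_threshold=0.5, k=2):
    """Return indices of up to k genes. X is samples x genes, y holds 0/1 labels.

    Assumes both classes are present in y.
    """
    cols = [[row[j] for row in X] for j in range(len(X[0]))]

    # Stage 1: keep genes whose class-conditional distributions differ enough.
    candidates = []
    for j, col in enumerate(cols):
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        if ks_statistic(pos, neg) >= ks_threshold:
            candidates.append(j)

    # Stage 2: greedy mRMR over the survivors only.
    disc = {j: discretize(cols[j]) for j in candidates}
    relevance = {j: mutual_information(disc[j], y) for j in candidates}
    selected = []
    while candidates and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(disc[j], disc[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: gene 1 duplicates gene 0, gene 2 is uninformative,
# gene 3 is as relevant as gene 0 but carries different information.
y = [0, 0, 0, 0, 1, 1, 1, 1]
g0 = [1, 1, 1, 9, 9, 9, 9, 1]
g2 = [1, 9, 1, 9, 1, 9, 1, 9]
g3 = [1, 1, 9, 1, 9, 9, 1, 9]
X = [list(row) for row in zip(g0, g0, g2, g3)]
print(ks_mrmr_select(X, y, ks_threshold=0.5, k=2))  # → [0, 3]
```

In this toy run the K-S stage discards gene 2 before any mutual information is computed, and the mRMR stage prefers gene 3 over the duplicate gene 1. Running mRMR only on the K-S survivors is also the likely reason the combined method can be faster than mRMR on the full gene set, though the abstract itself only reports the timing result.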
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 4, pp. 1013-1018, 1043 (7 pages)
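Funding: Shaanxi Province Science and Technology Research Project (2013K12-03-24); National Natural Science Foundation of China (31372250); Fundamental Research Funds for the Central Universities (GK201503067)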
Keywords: gene selection; K-S test; minimum redundancy-maximum relevance (mRMR); support vector machine (SVM); F1_measure; AUC; RELIEF; FAST