
Hybrid Feature Selection Method for Tumor Gene Data Based on Spark (cited by 3)
Abstract To handle the tumor gene data that has grown rapidly with the development of microarray technology, and to perform feature selection on that data, a hybrid feature selection method based on the Spark distributed computing framework is proposed, combining ensemble feature selection with hybrid feature selection. The F-score feature selection method is first used to remove irrelevant features as a preliminary selection step. F-score, multi-class support vector machine recursive feature elimination, and random-forest-based feature selection are then combined to obtain the optimal feature subset, and a support vector machine is used to classify and predict on that subset. Experimental results show that the method reaches high classification accuracy while selecting relatively few genes.
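As a rough illustration of the preliminary F-score filtering step described in the abstract, the sketch below scores each feature by the ratio of between-class separation to within-class variance and keeps the top-ranked features. The abstract does not state the exact formula, so this assumes the commonly used F-score definition extended to multiple classes. The function names `f_scores` and `top_k_features`, the toy data, and the single-machine pure-Python form (rather than a Spark job) are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature (column of X).

    Assumed definition: sum over classes of (class mean - overall mean)^2,
    divided by the sum over classes of the within-class sample variance.
    Each class must contain at least two samples.
    """
    n_features = len(X[0])

    # Group the sample rows by class label.
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(n_features):
        overall_mean = mean(row[j] for row in X)
        between = 0.0
        within = 0.0
        for rows in rows_by_class.values():
            column = [row[j] for row in rows]
            between += (mean(column) - overall_mean) ** 2
            within += variance(column)  # sample variance (n - 1 denominator)
        # A feature that is constant within every class gets score 0 here
        # to avoid division by zero (an arbitrary choice for this sketch).
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Return the indices of the k features with the highest F-score."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: 4 samples, 2 features; feature 0 separates the two classes well.
X = [[1.0, 5.0], [2.0, 3.0], [9.0, 4.0], [10.0, 6.0]]
y = [0, 0, 1, 1]
selected = top_k_features(X, y, 1)
```

In a pipeline like the one the abstract outlines, the features kept by this filter would then be passed to the more expensive selectors (SVM recursive feature elimination and random forest importance). How the three rankings are combined is not specified in the abstract and is not sketched here.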
Authors WANG Lili; DENG Li; YU Yue; FEI Minrui (School of Mechatronics Engineering and Automation, Shanghai University, Shanghai 200072, China; Shanghai Key Laboratory of Power Station Automation Technology, Shanghai 200072, China)
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2018, No. 11, pp. 1-6 (6 pages)
Funding Key Project of the Science and Technology Commission of Shanghai Municipality (14DZ1206302)
Keywords tumor gene data; Spark distributed computing framework; hybrid feature selection; ensemble feature selection; classification
