
Two Novel Tree Structure-based Methods for Gene Selection (times cited: 2)
Abstract: Cancer diagnosis is an important topic in bioinformatics. Selecting a small, cancer-related subset of genes from the thousands measured in gene expression (microarray) data is key to that diagnosis. Random forests, a widely used algorithm in recent years, can assess the importance of features in classification; this importance measure is referred to here as PBM. Motivated by this, we propose two tree structure-based gene selection methods, FBM and ABM. They measure attribute importance by, respectively, how frequently a feature appears in the tree structures and the average of its importance scores across a large number of decision trees built on the microarray datasets. In the numerical experiments, each method selects a feature subset. A random-forest classifier is then built on that subset, and the quality of the gene selection is evaluated by the resulting AUC. For PBM to reach an AUC of at least 0.900, it needs at least 26 genes on the Leukemia dataset and at least 48 genes on the Colon Cancer dataset. With only the top 10 genes selected, FBM and ABM both reach an AUC of 0.989 on Leukemia and 0.900 on Colon Cancer. The proposed methods also achieve higher accuracy than other typical gene selection methods such as mRMR and ECRP. This is of practical significance for the accurate diagnosis and early treatment of cancer.
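The abstract describes FBM and ABM only as "frequency" and "average importance score" computed over the trees of a forest. The Python sketch below shows one plausible reading. It rests on stated assumptions:

* Each tree is given as a hypothetical list of `(feature_index, score)` split records. The per-split score is assumed to be something like an impurity decrease.
* FBM counts how many splits use each gene.
* ABM averages each gene's scores over its splits.

Two further helpers round out the sketch. `auc` computes the ROC area by pairwise comparison. `min_genes_for_auc` mirrors the question of how many top-ranked genes are needed to reach an AUC of 0.900. All names are illustrative, not the authors' code, and training the random forest itself is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Mapping, Optional, Sequence, Tuple

# Assumed input format (hypothetical): one list of split records per tree,
# each record being (feature_index, per-split importance score).
Tree = List[Tuple[int, float]]


def fbm_scores(trees: Sequence[Tree]) -> Dict[int, int]:
    """Frequency-based score: number of splits that use each feature, over all trees."""
    counts: Dict[int, int] = defaultdict(int)
    for tree in trees:
        for feature, _score in tree:
            counts[feature] += 1
    return dict(counts)


def abm_scores(trees: Sequence[Tree]) -> Dict[int, float]:
    """Average-based score: mean split score per feature (averaging over splits is an assumption)."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for tree in trees:
        for feature, score in tree:
            totals[feature] += score
            counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


def top_k(scores: Mapping[int, float], k: int) -> List[int]:
    """The k highest-scoring features; ties are broken by the smaller feature index."""
    return sorted(scores, key=lambda feature: (-scores[feature], feature))[:k]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: share of (positive, negative) pairs ranked correctly, ties counting 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def min_genes_for_auc(
    ranking: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
    threshold: float = 0.9,
) -> Optional[int]:
    """Smallest k whose top-k genes reach `threshold` under `evaluate`, or None.

    `evaluate` is a hypothetical callback that would train a classifier on the
    given genes and return its AUC; it is not implemented here.
    """
    for k in range(1, len(ranking) + 1):
        if evaluate(ranking[:k]) >= threshold:
            return k
    return None
```

Consider `trees = [[(3, 0.40), (7, 0.10)], [(3, 0.30), (5, 0.20)], [(7, 0.25), (3, 0.35), (7, 0.05)]]`. Here `top_k(fbm_scores(trees), 2)` gives `[3, 7]`, while `top_k(abm_scores(trees), 2)` gives `[3, 5]`. Gene 7 is split on often but with small scores, so the two criteria can rank genes differently.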
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University core journal list, 2015, Issue 7, pp. 250-253 (4 pages).
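Funding: National Natural Science Foundation of China (61271337, 61103126); Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20100141120049); Natural Science Foundation of Hubei Province (2011CDB454); Shenzhen Special Fund for Strategic Emerging Industry Development (JCYJ20130401160028781).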
Keywords: Classification; Gene selection; Random forests