
Feature Selection for Cancer Classification Based on Fuzzy Rough Sets

Cited by: 11
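Abstract: The key to building an effective tumor classification model from gene expression profiles is to accurately identify the set of feature genes that determines a sample's class. Rough set theory, a soft computing method, can greatly reduce the number of attributes while preserving the classification ability of the original dataset, and can therefore find the genes that matter for classification among a large number of genes. Gene expression data are continuous, and the discretization that rough set methods require causes information loss. To avoid this loss, fuzzy rough sets are applied to feature gene selection: a mutual-information-based fuzzy rough set attribute reduction algorithm is proposed and applied to gene selection on gene expression datasets. KNN and C5.0 classifiers are then used to test the classification performance of the selected genes. Experiments on feature gene selection for acute leukemia subtypes (leukemia microarray) and colon cancer (colon microarray) show that the method is feasible and effective.

Abstract (as published in English): Feature selection is an essential step in cancer classification with DNA microarrays, because there are a large number of genes from which to predict classes and a relatively small number of samples. Rough set theory is a tool for reducing redundancy in information systems, so its successful application to gene selection is of great significance. Fuzzy rough sets are introduced to avoid the information loss caused by discretizing continuous gene expression data, which classical rough set theory requires. A novel gene selection method called IMIBAFRAR is developed as an improvement that reduces the computation of mutual information. KNN and C5.0 are then applied to validate the classification performance of the genes selected for distinguishing different tissue types. The work is applied to two public gene expression datasets: leukemia and colon. Experimental results show that the selected genes reflect the classification ability of the original genes. Compared with the unreduced genes and the genes selected by the classical rough set method, the method leads to significantly improved recognition accuracy, while computational complexity is reduced.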
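The abstract does not spell out the steps of the algorithm. As a rough illustration only, a greedy attribute reduction driven by fuzzy mutual information could look like the sketch below. The per-gene similarity function, the fuzzy entropy definition, the stopping threshold `min_gain`, and all function names are assumptions made for this example; this is not the paper's IMIBAFRAR procedure.

```python
import numpy as np

def fuzzy_relation(col):
    """Fuzzy similarity matrix for one gene: r_ij = 1 - |x_i - x_j| / range (assumed form)."""
    span = col.max() - col.min()
    if span == 0:
        # A constant gene cannot distinguish any pair of samples.
        return np.ones((col.size, col.size))
    return 1.0 - np.abs(col[:, None] - col[None, :]) / span

def entropy(rel):
    """Fuzzy information entropy: H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n)."""
    n = rel.shape[0]
    return -np.mean(np.log2(rel.sum(axis=1) / n))

def mutual_information(rel_b, rel_d):
    """I(B; D) = H(B) + H(D) - H(B, D); the joint relation is the element-wise minimum."""
    joint = np.minimum(rel_b, rel_d)
    return entropy(rel_b) + entropy(rel_d) - entropy(joint)

def select_genes(X, y, min_gain=1e-3, max_genes=None):
    """Greedy forward selection: add the gene that most increases I(B; D).

    X is (samples x genes) with continuous values, y holds class labels.
    Stops when the best gain falls below min_gain (hypothetical threshold).
    """
    n, m = X.shape
    limit = m if max_genes is None else max_genes
    rel_d = (y[:, None] == y[None, :]).astype(float)  # crisp decision relation
    rels = [fuzzy_relation(X[:, j]) for j in range(m)]
    selected = []
    rel_b = np.ones((n, n))  # empty gene subset: all samples indiscernible
    best_mi = 0.0
    while len(selected) < limit:
        candidates = [j for j in range(m) if j not in selected]
        if not candidates:
            break
        scores = {
            j: mutual_information(np.minimum(rel_b, rels[j]), rel_d)
            for j in candidates
        }
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_mi < min_gain:
            break
        selected.append(j_best)
        rel_b = np.minimum(rel_b, rels[j_best])
        best_mi = scores[j_best]
    return selected
```

Because the similarity is computed directly on the continuous expression values, no discretization step is needed, which is the motivation the abstract gives for using fuzzy rough sets. The selected gene indices could then be passed to a classifier such as KNN for validation.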
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 3, pp. 196-200 (5 pages)
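Funding: Supported by the National Natural Science Foundation of China (60475019), the Key Program of the National Natural Science Foundation of China (60534060), the National Basic Research Program of China (973 Program) (2003CB316902), and the 2006 Specialized Research Fund for the Doctoral Program of Higher Education (20060247039).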
Keywords: Gene expression data, Feature selection, Rough sets, Fuzzy rough sets, Mutual information

