Abstract
Gene selection, that is, choosing the most informative genes from the full gene set, is a key step in the discriminant analysis of microarray data and an important part of cancer gene expression research. Rough set theory is an effective mathematical tool for removing redundant features, but traditional rough sets cannot handle real-valued data directly, and gene expression data are continuous. The usual workaround is discretization, which may lose information. To avoid this loss, the paper applies tolerance rough sets to gene selection and proposes a method that combines feature ranking with feature selection based on tolerance rough set theory. Building on this method, it further studies the boundary region of the rough set and proposes an improved method that uses the information contained in the boundary region to raise classification accuracy. Experiments on two standard gene expression data sets show that, compared with traditional gene selection methods based on rough set theory, the proposed methods select more discriminative genes and improve accuracy in cancer classification.
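The abstract does not give the paper's algorithm, so the following is only a minimal sketch of the general tolerance rough set idea it refers to: continuous samples are grouped by a similarity threshold instead of being discretized, and a gene subset is scored by the fraction of samples whose tolerance class is label-pure (the positive region). The function names, the range-normalized similarity measure, and the threshold `tau` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def tolerance_matrix(X, tau):
    """Boolean matrix T; T[i, j] is True when samples i and j are tolerant.

    Assumed similarity: 1 - |a - b| / range per gene, averaged over genes.
    """
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant genes
    sim = 1.0 - np.abs(X[:, None, :] - X[None, :, :]) / span
    return sim.mean(axis=2) >= tau

def dependency(X, y, tau):
    """Fraction of samples whose whole tolerance class shares their label."""
    y = np.asarray(y)
    T = tolerance_matrix(X, tau)
    same_label = y[:, None] == y[None, :]
    # A sample is in the positive region if every tolerant neighbor agrees.
    in_positive_region = np.all(~T | same_label, axis=1)
    return in_positive_region.mean()

# Toy data: gene A separates the two classes, gene B does not.
labels = [0, 0, 1, 1]
gene_a = [[0.0], [0.1], [0.9], [1.0]]
gene_b = [[0.0], [1.0], [0.05], [0.95]]
score_a = dependency(gene_a, labels, tau=0.8)
score_b = dependency(gene_b, labels, tau=0.8)
```

In this toy setting the informative gene receives a higher dependency score than the uninformative one, so a ranking or greedy selection step could use the score to order candidate genes. Samples that fall outside the positive region make up the boundary region, which the improved method in the abstract is said to exploit; how it does so is not described here.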
Source
《计算机科学》 (Computer Science)
CSCD
PKU Core Journal (北大核心)
2013, Issue 06A, pp. 125-128, 140 (5 pages)
Funding
Shanghai University Young Teacher Training Program (hdzf10008)
East China University of Political Science and Law Research Project (11H2K034)
Keywords
Rough set theory (粗糙集)
Tolerance relation (相容关系)
Gene selection (基因特征选择)
Gene expression data (基因表达数据)
Cancer classification (癌症分类)