Abstract
Building predictive classification models for tumors from gene expression data, using the methods and techniques of information science, is important for tumor identification. When building such a model, how effectively noise genes are excluded and discriminative feature genes are selected strongly affects prediction accuracy. To address this problem, a new feature gene selection method, CLUSTER_S2N, is proposed, which combines the signal-to-noise ratio measure with clustering. Using a linear support vector machine as the classifier, classification and prediction experiments are carried out on gene expression profiles of prostate cancer and human acute leukemia. The experimental results show the effectiveness and feasibility of the proposed method.
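The abstract does not spell out the CLUSTER_S2N procedure, so the following is only a minimal sketch of one plausible way to combine a signal-to-noise score with clustering. It assumes the commonly used two-class score |mean_a - mean_b| / (std_a + std_b), a plain k-means over the top-ranked genes, and one representative gene kept per cluster. The function names (`s2n_scores`, `kmeans`, `select_feature_genes`), the parameters `n_candidates` and `k`, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import statistics


def s2n_scores(expr, labels):
    """Assumed signal-to-noise score per gene: |mean_a - mean_b| / (std_a + std_b).

    expr: list of genes, each a list of expression values (one per sample).
    labels: list of 0/1 class labels, one per sample.
    """
    scores = []
    for gene in expr:
        a = [v for v, y in zip(gene, labels) if y == 0]
        b = [v for v, y in zip(gene, labels) if y == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b)
        diff = abs(statistics.mean(a) - statistics.mean(b))
        scores.append(diff / (spread + 1e-12))  # epsilon avoids division by zero
    return scores


def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors serve as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (Euclidean distance).
        assign = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [statistics.mean(col) for col in zip(*members)]
    return assign


def select_feature_genes(expr, labels, n_candidates, k):
    """Keep the top-scoring genes, cluster them, and return one gene per cluster."""
    scores = s2n_scores(expr, labels)
    ranked = sorted(range(len(expr)), key=lambda g: scores[g], reverse=True)
    candidates = ranked[:n_candidates]
    assign = kmeans([expr[g] for g in candidates], k)
    best = {}
    for g, c in zip(candidates, assign):
        # Within each cluster, keep the gene with the highest score.
        if c not in best or scores[g] > scores[best[c]]:
            best[c] = g
    return sorted(best.values())


# Toy data (hypothetical): 4 genes x 6 samples, three samples per class.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_expr = [
    [1.0, 1.2, 0.9, 5.0, 5.1, 4.9],  # gene 0: informative, higher in class 1
    [1.0, 1.4, 0.8, 5.0, 5.3, 4.7],  # gene 1: noisier near-copy of gene 0
    [5.0, 4.8, 5.2, 1.0, 1.1, 0.9],  # gene 2: informative, lower in class 1
    [3.0, 1.0, 4.0, 2.0, 5.0, 3.0],  # gene 3: noise
]
print(select_feature_genes(toy_expr, toy_labels, n_candidates=3, k=2))  # prints [0, 2]
```

In this toy run the noise gene is dropped by the score ranking, and the redundant gene 1 is dropped because it falls into the same cluster as the higher-scoring gene 0. The selected rows would then be passed to a classifier such as the linear support vector machine mentioned in the abstract.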
Source
Control Engineering of China (《控制工程》)
CSCD
2007, No. 4, pp. 373-375, 379 (4 pages)
Funding
Key Project of the National Natural Science Foundation of China (60234020)
Keywords
feature genes
cancer classification
gene expression profiles