Abstract
Selecting feature genes from gene expression profiles has become an active topic in research on the molecular diagnosis of tumors. However, the high dimensionality, small sample size, and heavy noise of gene expression data make tumor feature gene selection a challenging task. This paper proposes a new feature gene selection method. It first picks candidate discriminative genes using the ratio of the interval gap, or of the interval overlap (cover), to the whole span, and then removes redundant genes from the candidates, so that higher classification accuracy is reached with as few genes as possible. Three tumor datasets are used to verify the effectiveness of the method. On these three datasets, only two or three feature genes are needed to reach 100% recognition accuracy under 5-fold cross-validation, which compares favorably with other tumor classification methods.
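The abstract does not give the scoring formula, so the following is only a minimal sketch of one plausible reading of the first step, for a two-class problem. It assumes each class occupies the interval between its minimum and maximum expression value for a gene, and scores the gene by the signed ratio of the gap between the two intervals (positive) or their overlap (negative) to the whole span. The names `interval_score` and `rank_genes` are hypothetical, and the redundancy-removal step and the classifier are not shown.

```python
def interval_score(class_a, class_b):
    """Signed ratio of interval gap (> 0) or overlap (< 0) to the whole span.

    Assumption: each class interval is [min, max] of its expression values.
    """
    lo_a, hi_a = min(class_a), max(class_a)
    lo_b, hi_b = min(class_b), max(class_b)
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    if span == 0:
        # All values identical: the gene carries no information.
        return 0.0
    # Positive when the intervals are disjoint, negative when they overlap.
    gap = max(lo_a, lo_b) - min(hi_a, hi_b)
    return gap / span


def rank_genes(expr, labels, top_k):
    """Return indices of the top_k genes by interval score.

    expr: list of genes, each a list of expression values (one per sample).
    labels: list of 0/1 class labels, one per sample.
    """
    scored = []
    for gene_index, values in enumerate(expr):
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scored.append((interval_score(a, b), gene_index))
    # Highest score first: well-separated genes come before overlapping ones.
    scored.sort(reverse=True)
    return [gene_index for _, gene_index in scored[:top_k]]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    expr = [
        [1, 2, 3, 7, 8, 9],  # classes separated by a gap
        [1, 5, 2, 3, 9, 4],  # class intervals overlap
    ]
    print(rank_genes(expr, labels, top_k=1))
```

Under this reading, a gene whose class intervals are disjoint gets a positive score, so a single threshold on that gene already separates the training samples. This is consistent with the small gene counts the abstract reports, but it is not confirmed as the authors' exact criterion.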
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2010, Issue 7, pp. 218-220 (3 pages)
Keywords
gene expression profile
feature gene
cancer diagnosis
support vector machine