Abstract
This paper proposes a new method, based on two rounds of genetic algorithm (GA) search, for gene selection and sample classification on colon cancer microarray data. First, the Bhattacharyya distance of each gene is used as the criterion for filtering out most of the genes that are irrelevant to classification. Next, GA/CFS, which combines a GA with Correlation-based Feature Selection (CFS), searches for informative gene subsets, and these subsets are recorded in an archive. The 50 genes selected most frequently across the archived subsets are then defined as the candidate subset for further search. Finally, GA/SVM, in which a GA evolves gene subsets whose fitness is evaluated by a Support Vector Machine (SVM) classifier, selects the classification feature subset from the candidate genes. Applied to the colon cancer microarray dataset, this GA/CFS-GA/SVM method selects small gene subsets while still improving classification accuracy, and the experimental results and a comparison with the literature indicate that it performs well.
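The filtering step and the frequency-based choice of candidate genes described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each gene's expression is univariate Gaussian within each class (the abstract does not state which form of the Bhattacharyya distance is used), and the function names, the `eps` variance floor, and the dict-based data layout are all hypothetical. The GA/CFS and GA/SVM searches themselves are not shown.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def bhattacharyya_distance(a, b, eps=1e-12):
    """Bhattacharyya distance between two classes for a single gene.

    Assumption: each class is modelled as a univariate Gaussian.
    eps is a hypothetical variance floor that avoids division by zero.
    """
    va = pvariance(a) + eps
    vb = pvariance(b) + eps
    # Term driven by the separation of the class means.
    mean_term = 0.25 * (mean(a) - mean(b)) ** 2 / (va + vb)
    # Term driven by the difference between the class variances.
    var_term = 0.5 * math.log((va + vb) / (2.0 * math.sqrt(va * vb)))
    return mean_term + var_term


def filter_genes(expr, labels, keep):
    """Rank genes by Bhattacharyya distance and keep the top `keep`.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    labels : list of 0/1 class labels, aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = bhattacharyya_distance(class0, class1)
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def most_frequent_genes(archive, top_n=50):
    """Return the genes that occur most often across archived subsets.

    archive : iterable of gene subsets recorded during the first GA round
    top_n   : size of the candidate subset (50 in the abstract)
    """
    counts = Counter(gene for subset in archive for gene in subset)
    return [gene for gene, _ in counts.most_common(top_n)]
```

In this sketch, `filter_genes` would run once before the first GA round, and `most_frequent_genes` would be applied to the archive produced by GA/CFS to obtain the candidate subset passed to GA/SVM.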
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journals
2007, Issue 18, pp. 242-245 (4 pages)
Keywords
Genetic Algorithms (GA)
Support Vector Machines (SVM)
Correlation-based Feature Selection (CFS)
gene expression profiles