Feature Gene Selection Based on Multi-Objective EDA

Gene selection with MOEDA


Abstract: In gene expression data the number of genes (features) far exceeds the number of conditions (samples), and the data usually contain substantial noise arising from systematic error, technical limitations, and similar sources. In addition, biologists and clinicians want to pick out, from a large pool of genes, a small set of marker genes relevant to disease diagnosis. Gene selection is therefore the key step when gene expression data are used for disease classification and prediction. The two commonly used approaches are filter methods and wrapper methods. Combining the strengths of both, this paper proposes a multi-objective estimation of distribution algorithm (MOEDA) for gene selection. First, a scoring function determines the candidate gene set for MOEDA. MOEDA then optimizes several objectives at once, including multiple performance measures of a KNN classifier (such as accuracy and sensitivity) and the number of genes, to select from the candidates the feature gene subset with the strongest overall discriminative power. Experiments on the small round blue cell tumor dataset (SRBCT) show that, without requiring complex parameter settings, the method selects only 7 of the 2,000 genes and the classifier reaches 95% classification accuracy on the independent test set.

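Authors: Ye Qiming (叶奇明), Luo Fei (罗飞), Liu Juan (刘娟)

Affiliations: College of Science, Maoming University; School of Computer Science, Wuhan University

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 8, pp. 2891-2894 (4 pages)

Funding: National Natural Science Foundation of China (Grant No. 60773010)

Keywords: classification and prediction; gene selection; multi-objective evolution; multi-objective estimation of distribution algorithm (MOEDA)

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]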
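
The abstract describes a two-stage procedure: a filter score narrows the genes to a candidate set, and a multi-objective EDA then searches that set for small subsets on which KNN performs well. The record does not give the paper's scoring function, probability model, or parameter values, so the following is only a minimal sketch under assumptions: a between-class to within-class variance ratio as the filter score, a univariate (UMDA-style) marginal model, a Pareto archive as the selection step, and leave-one-out 3-NN as the classifier. All function names and default values (`filter_scores`, `knn_loo_objectives`, `moeda_select`, `n_candidates=20`, and so on) are hypothetical.

```python
import numpy as np

def filter_scores(X, y):
    """Assumed filter score: between-class / within-class variance per gene."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - overall) ** 2
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    return between / (within + 1e-12)

def knn_loo_objectives(X, y, mask, k=3):
    """Objectives to minimize: (error, 1 - worst class sensitivity, gene count)."""
    n_genes = int(mask.sum())
    if n_genes == 0:
        # An empty subset is penalized so any non-empty subset dominates it.
        return (1.0, 1.0, X.shape[1] + 1)
    Z = X[:, mask]
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)              # leave-one-out: ignore self
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k nearest samples
    classes = np.unique(y)
    votes = np.array([[(y[row] == c).sum() for c in classes] for row in nn])
    pred = classes[votes.argmax(axis=1)]
    acc = float((pred == y).mean())
    sens = float(min((pred[y == c] == c).mean() for c in classes))
    return (1.0 - acc, 1.0 - sens, n_genes)

def dominates(a, b):
    """True if objective vector a is no worse everywhere and better somewhere."""
    return (all(x <= z for x, z in zip(a, b))
            and any(x < z for x, z in zip(a, b)))

def moeda_select(X, y, n_candidates=20, pop_size=40, generations=15, seed=0):
    """Return the Pareto archive as a list of (gene indices, objectives)."""
    rng = np.random.default_rng(seed)
    # Stage 1 (filter): keep the top-scoring genes as candidates.
    candidates = np.argsort(filter_scores(X, y))[::-1][:n_candidates]
    Xc = X[:, candidates]
    p = np.full(n_candidates, 0.2)           # marginal inclusion probabilities
    archive = {}                             # mask bytes -> objective tuple
    for _ in range(generations):
        # Stage 2 (wrapper): sample gene subsets from the current model.
        pop = rng.random((pop_size, n_candidates)) < p
        for m in pop:
            archive[m.tobytes()] = knn_loo_objectives(Xc, y, m)
        # Keep only non-dominated subsets in the archive.
        items = list(archive.items())
        archive = {key: o for key, o in items
                   if not any(dominates(o2, o) for _, o2 in items)}
        # Re-estimate the distribution from the archive, with smoothing.
        elite = np.array([np.frombuffer(key, dtype=bool) for key in archive])
        p = np.clip(0.5 * p + 0.5 * elite.mean(axis=0), 0.05, 0.95)
    return [(candidates[np.frombuffer(key, dtype=bool)], o)
            for key, o in archive.items()]
```

Calling `moeda_select(X, y)` on an expression matrix `X` (samples by genes) with integer class labels `y` returns a set of trade-off solutions rather than a single subset; picking one, for example the smallest subset whose error is below a chosen threshold, is left to the user. Clipping the probabilities to [0.05, 0.95] is a design choice in this sketch to keep the model from collapsing onto one subset too early.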



