
Gene selection algorithm based on correlation analysis

Cited by: 4
Abstract: Gene sets selected from microarray data by commonly used ranking methods often contain many highly correlated genes, which degrades classifier performance. To filter out these redundant genes (features), an unsupervised feature selection algorithm is proposed. The algorithm has two main steps: partitioning the original feature set into a number of homogeneous subsets (clusters), and selecting a representative feature from each cluster. Features are partitioned according to the k-nearest-neighbor (k-NN) principle, using pairwise feature correlation as the similarity measure. The method does not require the number of clusters to be specified in advance and has low time complexity. Experiments on real biological data show that the algorithm significantly improves the classification accuracy of existing classifiers.
Source: Journal of Zhejiang University (Engineering Science), 2004, No. 10, pp. 1289-1292 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: microarray; gene selection; correlation analysis; unsupervised learning; Biocommunications; Correlation methods; DNA sequences; Genes; Learning algorithms; Mathematical models

References (12)

  • 1. OOI C H, TAN P. Genetic algorithms applied to multiclass prediction for the analysis of gene expression data [J]. Bioinformatics, 2003, 19(1): 37-44.
  • 2. LEE K E, SHA N, DOUGHERTY E R, et al. Gene selection: A Bayesian variable selection approach [J]. Bioinformatics, 2003, 19(1): 90-97.
  • 3. DEUTSCH J M. Evolutionary algorithms for finding optimal gene sets in microarray prediction [J]. Bioinformatics, 2003, 19(1): 45-52.
  • 4. BEN-DOR A, FRIEDMAN N, YAKHINI Z. Scoring genes for relevance [R]. Palo Alto, USA: Agilent Laboratories, 2000.
  • 5. GOLUB T R, SLONIM D K, TAMAYO P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring [J]. Science, 1999, 286(5439): 531-537.
  • 6. RAMASWAMY S, TAMAYO P, RIFKIN R, et al. Multiclass cancer diagnosis using tumor gene expression signatures [J]. Proceedings of the National Academy of Sciences, 2001, 98(26): 15149-15154.
  • 7. XING E, JORDAN M, KARP R. Feature selection for high-dimensional genomic microarray data [A]. Proceedings of the Eighteenth International Conference on Machine Learning [C]. Massachusetts, USA: Morgan Kaufmann, 2001.
  • 8. JAEGER J, SENGUPTA R, RUZZO W L. Improved gene selection for classification of microarrays [A]. Pacific Symposium on Biocomputing [C]. Hawaii, USA: [s. n.], 2003, 8: 53-64.
  • 9. DUDOIT S, FRIDLYAND J, SPEED T P. Comparison of discrimination methods for the classification of tumors using gene expression data [J]. Journal of the American Statistical Association, 2002, 97: 77-87.
  • 10. MITRA P, MURTHY C A, PAL S K. Unsupervised feature selection using feature similarity [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(3): 301-312.

