
Identification of differential gene expression for microarray data using recursive random forest (cited by: 8)

Abstract
Background: The major difficulty in analyzing DNA microarray data is the large number of genes relative to the small number of samples, together with the complex data structure. Random forest has received much attention recently; its primary characteristic is that it can build a classification model from high-dimensional data. However, it cannot achieve optimal results for gene selection because it is still affected by undifferentiated genes. We propose recursive random forest analysis and apply it to gene selection.
Methods: Recursive random forest, an improvement on random forest, obtains the optimal set of differentially expressed genes by dropping, step by step, the genes that a given algorithm identifies as having no effect on classification. The method retains the advantages of random forest and also provides a gene importance scale. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve, which combines sensitivity and specificity, is adopted as the key standard for evaluating the performance of the method. The focus of the paper is to validate the effectiveness of gene selection using recursive random forest through the analysis of five microarray datasets: colon, prostate, leukemia, breast and skin data.
Results: Five microarray datasets were analyzed, and better classification results were attained using only a few genes after gene selection. The biological information of the genes selected from the breast and skin data was confirmed against the National Center for Biotechnology Information (NCBI). The results indicate that genes associated with diseases can be effectively retained by recursive random forest.
Conclusions: Recursive random forest can be effectively applied to microarray data analysis and gene selection. The genes retained in the optimal model provide important information for clinical diagnosis and for research on the biological mechanisms of diseases.
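The two ideas in the Methods description, AUC as the evaluation standard and step-by-step dropping of uninformative genes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the dropping rule, so the `drop_fraction` parameter, the `recursive_elimination` and `toy_fit_and_score` helpers, and the toy data are all hypothetical. In particular, `toy_fit_and_score` is a simple stand-in for training a random forest and reading off its AUC and gene importances.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of the ROC curve via the rank-sum identity.

    Equals the probability that a random positive sample scores higher
    than a random negative one, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recursive_elimination(
    genes: List[str],
    fit_and_score: Callable[[List[str]], Tuple[float, Dict[str, float]]],
    drop_fraction: float = 0.2,  # assumed; the abstract gives no value
    min_genes: int = 1,
) -> Tuple[List[str], float]:
    """Refit on a shrinking gene set and keep the subset with the best AUC."""
    current = list(genes)
    best_subset, best_auc = list(current), float("-inf")
    while True:
        score, importance = fit_and_score(current)
        # ">=" prefers the smaller subset when two subsets tie on AUC.
        if score >= best_auc:
            best_subset, best_auc = list(current), score
        if len(current) <= min_genes:
            break
        # Drop at least one gene per round, never going below min_genes.
        n_drop = max(1, int(len(current) * drop_fraction))
        n_drop = min(n_drop, len(current) - min_genes)
        least_important = sorted(current, key=lambda g: importance[g])[:n_drop]
        current = [g for g in current if g not in set(least_important)]
    return best_subset, best_auc


# Toy data (hypothetical): 3 positive samples followed by 3 negative samples.
LABELS = [1, 1, 1, 0, 0, 0]
EXPRESSION = {
    "g_signal": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "g_noise1": [9.0, 0.0, 1.0, 8.0, 1.0, 2.0],
    "g_noise2": [0.0, 10.0, 0.0, 0.0, 8.0, 0.0],
}


def toy_fit_and_score(subset: List[str]) -> Tuple[float, Dict[str, float]]:
    """Stand-in for a random forest: importance is the class-mean gap."""
    importance, sign = {}, {}
    for g in subset:
        pos = mean([v for v, y in zip(EXPRESSION[g], LABELS) if y == 1])
        neg = mean([v for v, y in zip(EXPRESSION[g], LABELS) if y == 0])
        importance[g] = abs(pos - neg)
        sign[g] = 1.0 if pos >= neg else -1.0
    # Score each sample by a signed sum of the selected genes.
    sample_scores = [
        sum(sign[g] * EXPRESSION[g][i] for g in subset)
        for i in range(len(LABELS))
    ]
    return auc(sample_scores, LABELS), importance


selected, best_auc = recursive_elimination(list(EXPRESSION), toy_fit_and_score)
print(selected, best_auc)
```

The toy scorer evaluates AUC on the same samples it was fitted to, which is optimistic. A realistic version would pass a `fit_and_score` callable that trains a random forest on the current gene subset and returns an out-of-bag or held-out AUC along with the forest's per-gene importances.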
Source: Chinese Medical Journal (中华医学杂志(英文版); indexed in SCIE, CAS, CSCD), 2008, Issue 24, pp. 2492-2496 (5 pages)
Funding: The project was supported by a grant from the National Natural Science Foundation of China (No. 30371253). Acknowledgement: We sincerely appreciate the comments from Edgar J. Love.
Keywords: microarray; gene selection; recursive random forest
