
Research of a Tumor Diagnosis Algorithm Based on Principal Component Analysis (cited by: 9)
Abstract: Tumor diagnosis based on gene expression profiles is expected to become a fast and effective method in clinical medicine. DNA microarray experiments produce a huge amount of gene expression data, but only a few genes are related to tumors. Extracting these informative genes is challenging because the data are high-dimensional, the sample sets are small, and the profiles contain considerable noise and redundancy. After analyzing the methods currently used for tumor classification, this paper proposes a hybrid feature extraction approach that combines gene feature scoring with principal component analysis and projects the high-dimensional data onto a lower-dimensional feature space. Experiments show that the approach extracts classification features effectively and greatly reduces the dimensionality of the gene expression data while keeping tumor recognition accuracy high, which substantially improves SVM-based classification performance. The approach was validated on two tumor-related gene expression datasets (a colon dataset and a leukemia dataset) using SVM classifiers with different parameters. SVM classifiers trained on the principal components extracted from the top-ranked genes achieved a cross-validation accuracy of 95.16% on the colon dataset and 100% on the leukemia dataset. Another advantage of the proposed method is that the sample classification results can be visualized as 2D or 3D scatter plots.
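The two feature-extraction stages described in the abstract (score and rank genes, then apply principal component analysis to the top-ranked genes) can be illustrated with a minimal sketch. The abstract does not say which scoring function the authors used, so the signal-to-noise score below is an assumption, as are the function names `snr_scores`, `top_genes`, and `pca_project`, the synthetic data, and the choice of five genes. The SVM step is omitted; the projected samples would be the input to the classifier.

```python
import math
import random


def snr_scores(X, y):
    """Signal-to-noise score |mean0 - mean1| / (sd0 + sd1) for each gene (column)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        sa = math.sqrt(sum((v - ma) ** 2 for v in a) / (len(a) - 1))
        sb = math.sqrt(sum((v - mb) ** 2 for v in b) / (len(b) - 1))
        scores.append(abs(ma - mb) / (sa + sb + 1e-12))
    return scores


def top_genes(scores, k):
    """Indices of the k highest-scoring genes."""
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def pca_project(X, n_components, iters=200):
    """Project samples onto leading principal components (power iteration with deflation)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Z = [[row[j] - means[j] for j in range(d)] for row in X]  # centered data
    cov = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(0)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Eigenvalue estimate (Rayleigh quotient), then remove this component from cov.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(z[j] * c[j] for j in range(d)) for c in components] for z in Z]


# Synthetic stand-in for an expression matrix: 30 samples x 50 genes,
# where only genes 0..4 differ between the two classes (hypothetical data).
rng = random.Random(0)
n_per_class, n_genes, n_informative = 15, 50, 5
X, y = [], []
for label in (0, 1):
    for _ in range(n_per_class):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            for g in range(n_informative):
                row[g] += 6.0
        X.append(row)
        y.append(label)

selected = top_genes(snr_scores(X, y), k=5)        # stage 1: gene ranking
X_sel = [[row[g] for g in selected] for row in X]  # keep only top-ranked genes
proj = pca_project(X_sel, n_components=2)          # stage 2: PCA to 2D
```

Each row of `proj` is a 2D point, which is what makes the scatter-plot visualization mentioned in the abstract possible. When estimating cross-validation accuracy, the scoring and PCA steps would normally be refit inside each fold so that held-out samples do not influence gene selection.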
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, 2007, No. 9, pp. 84-90 (7 pages)
Funding: National Natural Science Foundation of China (Grant No. 60233020)
Keywords: support vector machine (SVM); gene expression profile; tumor classification; principal component analysis