Abstract
The development of microarray technology has created opportunities for bioinformatics and made it possible to diagnose cancer at the level of gene expression. However, the high dimensionality and small sample sizes of microarray data sets also challenge traditional machine learning methods. In this paper, we use real gene expression data to compare the performance of the main current classification and dimensionality reduction methods for cancer diagnosis. The experimental comparison shows that a support vector machine (SVM) with a linear kernel can effectively classify tumor and non-tumor gene expression profiles, thereby providing a reference for cancer diagnosis.
Source
Chinese Journal of Bioinformatics (《生物信息学》)
2013, Issue 3, pp. 161-166 (6 pages)
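Funding
National Natural Science Foundation of China (61001013)
Scientific Research Project of the Heilongjiang Provincial Department of Education (12521392)
Natural Science Foundation of Heilongjiang Province (F201119)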
Keywords
Microarray
Cancer Diagnosis
Classification
Principal Component Analysis