Abstract
Cancer gene expression data are typically high-dimensional with a small number of samples, so classifying them directly runs into the curse of dimensionality and calls for dimensionality reduction. This paper proposes a classification method for gene expression data based on Spectral Regression (SR) analysis and a Kernel space K-Nearest Neighbor (KKNN) classifier. SR analysis yields a projection matrix that effectively extracts low-dimensional discriminative features. The gene expression data are reduced with this projection matrix, and the resulting low-dimensional data are classified by the KKNN classifier. Experiments on two cancer datasets, Prostate-Tumor and 4-Tumors, demonstrate the effectiveness of the proposed method and show that KKNN achieves better classification results than the K-Nearest Neighbor (KNN) classifier.
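As a rough illustration of the pipeline summarized above (an SR projection learned from labelled training data, followed by nearest-neighbour classification in kernel space), here is a minimal pure-Python sketch. It is not the authors' implementation. The supervised response construction (Gram-Schmidt on class-indicator vectors, as in the SRDA formulation of spectral regression), the ridge parameter `alpha`, the degree-2 polynomial kernel, the toy data, and all function names (`sr_responses`, `fit_sr`, `project`, `kknn_predict`, etc.) are assumptions made for illustration only.

```python
import math
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, B):
    """Solve A Z = B (A: n x n, B: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def sr_responses(labels, tol=1e-10):
    """Assumed SRDA-style responses: Gram-Schmidt on [ones, class indicators], drop the constant."""
    n = len(labels)
    cands = [[1.0] * n] + [[1.0 if lab == c else 0.0 for lab in labels]
                           for c in sorted(set(labels))]
    basis = []
    for v in cands:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # the last indicator collapses to ~0 because indicators sum to ones
            basis.append([wi / norm for wi in w])
    return basis[1:]  # c - 1 unit response vectors, each orthogonal to ones


def fit_sr(X, labels, alpha=0.01):
    """Ridge-regress the centred samples onto the responses; returns (mean, A), A is d x (c-1)."""
    n, d = len(X), len(X[0])
    mean = [sum(x[j] for x in X) / n for j in range(d)]
    Xc = [[x[j] - mean[j] for j in range(d)] for x in X]
    Y = sr_responses(labels)
    m = len(Y)
    # Dual form, cheap when n << d: A = Xc^T (Xc Xc^T + alpha I)^{-1} Y
    G = [[dot(Xc[i], Xc[k]) + (alpha if i == k else 0.0) for k in range(n)] for i in range(n)]
    C = solve(G, [[Y[t][i] for t in range(m)] for i in range(n)])
    A = [[sum(Xc[i][j] * C[i][t] for i in range(n)) for t in range(m)] for j in range(d)]
    return mean, A


def project(x, mean, A):
    """Map one sample into the (c-1)-dimensional SR space."""
    xc = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(xc[j] * A[j][t] for j in range(len(xc))) for t in range(len(A[0]))]


def poly_kernel(u, v, degree=2, c0=1.0):
    # Assumed kernel choice; the source does not state which kernel is used.
    return (dot(u, v) + c0) ** degree


def kernel_dist2(u, v, kernel=poly_kernel):
    """Squared feature-space distance: k(u,u) - 2 k(u,v) + k(v,v)."""
    return kernel(u, u) - 2.0 * kernel(u, v) + kernel(v, v)


def kknn_predict(Z_train, labels, z, k=1, kernel=poly_kernel):
    """Majority vote among the k training points nearest to z in kernel space."""
    order = sorted(range(len(Z_train)), key=lambda i: kernel_dist2(Z_train[i], z, kernel))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


# Hypothetical toy data: 3 classes, the 4th feature is within-class noise.
X = [[1, 0, 0, 0], [1, 0, 0, 0.2], [0, 1, 0, 0], [0, 1, 0, 0.2], [0, 0, 1, 0], [0, 0, 1, 0.2]]
y = ["a", "a", "b", "b", "c", "c"]
mean, A = fit_sr(X, y)
Z = [project(x, mean, A) for x in X]
pred = kknn_predict(Z, y, project([0.9, 0.1, 0, 0.1], mean, A))  # "a" for this toy data
```

A polynomial kernel is used deliberately in this sketch: with a Gaussian (RBF) kernel, k(x, x) = 1, so the kernel-space distance 2 - 2k(x, y) is a monotone function of Euclidean distance and the neighbour ranking would coincide with ordinary k-NN. Solving the ridge regression in its dual (n x n) form keeps the cost tied to the number of samples rather than the number of genes, which suits high-dimensional, small-sample data.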
Source
Acta Electronica Sinica (《电子学报》), 2011, No. 8, pp. 1955-1960 (6 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
Fundamental Research Funds for the Central Universities (No. 10120018)
Keywords
gene expression data classification
kernel space k-nearest neighbor
spectral regression analysis
dimensionality reduction