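Abstract
Locally Linear Embedding (LLE) relies on time-consuming trial and error to find the number of nearest neighbors. To address this drawback, an Enhanced Kernel Local Linear Embedding (EKLLE) algorithm is proposed that assigns a neighborhood to each sample automatically. The algorithm uses a Gaussian kernel function to improve the distance measure of standard LLE and, combined with the class information of the samples, sets a different number of nearest neighbors for each sample without human intervention, avoiding the large amount of time that trial and error needs to reach the optimal result. Finally, dimension reduction and classification of the test samples are carried out with a different number of neighbors for each sample. EKLLE effectively maps high-dimensional gene expression profile data into a low-dimensional intrinsic space and overcomes the poor handling of noisy or sparse data by traditional LLE. Comparative experiments on tumor sample classification verify the real-time performance and accuracy of the proposed method.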
Using trial and error to find the optimal number of neighbors in Locally Linear Embedding (LLE) is time-consuming. In this article, a novel supervised Enhanced Kernel Local Linear Embedding (EKLLE) is proposed to optimize each sample's neighborhood. First, a new similarity measure based on the Gaussian kernel is proposed as the neighbor-selection criterion. Then, without human intervention, the approach estimates the optimal number of nearest neighbors by detecting the distribution of each sample. Finally, EKLLE classifies the test samples with a different number of neighbors per sample. Experiments on gene expression profile datasets show that the proposed method maps high-dimensional data into a low-dimensional intrinsic space. The new algorithm handles noise-contaminated or sparsely sampled datasets well and also performs well on new samples. The accuracy and effectiveness of the algorithm are verified through tests on tumor classification.
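The pipeline outlined in the abstract (a Gaussian-kernel distance, a per-sample neighbor count chosen with class labels, then LLE) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the neighbor-selection rule, so `adaptive_neighbors` and its parameters `k_min` and `k_max` are hypothetical stand-ins, as are the function names, `sigma`, and `reg`. The kernel is assumed to affect only the neighbor search, with reconstruction weights computed as in standard LLE. Each class is assumed to contain at least `k_min + 1` samples.

```python
import numpy as np

def gaussian_kernel_distance(X, sigma=1.0):
    """Pairwise distances in the feature space induced by a Gaussian kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # ||phi(x) - phi(y)||^2 = k(x,x) - 2 k(x,y) + k(y,y) = 2 - 2 k(x,y)
    return np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))

def adaptive_neighbors(D, labels, k_min=2, k_max=8):
    """Hypothetical rule (assumption, not from the paper): among the k_max
    nearest same-class samples, keep those no farther than their mean distance."""
    labels = np.asarray(labels)
    n = D.shape[0]
    neighbors = []
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        cand = same[np.argsort(D[i, same])][:k_max]
        keep = cand[D[i, cand] <= D[i, cand].mean()]
        if keep.size < k_min:
            keep = cand[:k_min]  # fall back to the k_min nearest
        neighbors.append(keep)
    return neighbors

def lle_embed(X, neighbors, n_components=2, reg=1e-3):
    """Standard LLE weights and embedding, with a neighbor list per sample."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i, idx in enumerate(neighbors):
        Z = X[idx] - X[i]                       # neighbors centered on sample i
        G = Z @ Z.T                             # local Gram matrix
        G += reg * max(np.trace(G), 1e-12) * np.eye(len(idx))
        w = np.linalg.solve(G, np.ones(len(idx)))
        W[i, idx] = w / w.sum()                 # weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)                 # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1], W       # skip the first (bottom) eigenvector
```

A call such as `lle_embed(X, adaptive_neighbors(gaussian_kernel_distance(X), y))` returns the embedding and the weight matrix. Because neighbors are restricted to the same class in this sketch, the neighborhood graph splits by class, so the bottom eigenvectors mainly separate the classes.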
Source
《生物学杂志》
CAS
CSCD
2014, No. 1, pp. 82-86 (5 pages)
Journal of Biology
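Funding
National Natural Science Foundation of China (61172127)
Anhui Provincial Natural Science Foundation (1208085MF93, 1208085QF104)
Anhui University "211 Project" Academic Innovation Team Fund (KJTD007A)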
Keywords
locally linear embedding
dimension reduction
gene expression profile
Gaussian kernel