Abstract
High-dimensional medical data are often nonlinear, which makes their intrinsic structure difficult to recover with traditional dimensionality reduction techniques. This paper introduces Isomap, a nonlinear dimensionality reduction method, and discusses, from the standpoint of its algorithmic principles, why it is suited to medical data processing. Isomap was applied to two typical medical datasets: lung cancer gene expression data and breast cancer pathological data. The intrinsic dimensionality of both datasets was found to be below three, so each can be visualized in a low-dimensional projection space. The Isomap projections were then compared with those of principal component analysis (PCA), a standard linear method, using within-class distance statistics, and Isomap performed better than PCA on this measure. These results indicate the potential of nonlinear dimensionality reduction for the analysis of high-dimensional medical data.
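The abstract does not say how the within-class distance was computed. As a rough illustration only, the sketch below assumes one common definition: the mean Euclidean distance over all pairs of points that share a class label. The function name, the labels, and the two toy embeddings are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def mean_within_class_distance(points, labels):
    """Average Euclidean distance over all pairs of points sharing a label.

    Assumed definition for illustration; the paper may use a different one.
    """
    # Group the embedded points by their class label.
    by_class = {}
    for point, label in zip(points, labels):
        by_class.setdefault(label, []).append(point)

    # Accumulate distances over every same-class pair.
    total, pairs = 0.0, 0
    for members in by_class.values():
        for a, b in combinations(members, 2):
            total += dist(a, b)
            pairs += 1
    if pairs == 0:
        raise ValueError("need at least one class with two or more points")
    return total / pairs


# Hypothetical 2-D embeddings of the same four samples (made-up numbers).
labels = ["tumor", "tumor", "normal", "normal"]
embedding_a = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
embedding_b = [(0.0, 0.0), (0.0, 3.0), (5.0, 0.0), (5.0, 3.0)]

score_a = mean_within_class_distance(embedding_a, labels)
score_b = mean_within_class_distance(embedding_b, labels)
print(score_a, score_b)
```

A smaller value means that samples of the same class lie closer together in the projection. When comparing two different projections, such as an Isomap embedding and a PCA embedding, the coordinates would normally need to be brought to a common scale first, since raw distances from differently scaled embeddings are not directly comparable.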
Source
Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2004, No. 4, pp. 485-488 (4 pages)